[torqueusers] Slow response of torque when jobs are running

Josh Bernstein jbernstein at penguincomputing.com
Mon Dec 7 21:56:05 MST 2009


I've gotta believe this is a name resolution issue.

Can you check to make sure that TORQUE's server_name file contains a
hostname that resolves quickly with getent?
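Josh's check can be scripted. The Python sketch below is illustrative and not part of TORQUE. It assumes the server_name file lives at /var/spool/torque/server_name (the usual default TORQUE_HOME; adjust for your install). It times each lookup with socket.getaddrinfo, which goes through the same NSS configuration (/etc/nsswitch.conf) that getent consults. The 0.5 s "slow" threshold is an arbitrary choice made for this sketch.

```python
import socket
import time
from pathlib import Path

# Assumption: TORQUE_HOME is /var/spool/torque; change this for your install.
SERVER_NAME_FILE = Path("/var/spool/torque/server_name")
SLOW_AFTER_SECONDS = 0.5  # arbitrary threshold chosen for this sketch


def read_server_names(path: Path = SERVER_NAME_FILE) -> list[str]:
    """Return each whitespace- or comma-separated token in the file as a hostname."""
    return path.read_text().replace(",", " ").split()


def time_lookup(hostname: str) -> tuple[float, list[str]]:
    """Resolve hostname through the system resolver and report how long it took."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(hostname, None)
        addrs = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addrs = []  # the name did not resolve at all
    return time.perf_counter() - start, addrs


def check(hostnames: list[str], slow_after: float = SLOW_AFTER_SECONDS):
    """Return (hostname, seconds, addresses, is_slow) for each hostname."""
    results = []
    for name in hostnames:
        elapsed, addrs = time_lookup(name)
        results.append((name, elapsed, addrs, elapsed > slow_after))
    return results


if __name__ == "__main__":
    if SERVER_NAME_FILE.exists():
        names = read_server_names()
    else:
        names = [socket.gethostname()]  # fall back to this host's own name
    for name, elapsed, addrs, slow in check(names):
        status = "SLOW" if slow else "ok"
        shown = ", ".join(addrs) if addrs else "UNRESOLVED"
        print(f"{name}: {elapsed * 1000:.1f} ms -> {shown} [{status}]")
```

Run it on the server and on a client node. A lookup that takes seconds, or a name that does not resolve, would be consistent with a name-resolution problem. Running `getent hosts <name>` by hand is a quick cross-check.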

-Josh

On Dec 7, 2009, at 7:15 PM, "Garrick Staples" <garrick at usc.edu> wrote:

> On Tue, Dec 08, 2009 at 01:39:38AM +0000, Luc Vereecken alleged:
>> Hi Chris,
>>
>> I attach a strace -T output of qstat. The output looked like a normal
>> qstat output with job numbers, running times, etc., so nothing special
>> there.
>> The strace reveals that it all goes awry when accessing the
>> /tmp/.torque-unix socket. Most of the time is lost on a poll (line 78)
>> and a read (line 90); all other calls show normal timings.
>>
>> That reminds me that there is something like a no-unix-sockets option
>> in configure, iirc.
>
> What you want is an strace of the _server_ while doing a qstat.
>
> qstat is just going to wait for a response from the server. Your
> strace shows exactly that.
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
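
Garrick's suggestion can be followed roughly like this. Find the pbs_server process ID (for example with `pidof pbs_server`) and attach strace to it as root. Run qstat from another shell, stop strace with Ctrl-C, and rank the traced syscalls by time spent. The Python sketch below builds that strace command line and ranks the resulting trace. The output file name, the top-5 cutoff, and the line-matching regex are this sketch's own choices, and the regex only handles the common -f/-tt/-T line shapes. If the server is stuck on lookups, long waits in poll, sendto, or recvfrom calls that talk to port 53 around the time of the qstat request would point toward DNS, as Josh suspects.

```python
import re
import sys

# Matches strace lines produced with -f -tt -T, e.g.
#   12345 21:56:05.000001 poll([...], 1, 5000) = 1 ([...]) <2.003124>
#   12345 21:56:08.600000 <... read resumed>"abc", 3) = 3 <1.500000>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"       # optional PID column
    r"(?:\d{2}:\d{2}:\d{2}\.\d+\s+)?"      # optional -tt timestamp
    r"(?:<\.\.\. (?P<resumed>\w+) resumed>|(?P<call>\w+)\()"
    r".*<(?P<secs>\d+\.\d+)>\s*$"          # -T: time spent in the syscall
)


def strace_argv(pid: int, outfile: str = "pbs_server.strace") -> list[str]:
    """Build the command that attaches strace to a running process."""
    # -f follows child processes/threads, -tt adds microsecond timestamps,
    # -T appends the time spent in each syscall as <seconds>.
    return ["strace", "-f", "-tt", "-T", "-p", str(pid), "-o", outfile]


def slowest_syscalls(lines, top: int = 5):
    """Return up to `top` (seconds, syscall, line_number) tuples, slowest first."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.match(line)
        if m:  # "<unfinished ...>" lines carry no timing and are skipped
            name = m.group("call") or m.group("resumed")
            hits.append((float(m.group("secs")), name, lineno))
    hits.sort(reverse=True)
    return hits[:top]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            for secs, name, lineno in slowest_syscalls(fh):
                print(f"{secs:10.6f} s  {name:<12} line {lineno}")
    else:
        print(f"usage: {sys.argv[0]} STRACE_OUTPUT_FILE")
```

For example, after `strace -f -tt -T -p <pid> -o pbs_server.strace`, running `python3 rank_strace.py pbs_server.strace` (the script name is hypothetical) lists the slowest calls with their line numbers in the trace file.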

