[torqueusers] performance problem on x86_64

Garrick Staples garrick at usc.edu
Thu Oct 6 11:43:43 MDT 2005


I'm getting plagued by a strange performance problem in x86_64 TORQUE.  It's
driving me nuts.

Multiple quick stats of jobs or nodes are very slow when run on any x86_64
host.  The examples below work fine if I run them from any 32-bit host.  And it
seems to happen only when a lot of single-node jobs are in the queue (running or
idle).  (Dave, I think you've seen this happen on TeraGrid.)

These work fine:
   qstat -a
   qstat job1
   qstat job2
   qstat; qstat; qstat;...
   pbsnodes -a
   pbsnodes -a hostname
   pbsnodes -a; pbsnodes -a; pbsnodes -a;...

These always print the first job or node quickly, but then may or may
not have a 2 second hang before printing any others:

   qstat job1 job2 ...
   qstat job1; qstat job1; qstat job1;...
   pbsnodes -a node1; pbsnodes -a node2;...
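
Since the hang is intermittent, one way to pin it down is to time each
invocation separately.  The following is only a minimal sketch and not part of
TORQUE: the helper names `time_runs` and `slow_runs` and the 1.5 second
threshold are assumptions made for illustration, and the command to run is
whatever reproduces the problem on the host in question.

```python
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(repeats):
        start = time.monotonic()
        # Discard output; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.monotonic() - start)
    return durations


def slow_runs(durations, threshold=1.5):
    """Return the indices of runs that took longer than `threshold` seconds."""
    return [i for i, d in enumerate(durations) if d > threshold]


if __name__ == "__main__":
    # Harmless stand-in command; replace with e.g. ["qstat", "job1"] (assumed).
    demo_cmd = [sys.executable, "-c", "pass"]
    demo = time_runs(demo_cmd, repeats=3)
    for i, d in enumerate(demo):
        print(f"run {i}: {d:.3f}s")
    print("slow runs:", slow_runs(demo))
```

Calling `time_runs(["qstat", "job1"], repeats=10)` on a 64-bit and a 32-bit
host and comparing the `slow_runs` results would show whether the roughly
2 second stalls appear on only one of them.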

If I strace these, I see that 'qstat' is hanging on the pbs_iff process.
But the really weird part is that 'strace -f qstat ...' always completes
successfully.  Something about stracing the pbs_iff process fixes the
problem.

Anyone have any ideas?

And again, this happens only on 64-bit hosts.  None of this weirdness
happens when I run qstat/pbsnodes from a 32-bit host.

Some output of 'strace qstat job1 job2 job3':

connect(3, {sa_family=AF_INET, sin_port=htons(15001), sin_addr=inet_addr("192.168.0.205")}, 16) = 0
pipe([4, 5])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2a95acc130) = 26895
close(5)                                = 0
fcntl(4, F_GETFL)                       = 0 (flags O_RDONLY)
read(4,    <-- 2 second hang here while calling pbs_jobstat() on job2



"\0\0\0\0", 4)                  = 4
close(4)                                = 0
wait4(26895, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 26895
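
The system-call sequence in that trace (pipe, clone, close of the write end, a
blocking 4-byte read, then wait4) can be sketched in isolation as follows.
This is not TORQUE's code; the function name `run_helper_and_read_reply` is
hypothetical, and the child here simply writes four zero bytes, whereas what
the real helper does before replying is not visible in the trace.

```python
import os


def run_helper_and_read_reply(child_reply=b"\0\0\0\0"):
    """Mimic the traced pattern: pipe, fork, close write end, read 4 bytes, wait."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the helper process (pbs_iff in the trace).
        os.close(read_fd)
        os.write(write_fd, child_reply)
        os.close(write_fd)
        os._exit(0)
    # Parent: close the unused write end, like close(5) in the trace.
    os.close(write_fd)
    # This read blocks until the child writes or exits; it corresponds to
    # the read(4, ...) call that stalls for about 2 seconds in the trace.
    reply = os.read(read_fd, 4)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return reply, exit_code


if __name__ == "__main__":
    print(run_helper_and_read_reply())
```

The point of the sketch is that the parent can do nothing until the child
writes to the pipe, so any delay inside the child shows up in the parent as a
stalled read.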

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California