[torqueusers] performance problem on x86_64
Garrick Staples
garrick at usc.edu
Thu Oct 6 11:43:43 MDT 2005
I'm getting plagued by a strange performance problem in x86_64 TORQUE. It's
driving me nuts.
Multiple quick stats of jobs or nodes are very slow when run on any x86_64
host. The examples below work fine if I run them from any 32-bit host. It also
seems to happen only when a lot of single-node jobs are in the queue (running or
idle). (Dave, I think you've seen this happen on TeraGrid.)
These work fine:
qstat -a
qstat job1
qstat job2
qstat; qstat; qstat;...
pbsnodes -a
pbsnodes -a hostname
pbsnodes -a; pbsnodes -a; pbsnodes -a;...
These always print the first job or node quickly, but then may or may
not hang for 2 seconds before printing the others:
qstat job1 job2 ...
qstat job1; qstat job1; qstat job1;...
pbsnodes -a node1; pbsnodes -a node2;...
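To put numbers on the stall, a small POSIX sh loop like the one below could be used. `time_runs` is a hypothetical helper (not part of TORQUE), and it assumes a `date` that supports `+%s`, as GNU date does:

```shell
#!/bin/sh
# time_runs: hypothetical helper, not part of TORQUE.
# Runs COMMAND [ARGS...] COUNT times in a row and prints the wall-clock
# time of each run in whole seconds, so an intermittent ~2 s stall stands out.
# Assumes `date +%s` works (true for GNU date).
# Usage: time_runs COUNT COMMAND [ARGS...]
time_runs() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1        # discard the command's own output
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
        i=$((i + 1))
    done
}

# Example: five back-to-back stats of one job (job1 is a placeholder ID).
# time_runs 5 qstat job1
```

Running the same `time_runs 5 qstat job1` on an x86_64 host and on a 32-bit host would give a side-by-side comparison. With one-second resolution a normal run reports 0 or 1 s, so a 2-second hang shows up as 2 s or more.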
If I strace the slow commands, I see that 'qstat' is blocked waiting on the
pbs_iff process. But the really weird part is that 'strace -f qstat ...' always
completes quickly. Something about stracing the pbs_iff process fixes the
problem.
Anyone have any ideas?
And again, this only happens on 64-bit hosts. None of this weirdness
happens when I run qstat/pbsnodes from a 32-bit host.
Some output of 'strace qstat job1 job2 job3':
connect(3, {sa_family=AF_INET, sin_port=htons(15001),
sin_addr=inet_addr("192.168.0.205")}, 16) = 0
pipe([4, 5]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2a95acc130) = 26895
close(5) = 0
fcntl(4, F_GETFL) = 0 (flags O_RDONLY)
read(4, <-- 2 second hang here while calling pbs_jobstat() on job2
"\0\0\0\0", 4) = 4
close(4) = 0
wait4(26895, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 26895
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California