[torqueusers] torque getting stuck
Adrian Sevcenco
Adrian.Sevcenco at cern.ch
Fri May 8 14:26:06 MDT 2009
Hi! I have a problem with torque-server-2.3.0: 5 to 10 minutes after a
restart, the server gets stuck. A qstat command takes over 20 seconds:
[root@grid01 ~]# time -p qstat
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
.....
real 23.60
user 0.00
sys 0.00
and when maui tries to contact pbs_server I get:
[root@grid01 ~]# time -p diagnose -n
ERROR: lost connection to server
ERROR: cannot request service (status)
real 29.99
user 0.00
sys 0.00
I ran strace on pbs_server for a few hours, and it showed me this:
[root@grid01 ~]# cat pbs_server.trace
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 35.87    2.424344          73     33334         8 select
 15.92    1.075911           6    177583           time
 11.57    0.782018          29     27419        26 write
  9.49    0.641369          25     25969         2 poll
  3.83    0.258976           4     69645     61244 recvfrom
  3.45    0.233453          16     14926      9574 connect
  2.84    0.192112           8     23314           close
  2.75    0.185961           4     41387      9504 read
  2.11    0.142857           3     56436           fcntl64
  2.11    0.142558          15      9527           brk
  1.89    0.127642           9     14926           socket
  1.40    0.094394          17      5417           send
  1.20    0.081420           9      9485           shutdown
  0.91    0.061722           6      9514           getsockopt
  0.70    0.047012           5      9585        69 bind
  0.61    0.041050           3     11990           setsockopt
  0.56    0.037660           9      4184           sendto
  0.49    0.033069           6      5885           open
  0.46    0.031291           5      5700           munmap
  0.43    0.029329           5      5394           gettimeofday
  0.31    0.021006           8      2506           accept
  0.29    0.019565           3      5720           mmap2
  0.23    0.015729           4      4242        25 ioctl
  0.22    0.014798           6      2474           recvmsg
  0.20    0.013414           2      5700           fstat64
  0.08    0.005256         584         9           clone
  0.05    0.003105          22       140        22 unlink
  0.01    0.000587          12        48           link
  0.01    0.000341           5        75           stat64
  0.00    0.000073           4        18         8 waitpid
  0.00    0.000044           2        18           rt_sigprocmask
  0.00    0.000020           2        10         9 sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00    6.758086                582580     80491 total
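To make the summary easier to read, here is a minimal sketch (not part of the original trace) that ranks syscalls by the fraction of calls that returned an error. The function names and the SAMPLE_SUMMARY excerpt are only illustrative; the rows are copied from the table above, and a row with no value in the errors column is assumed to have zero errors.

```python
# Sketch: rank syscalls from an `strace -c` summary by error rate.
# parse_strace_summary and rank_by_error_rate are hypothetical helper names.

SAMPLE_SUMMARY = """\
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
35.87 2.424344 73 33334 8 select
15.92 1.075911 6 177583 time
3.83 0.258976 4 69645 61244 recvfrom
3.45 0.233453 16 14926 9574 connect
2.75 0.185961 4 41387 9504 read
------ ----------- ----------- --------- --------- ----------------
100.00 6.758086 582580 80491 total
"""


def parse_strace_summary(text):
    """Return one dict per syscall row; header, rule, and total lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields (no errors) or 6 fields (with errors).
        if len(parts) not in (5, 6):
            continue
        try:
            float(parts[0])  # first field must be the "% time" number
        except ValueError:
            continue
        name = parts[-1]
        if name == "total":
            continue
        calls = int(parts[3])
        # Assumption: an empty errors column means no failed calls.
        errors = int(parts[4]) if len(parts) == 6 else 0
        rows.append({
            "syscall": name,
            "calls": calls,
            "errors": errors,
            "error_rate": errors / calls if calls else 0.0,
        })
    return rows


def rank_by_error_rate(rows):
    """Sort rows so the syscalls that fail most often come first."""
    return sorted(rows, key=lambda r: r["error_rate"], reverse=True)


if __name__ == "__main__":
    for row in rank_by_error_rate(parse_strace_summary(SAMPLE_SUMMARY)):
        print(f'{row["syscall"]:<10} {row["errors"]:>6}/{row["calls"]:<7} '
              f'{row["error_rate"]:.1%}')
```

On this excerpt, recvfrom and connect come out on top, each with more than half of their calls failing. The counts alone do not show which errno was returned, so this only points to where to look next.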
Can a torque expert spot any problems here?
Thank you,
Adrian