[torqueusers] All jobs stuck in undelivered dir

group hpc hpc.group at gmail.com
Thu Mar 2 07:45:21 MST 2006


Hi all,

Torque has not been working properly on our server recently. The output and error
files cannot be copied back to the users' home directories; instead they are left in
/var/spool/pbs/undelivered. Can anyone show me how to fix this? Thanks.

Btw, does anyone know what this message means:
"pbs_mom;Req;dis_reply_write;DIS reply failure, -1"?
I have included the pbs_mom log below:

03/02/2006 09:31:53;0002;   pbs_mom;n/a;mom_main;hello sent to server 192.168.11.20
03/02/2006 09:31:53;0100;   pbs_mom;Req;;Type QueueJob request received from PBS_Server at test, sock=13
03/02/2006 09:31:53;0002;   pbs_mom;Req;dis_reply_write;DIS reply failure, -1
03/02/2006 09:32:09;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server at test, sock=11
03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server at test, sock=10
03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type QueueJob request received from PBS_Server at test, sock=12
03/02/2006 09:32:10;0002;   pbs_mom;Req;dis_reply_write;DIS reply failure, -1
03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server at test, sock=11
03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type QueueJob request received from PBS_Server at test, sock=10
03/02/2006 09:32:10;0002;   pbs_mom;Req;dis_reply_write;DIS reply failure, -1
03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server at test, sock=12
03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server at test, sock=10

--
Best Regards,
Josh