[torqueusers] All Jobs stuck in undelivered dir
Prakash Velayutham
velayups at email.uc.edu
Thu Mar 2 07:59:06 MST 2006
group hpc wrote:
> Hi all,
>
> Torque has not been working well on our server recently. The output and
> error files are not copied back to the user's home directory; instead
> they are stored in /var/spool/pbs/undelivered. Can anyone show me how
> to fix this? Thanks.
>
> Btw, does anyone know what this means:
> "pbs_mom;Req;dis_reply_write;DIS reply failure, -1"?
> I have included the pbs_mom log below:
>
> 03/02/2006 09:31:53;0002; pbs_mom;n/a;mom_main;hello sent to server 192.168.11.20
> 03/02/2006 09:31:53;0100; pbs_mom;Req;;Type QueueJob request received from PBS_Server@test, sock=13
> 03/02/2006 09:31:53;0002; pbs_mom;Req;dis_reply_write;DIS reply failure, -1
> 03/02/2006 09:32:09;0100; pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=11
> 03/02/2006 09:32:10;0100; pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=10
> 03/02/2006 09:32:10;0100; pbs_mom;Req;;Type QueueJob request received from PBS_Server@test, sock=12
> 03/02/2006 09:32:10;0002; pbs_mom;Req;dis_reply_write;DIS reply failure, -1
> 03/02/2006 09:32:10;0100; pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=11
> 03/02/2006 09:32:10;0100; pbs_mom;Req;;Type QueueJob request received from PBS_Server@test, sock=10
> 03/02/2006 09:32:10;0002; pbs_mom;Req;dis_reply_write;DIS reply failure, -1
> 03/02/2006 09:32:10;0100; pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=12
> 03/02/2006 09:32:10;0100; pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=10
>
> --
> Best Regards,
> Josh
What kind of remote copy are you using: usecp, scp, or rcp? If scp,
check your ssh keys and make sure the copy works without entering a
password manually. If rcp, check your .rhosts file once again. If
usecp, my guess is that your hostnames are not being mapped correctly.
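
Two of these checks can be sketched in Python. This is only a rough
illustration under stated assumptions, not how pbs_mom itself does it.
It assumes the mom config uses lines of the form
"$usecp <host>:<source_prefix> <local_prefix>", and it approximates the
host match with shell-style wildcards and the path match with a plain
prefix comparison. The function names, the example.com hosts, and the
sample paths are all made up for the example.

```python
import fnmatch
import subprocess


def parse_usecp(config_text):
    """Return (host_pattern, src_prefix, dst_prefix) for each $usecp line."""
    rules = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "$usecp" and ":" in parts[1]:
            host, src = parts[1].split(":", 1)
            rules.append((host, src, parts[2]))
    return rules


def local_copy_target(rules, dest):
    """Map 'host:/path' to a local path if some rule matches, else None.

    Assumption: wildcard host match plus simple path-prefix match.
    """
    host, path = dest.split(":", 1)
    for host_pat, src, dst in rules:
        if fnmatch.fnmatch(host, host_pat) and path.startswith(src):
            return dst + path[len(src):]
    return None


def passwordless_ssh_ok(host, timeout=10):
    """True if 'ssh host true' succeeds without prompting for a password."""
    try:
        # BatchMode=yes makes ssh fail instead of asking for a password.
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "true"],
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

For example, with a rule written for "*.example.com", a destination
given as "head.example.com:/home/josh/job.o12" maps to a local path,
while the same path on the short hostname "test" matches no rule. In
that case the mom would fall back to a remote copy, which is one way
files can end up in the undelivered directory.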
Prakash