[torqueusers] All Jobs stuck in undelivered dir

Prakash Velayutham velayups at email.uc.edu
Thu Mar 2 07:59:06 MST 2006


group hpc wrote:
> Hi all,
>  
> Torque has not been working well recently on our server. The output and
> error files cannot be copied back to the user's home directory; instead
> they are stored in /var/spool/pbs/undelivered. Can anyone show me how to
> fix it? Thanks.
>
> Btw, does anyone know what this means:
> "pbs_mom;Req;dis_reply_write;DIS reply failure, -1"?
> I have included the pbs_mom log below:
>  
> 03/02/2006 09:31:53;0002;   pbs_mom;n/a;mom_main;hello sent to server 192.168.11.20
> 03/02/2006 09:31:53;0100;   pbs_mom;Req;;Type QueueJob request received from PBS_Server@test, sock=13
> 03/02/2006 09:31:53;0002;   pbs_mom;Req;dis_reply_write;DIS reply failure, -1
> 03/02/2006 09:32:09;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=11
> 03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=10
> 03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type QueueJob request received from PBS_Server@test, sock=12
> 03/02/2006 09:32:10;0002;   pbs_mom;Req;dis_reply_write;DIS reply failure, -1
> 03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=11
> 03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type QueueJob request received from PBS_Server@test, sock=10
> 03/02/2006 09:32:10;0002;   pbs_mom;Req;dis_reply_write;DIS reply failure, -1
> 03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=12
> 03/02/2006 09:32:10;0100;   pbs_mom;Req;;Type StatusJob request received from PBS_Server@test, sock=10
>
> -- 
> Best Regards,
> Josh
Which remote copy method are you using: usecp, scp, or rcp? If scp,
check your ssh keys and make sure the copy works without a password
being entered manually. If rcp, check your .rhosts file again. If
usecp, my guess is that your hostnames are not being mapped correctly.

Prakash
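
The scp and hostname checks suggested above could be scripted roughly as
follows. This is only a sketch, not part of Torque: the function names are
made up for illustration, and it assumes the OpenSSH client is installed on
the compute node. It should be run on a compute node as the job owner, since
the output files are copied back on that user's behalf.

```python
import socket
import subprocess


def ssh_batch_command(host, user=None):
    """Build an ssh command that fails instead of prompting for a password.

    BatchMode=yes is an OpenSSH client option that disables password and
    passphrase prompts, so a missing key shows up as a non-zero exit code.
    """
    target = f"{user}@{host}" if user else host
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def passwordless_ssh_works(host, user=None):
    """Return True if ssh to `host` succeeds without any interactive prompt."""
    try:
        result = subprocess.run(
            ssh_batch_command(host, user),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung
        return False
    return result.returncode == 0


def resolve(host):
    """Return the IPv4 address `host` resolves to, or None if lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None
```

For example, passwordless_ssh_works("test") tries the server host named in
the quoted log, and resolve("test") shows which address that name maps to
on the node. If the first returns False, scp will fail in the same way; if
the second returns None or an unexpected address, a hostname mapping problem
is a plausible cause.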


