[torqueusers] All Jobs stuck in undelivered dir

Garrick Staples garrick at usc.edu
Thu Mar 2 14:20:43 MST 2006


On Thu, Mar 02, 2006 at 10:45:21PM +0800, group hpc alleged:
> Hi all,
> 
> Torque has not been working well on our server recently. The output and error
> files cannot be copied back to the users' home directories; instead they are
> stored in /var/spool/pbs/undelivered. Can anyone show me how to fix this? Thanks.
 
The user should have gotten an email with the exact error message.


> Btw, does anyone know what this means: "pbs_mom;Req;dis_reply_write;DIS
> reply failure, -1"?
> I have included the pbs_mom log below:

That doesn't look good.  MOM is trying to reply to the queue requests,
but is failing.  Do you have any port filtering on the pbs_server host
or anything like that?
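
One rough way to test for port filtering is to try a plain TCP connection
from the compute node to the server host, and from the server host back to the
node. The sketch below is only an illustration, not part of TORQUE. It assumes
the stock default ports (15001 for pbs_server, 15002 and 15003 for pbs_mom),
which your site may have changed, and the helper names port_open and check_host
are made up for this example.

```python
import socket

# Assumed TORQUE default ports; a site may override these at build time
# or in /etc/services, so treat the numbers as a starting point only.
DEFAULT_PORTS = {
    "pbs_server": 15001,
    "pbs_mom": 15002,
    "pbs_mom (resource manager)": 15003,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_host(host, ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, check_host("headnode") run on a compute node (where "headnode" is
a placeholder for your pbs_server host) should report True for pbs_server if
nothing is blocking 15001. A successful connection only shows that the port is
reachable. It says nothing about whether the daemons accept each other's
requests, so the MOM and server logs are still the place to look next.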

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California