[torqueusers] Time out (15082) in send_job

Joshua Bernstein jbernstein at penguincomputing.com
Mon Aug 23 13:06:50 MDT 2010


Hello Folks,

I'm seeing a ton of timeouts in send_job as shown by the log errors from 
pbs_server below. According to the published list of error codes, error 
15082 isn't defined:

http://www.clusterresources.com/products/torque/docs/a.derrorcodes.shtml

The PBSPro documentation lists this error as "batch request generation 
failed". In fact, the PDF I dug up lists a host of other codes as well. 
Since TORQUE seems to use some of these, perhaps they should be added to 
the TORQUE docs?

https://secure.altair.com/docs/PBSproAG_53.pdf (around page 215)

Any thoughts on what could be going on here, or any ways to work around 
it?

The logs I reference above are shown here:

08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out (15082) in send_job, child failed in previous commit request for job 2316886.scyld.localdomain
08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out (15082) in send_job, child failed in previous commit request for job 2316890.scyld.localdomain
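
For anyone who wants to tally these entries, here is a rough Python sketch 
that pulls the timestamp, error code, and job ID out of such lines. It 
assumes each entry sits on a single line (mail wrapping undone) and that 
the semicolon-separated layout and message wording match the two samples 
above; the regular expression and the function names are my own 
illustration, not anything shipped with TORQUE.

```python
import re
from collections import Counter

# Pattern for the message part of the entries quoted above. It is an
# assumption inferred from those two sample lines only.
SEND_JOB_ERROR = re.compile(
    r"LOG_ERROR::(?P<text>.+?) \((?P<code>\d+)\) in send_job"
    r".*? for job (?P<job>\S+)"
)


def parse_send_job_errors(lines):
    """Yield (timestamp, code, job_id) for each send_job error entry."""
    for line in lines:
        # The samples show six semicolon-separated fields; the last one
        # is the free-form message, so split at most five times.
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6:
            continue  # not in the expected layout, skip it
        match = SEND_JOB_ERROR.search(fields[5])
        if match:
            yield fields[0], int(match.group("code")), match.group("job")


def count_by_code(lines):
    """Count send_job error entries per numeric error code."""
    return Counter(code for _, code, _ in parse_send_job_errors(lines))


# The two entries from this message, joined back onto single lines.
SAMPLE = [
    "08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::"
    "Time out (15082) in send_job, child failed in previous commit "
    "request for job 2316886.scyld.localdomain",
    "08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::"
    "Time out (15082) in send_job, child failed in previous commit "
    "request for job 2316890.scyld.localdomain",
]

if __name__ == "__main__":
    for timestamp, code, job in parse_send_job_errors(SAMPLE):
        print(timestamp, code, job)
    print(count_by_code(SAMPLE))
```

To run it against a real log, pass an open file object to 
parse_send_job_errors() in place of SAMPLE; lines that do not fit the 
assumed layout are skipped.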

-Joshua Bernstein
Penguin Computing

