[torqueusers] Time out (15082) in send_job
Joshua Bernstein
jbernstein at penguincomputing.com
Mon Aug 23 13:06:50 MDT 2010
Hello Folks,
I'm seeing a ton of timeouts in send_job as shown by the log errors from
pbs_server below. According to the published list of error codes, error
15082 isn't defined:
http://www.clusterresources.com/products/torque/docs/a.derrorcodes.shtml
The PBSPro documentation suggests that this error means "batch request
generation failed". In fact, in this PDF I dug up, there are a host of
other codes as well. Since TORQUE seems to use some of these, perhaps
they should be added to the TORQUE docs?
https://secure.altair.com/docs/PBSproAG_53.pdf (around page 215)
Any thoughts on what could be going on here, or any ways to work around it?
The logs I reference above are shown here:
08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out (15082) in send_job, child failed in previous commit request for job 2316886.scyld.localdomain
08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out (15082) in send_job, child failed in previous commit request for job 2316890.scyld.localdomain
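
For anyone triaging a similar log, here is a rough Python sketch that pulls
the affected job IDs out of entries like the two above. It assumes each
record occupies a single line in the actual log file and follows the
semicolon-separated layout seen in these samples. The function name
find_send_job_timeouts and the regular expression are my own illustration,
not part of TORQUE.

```python
import re
from collections import Counter

# Assumed record layout, inferred only from the sample entries above:
#   timestamp;event;server;type;name;message
PATTERN = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2});"
    r"[^;]*;[^;]*;[^;]*;[^;]*;"
    r"LOG_ERROR::Time out \((?P<code>\d+)\) in send_job, .* for job (?P<job>\S+)$"
)


def find_send_job_timeouts(lines):
    """Return (timestamp, code, job_id) for each send_job timeout record."""
    hits = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            hits.append((match["timestamp"], int(match["code"]), match["job"]))
    return hits


# The two entries from this message, each joined onto one line.
sample = [
    "08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out "
    "(15082) in send_job, child failed in previous commit request for job "
    "2316886.scyld.localdomain",
    "08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out "
    "(15082) in send_job, child failed in previous commit request for job "
    "2316890.scyld.localdomain",
]

hits = find_send_job_timeouts(sample)
counts = Counter(code for _, code, _ in hits)  # number of timeouts per error code

for timestamp, code, job in hits:
    print(timestamp, code, job)
```

Against a real server log, the same function can be fed an open file
object in place of the sample list.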
-Joshua Bernstein
Penguin Computing