[torqueusers] Time out (15082) in send_job

Ken Nielson knielson at adaptivecomputing.com
Mon Aug 23 23:53:20 MDT 2010


Is this TORQUE 2.4?

Ken

----- Original Message -----
From: "Joshua Bernstein" <jbernstein at penguincomputing.com>
To: "Torque Users Mailing List" <torqueusers at supercluster.org>
Sent: Monday, August 23, 2010 1:06:50 PM
Subject: [torqueusers] Time out (15082) in send_job

Hello Folks,

I'm seeing a ton of timeouts in send_job as shown by the log errors from 
pbs_server below. According to the published list of error codes, error 
15082 isn't defined:

http://www.clusterresources.com/products/torque/docs/a.derrorcodes.shtml

The PBSPro documentation describes this error as "batch request generation 
failed". In fact, in this PDF I dug up, there are a host of other codes 
listed as well. Since TORQUE seems to use some of these, perhaps they 
should be added to the TORQUE docs?

https://secure.altair.com/docs/PBSproAG_53.pdf (around page 215)

Any thoughts on what could be going on here? Any ways to work around it?

The logs I reference above are shown here:

08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out (15082) in send_job, child failed in previous commit request for job 2316886.scyld.localdomain
08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out (15082) in send_job, child failed in previous commit request for job 2316890.scyld.localdomain
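
For anyone wanting to gauge how widespread these timeouts are, here is a rough Python sketch that tallies them per job. It assumes each record sits on a single line in the actual server log (mail clients tend to wrap them) and that the layout is the semicolon-delimited one seen in the two samples above. The function names and the regular expression are my own, not anything shipped with TORQUE.

```python
import re
from collections import Counter

# Assumption: the message text always follows the wording of the two
# sample records quoted in this post.
SEND_JOB_TIMEOUT = re.compile(
    r"Time out \((?P<code>\d+)\) in send_job, .* for job (?P<job>\S+)"
)


def parse_line(line):
    """Return (timestamp, error_code, job_id) for a send_job timeout, else None."""
    # Assumed layout: six semicolon-separated fields, the last being the message.
    fields = line.rstrip("\n").split(";", 5)
    if len(fields) != 6:
        return None
    match = SEND_JOB_TIMEOUT.search(fields[5])
    if match is None:
        return None
    return fields[0], int(match.group("code")), match.group("job")


def count_timeouts(lines):
    """Count send_job timeout records per job ID."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[2]] += 1
    return counts


# The two records from this post, unwrapped to one line each.
SAMPLE = [
    "08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out "
    "(15082) in send_job, child failed in previous commit request for job "
    "2316886.scyld.localdomain",
    "08/23/2010 06:26:00;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Time out "
    "(15082) in send_job, child failed in previous commit request for job "
    "2316890.scyld.localdomain",
]

if __name__ == "__main__":
    for job, hits in sorted(count_timeouts(SAMPLE).items()):
        print(job, hits)
```

To run it against a real log, pass an open file object to count_timeouts() in place of SAMPLE. Lines that do not match the assumed layout are skipped rather than raising an error.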

-Joshua Bernstein
Penguin Computing

