[torqueusers] Understanding & dealing with torque error codes
jacksond at supercluster.org
Thu Oct 28 10:51:43 MDT 2004
15004 failures indicate that invalid values are being passed in some
request. They can occur with job holds, job dependencies, job
submissions from invalid clients, and attempts to start jobs in a routing queue.
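For quick reference, the numeric codes mentioned in this thread can be mapped to symbolic names with a small lookup table. This is a hypothetical helper, not part of Torque: the PBSE_* names and short descriptions are recalled from the pbs_error.h header in OpenPBS-derived sources such as Torque, and should be checked against the header shipped with your version.

```python
# Hypothetical helper: maps the PBS error numbers mentioned in this thread
# to PBSE_* names and short descriptions. Values are recalled from the
# OpenPBS-derived pbs_error.h; verify them against your Torque source tree.
PBSE_CODES = {
    15001: ("PBSE_UNKJOBID", "unknown job identifier"),
    15004: ("PBSE_IVALREQ", "invalid request"),
    15009: ("PBSE_JOBEXIST", "job already exists"),
    15029: ("PBSE_NOSUP", "feature/function not supported"),
}

PBSE_BASE = 15000  # base of the PBS-specific error range


def describe_pbs_error(code: int) -> str:
    """Return a human-readable description for a PBS error number."""
    if code in PBSE_CODES:
        name, text = PBSE_CODES[code]
        return f"{code} {name}: {text}"
    if code > PBSE_BASE:
        return f"{code}: PBS error not in this table; check pbs_error.h"
    return f"{code}: not a PBS-specific error number"


if __name__ == "__main__":
    for c in (15004, 15001, 15009, 15029, 15099):
        print(describe_pbs_error(c))
```

Only the four codes from this thread are listed; anything else in the PBS range falls through to a pointer back to the header rather than a guessed name.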
The latest torque 1.1.0p4 snapshot has logging enhancements that
record the underlying reason for most of these failures. Also, the
meaning of each failure code can be looked up at
http://clusterresources.com/torquedocs/2.1debugging.shtml or in
If you could send us your pbs_server and/or pbs_mom logs, we can
assist you further. The logs are most useful if the daemons were
started with PBSLOGLEVEL set to 3 or higher. Please include notes
describing what activity was occurring when the failure took place.
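Before sending logs, it can help to locate the records that mention these error numbers so each one can be annotated with what was happening at the time. A minimal sketch, assuming the semicolon-separated PBS log layout (date and time; event type; daemon; object type; object name; message). The helper names find_pbs_errors and summarize and the sample records are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumption: any number 15001-15999 appearing in the message text is a PBS error code.
PBS_CODE_RE = re.compile(r"\b(15\d{3})\b")

# Hypothetical records in the assumed layout:
# date time;event;daemon;object type;object name;message
SAMPLE = [
    "10/28/2004 09:12:01;0080;   pbs_mom;Req;example;hypothetical reject, code=15004",
    "10/28/2004 09:15:30;0080;   pbs_mom;Req;example;hypothetical reject, code=15004",
    "10/28/2004 09:20:45;0002;   pbs_mom;Svr;example;hypothetical message, no error",
    "not a log record",
]


def find_pbs_errors(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (timestamp, code, message) for each record that mentions a PBS error."""
    for line in lines:
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6:
            continue  # not in the assumed six-field layout; skip it
        timestamp, message = fields[0], fields[5]
        for match in PBS_CODE_RE.finditer(message):
            code = int(match.group(1))
            if code > 15000:  # 15000 itself is only the base of the range
                yield timestamp, code, message


def summarize(lines: Iterable[str]) -> Counter:
    """Count how often each PBS error code appears."""
    return Counter(code for _, code, _ in find_pbs_errors(lines))


if __name__ == "__main__":
    for ts, code, msg in find_pbs_errors(SAMPLE):
        print(ts, code, msg)
    print(summarize(SAMPLE))
```

To scan a real log, pass an open file object instead of SAMPLE. The pattern is deliberately loose: a job ID such as 15004.server would also be counted, so it is worth tightening the regular expression once you know the exact wording your daemons log.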
On Thu, 2004-10-28 at 06:07, David Baker wrote:
> We are currently setting up a medium-sized (160-node) cluster based on
> torque (1.1.0p0), and maui (3.2.6p7). We are finding that the node moms
> report various error codes, and that we cannot find any documentation or
> help on dealing with these conditions. The most problematic error is
> 15004 -- the mom appears to be in a state of confusion, and rejects jobs
> until the mom is restarted. Does anyone out there have an automated
> procedure for preventing and/or dealing with this issue, please?
> Other error conditions we have seen are 15001, 15009 and 15029. In general
> terms, does supercluster or any user/group have access to any documentation
> that might enable us to understand and control these conditions, please?
> Your advice and comments would be appreciated, please.
> Thank you -- David Baker.