[torqueusers] Understanding & dealing with torque error codes
Jeffrey J. Barteet
barteet at mrl.ucsb.edu
Thu Oct 28 10:52:45 MDT 2004
Hey, David,
I'm using a distribution of torque from last winter when it was briefly called
spbs.
In my source distribution, the file:
/spbs-1.0.1/src/include/pbs_error.h
contains the definitions of the error codes you see in the log files, and I've
found it valuable in debugging issues like this.
This portion of the file defines the errors you were seeing:
#define PBSE_NONE 0 /* no error */
#define PBSE_UNKJOBID 15001 /* Unknown Job Identifier */
#define PBSE_NOATTR 15002 /* Undefined Attribute */
#define PBSE_ATTRRO 15003 /* attempt to set READ ONLY attribute */
#define PBSE_IVALREQ 15004 /* Invalid request */
#define PBSE_UNKREQ 15005 /* Unknown batch request */
#define PBSE_TOOMANY 15006 /* Too many submit retries */
#define PBSE_PERM 15007 /* No permission */
#define PBSE_BADHOST 15008 /* access from host not allowed */
#define PBSE_JOBEXIST 15009 /* job already exists */
#define PBSE_SYSTEM 15010 /* system error occurred */
#define PBSE_INTERNAL 15011 /* internal server error occurred */
#define PBSE_REGROUTE 15012 /* parent job of dependent in rte que */
#define PBSE_UNKSIG 15013 /* unknown signal name */
#define PBSE_BADATVAL 15014 /* bad attribute value */
#define PBSE_MODATRRUN 15015 /* Cannot modify attrib in run state */
#define PBSE_BADSTATE 15016 /* request invalid for job state */
#define PBSE_UNKQUE 15018 /* Unknown queue name */
#define PBSE_BADCRED 15019 /* Invalid Credential in request */
#define PBSE_EXPIRED 15020 /* Expired Credential in request */
#define PBSE_QUNOENB 15021 /* Queue not enabled */
#define PBSE_QACESS 15022 /* No access permission for queue */
#define PBSE_BADUSER 15023 /* Bad user - no password entry */
#define PBSE_HOPCOUNT 15024 /* Max hop count exceeded */
#define PBSE_QUEEXIST 15025 /* Queue already exists */
#define PBSE_ATTRTYPE 15026 /* incompatable queue attribute type */
#define PBSE_QUEBUSY 15027 /* Queue Busy (not empty) */
#define PBSE_QUENBIG 15028 /* Queue name too long */
#define PBSE_NOSUP 15029 /* Feature/function not supported */
#define PBSE_QUENOEN 15030 /* Cannot enable queue,needs add def */
#define PBSE_PROTOCOL 15031 /* Protocol (ASN.1) error */
#define PBSE_BADATLST 15032 /* Bad attribute list structure */
#define PBSE_NOCONNECTS 15033 /* No free connections */
#define PBSE_NOSERVER 15034 /* No server to connect to */
#define PBSE_UNKRESC 15035 /* Unknown resource */
#define PBSE_EXCQRESC 15036 /* Job exceeds Queue resource limits */
#define PBSE_QUENODFLT 15037 /* No Default Queue Defined */
#define PBSE_NORERUN 15038 /* Job Not Rerunnable */
#define PBSE_ROUTEREJ 15039 /* Route rejected by all destinations */
#define PBSE_ROUTEEXPD 15040 /* Time in Route Queue Expired */
#define PBSE_MOMREJECT 15041 /* Request to MOM failed */
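If it helps, here is a minimal sketch of a lookup for just the four codes you mentioned (15001, 15004, 15009, 15029). The codes and message text are copied from the list above; the names pbse_entry, pbse_table and pbse_describe are hypothetical and are not part of the torque/spbs source.

```c
#include <stddef.h>

/* Hypothetical table, not part of torque/spbs: the four codes from
 * David's report, with names and text copied from pbs_error.h above. */
struct pbse_entry {
    int code;
    const char *name;
    const char *text;
};

static const struct pbse_entry pbse_table[] = {
    { 15001, "PBSE_UNKJOBID", "Unknown Job Identifier" },
    { 15004, "PBSE_IVALREQ",  "Invalid request" },
    { 15009, "PBSE_JOBEXIST", "job already exists" },
    { 15029, "PBSE_NOSUP",    "Feature/function not supported" },
};

/* Return the description for a code, or NULL if it is not in the table. */
const char *pbse_describe(int code)
{
    size_t i;

    for (i = 0; i < sizeof pbse_table / sizeof pbse_table[0]; i++) {
        if (pbse_table[i].code == code)
            return pbse_table[i].text;
    }
    return NULL;
}
```

Calling pbse_describe() with a number taken from a mom log line gives the matching text, and a NULL return means the code is not in this small table. Extending the table to the full list above is mechanical.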
Good luck,
-jeffrey
Quoting David Baker <D.J.Baker at soton.ac.uk>:
> Hi,
> We are currently setting up a medium-sized (160 nodes) cluster based on
> torque (1.1.0p0), and maui (3.2.6p7). We are finding that the node moms
> report various error codes, and that we cannot find any documentation or
> help on dealing with these conditions. The most problematic error is
> 15004 -- the mom appears to be in a state of confusion, and rejects jobs
> until the mom is restarted. Does anyone out there have an automated
> procedure for preventing and/or dealing with this issue, please?
>
> Other error conditions we have seen are 15001, 15009 and 15029. In general
> terms does supercluster or any users/group have access to any documentation
> that might enable us to understand and control these conditions, please?
>
> Your advice and comments would be appreciated, please.
> Thank you -- David Baker.
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://supercluster.org/mailman/listinfo/torqueusers
>
Jeffrey J. Barteet
Systems Administrator
Materials Research Laboratory
University of California, Santa Barbara