TORQUE Resource Manager

TORQUE Administrator's Manual - 10.3 Debugging

10.3 Debugging

10.3.1   Debugging Facilities

TORQUE supports a number of diagnostic and debug options including the following:
  • PBSDEBUG environment variable - If set to 'yes', this variable prevents pbs_server, pbs_mom, and/or pbs_sched from backgrounding themselves, which allows them to be launched directly under a debugger. Some client commands also provide additional diagnostic information when this value is set.
  • PBSLOGLEVEL environment variable - Can be set to any value from 0 to 7 and specifies the logging verbosity level (default = 0).
  • PBSCOREDUMP environment variable - If set, it causes the offending resource manager daemon to create a core file when a SIGSEGV, SIGILL, SIGFPE, SIGSYS, or SIGTRAP signal is received. The core dump is placed in the daemon's home directory ($PBSHOME/mom_priv for pbs_mom).
  • NDEBUG #define - If set at build time, it causes additional low-level logging information to be output to stdout for the pbs_server and pbs_mom daemons.
  • tracejob reporting tool - Can be used to collect and report logging and accounting information for specific jobs.
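
The environment variables above can be combined when launching a daemon by hand. The following is a minimal sketch, not part of TORQUE: run_with_pbs_debug is a hypothetical helper name, and the value 1 for PBSCOREDUMP is an arbitrary choice, since the text only requires that the variable be set. The range check reflects the 0 to 7 range given for PBSLOGLEVEL.

```shell
# Hypothetical helper (not shipped with TORQUE): run a command with the
# debugging environment variables from this section set.
# Usage: run_with_pbs_debug LOGLEVEL command [args...]
run_with_pbs_debug() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_with_pbs_debug LOGLEVEL command [args...]" >&2
        return 2
    fi
    level="$1"
    shift

    # PBSLOGLEVEL is documented as a value from 0 to 7.
    case "$level" in
        [0-7]) ;;
        *)
            echo "PBSLOGLEVEL must be from 0 to 7, got: $level" >&2
            return 2
            ;;
    esac

    # PBSDEBUG=yes keeps the daemon from backgrounding itself.
    # PBSCOREDUMP only has to be set; the value 1 is an assumption.
    PBSDEBUG=yes PBSLOGLEVEL="$level" PBSCOREDUMP=1 "$@"
}

# Example invocation (commented out because it needs an installed pbs_mom
# and a debugger):
# run_with_pbs_debug 7 gdb pbs_mom
```

The variables are set only for the launched command, so the calling shell's environment is left unchanged. Whether a core file is actually written also depends on the operating system's core size limit (ulimit -c), which this sketch does not modify.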

10.3.2   TORQUE Error Codes

Error Code Name Error Code Number Description
PBSE_NONE 15000 no error
PBSE_UNKJOBID 15001 Unknown Job Identifier
PBSE_NOATTR 15002 Undefined Attribute
PBSE_ATTRRO 15003 attempt to set READ ONLY attribute
PBSE_IVALREQ 15004 Invalid request
PBSE_UNKREQ 15005 Unknown batch request
PBSE_TOOMANY 15006 Too many submit retries
PBSE_PERM 15007 No permission
PBSE_BADHOST 15008 access from host not allowed
PBSE_JOBEXIST 15009 job already exists
PBSE_SYSTEM 15010 system error occurred
PBSE_INTERNAL 15011 internal server error occurred
PBSE_REGROUTE 15012 parent job of dependent in route queue
PBSE_UNKSIG 15013 unknown signal name
PBSE_BADATVAL 15014 bad attribute value
PBSE_MODATRRUN 15015 Cannot modify attribute in run state
PBSE_BADSTATE 15016 request invalid for job state
PBSE_UNKQUE 15018 Unknown queue name
PBSE_BADCRED 15019 Invalid Credential in request
PBSE_EXPIRED 15020 Expired Credential in request
PBSE_QUNOENB 15021 Queue not enabled
PBSE_QACESS 15022 No access permission for queue
PBSE_BADUSER 15023 Bad user - no password entry
PBSE_HOPCOUNT 15024 Max hop count exceeded
PBSE_QUEEXIST 15025 Queue already exists
PBSE_ATTRTYPE 15026 incompatible queue attribute type
PBSE_QUEBUSY 15027 Queue Busy (not empty)
PBSE_QUENBIG 15028 Queue name too long
PBSE_NOSUP 15029 Feature/function not supported
PBSE_QUENOEN 15030 Cannot enable queue, needs add def
PBSE_PROTOCOL 15031 Protocol (ASN.1) error
PBSE_BADATLST 15032 Bad attribute list structure
PBSE_NOCONNECTS 15033 No free connections
PBSE_NOSERVER 15034 No server to connect to
PBSE_UNKRESC 15035 Unknown resource
PBSE_EXCQRESC 15036 Job exceeds Queue resource limits
PBSE_QUENODFLT 15037 No Default Queue Defined
PBSE_NORERUN 15038 Job Not Rerunnable
PBSE_ROUTEREJ 15039 Route rejected by all destinations
PBSE_ROUTEEXPD 15040 Time in Route Queue Expired
PBSE_MOMREJECT 15041 Request to MOM failed
PBSE_BADSCRIPT 15042 (qsub) cannot access script file
PBSE_STAGEIN 15043 Stage In of files failed
PBSE_RESCUNAV 15044 Resources temporarily unavailable
PBSE_BADGRP 15045 Bad Group specified
PBSE_MAXQUED 15046 Max number of jobs in queue
PBSE_CKPBSY 15047 Checkpoint Busy, may be retried
PBSE_EXLIMIT 15048 Limit exceeds allowable
PBSE_BADACCT 15049 Bad Account attribute value
PBSE_ALRDYEXIT 15050 Job already in exit state
PBSE_NOCOPYFILE 15051 Job files not copied
PBSE_CLEANEDOUT 15052 unknown job id after clean init
PBSE_NOSYNCMSTR 15053 No Master in Sync Set
PBSE_BADDEPEND 15054 Invalid dependency
PBSE_DUPLIST 15055 Duplicate entry in List
PBSE_DISPROTO 15056 Bad DIS based Request Protocol
PBSE_EXECTHERE 15057 cannot execute there
PBSE_SISREJECT 15058 sister rejected
PBSE_SISCOMM 15059 sister could not communicate
PBSE_SVRDOWN 15060 request rejected - server shutting down
PBSE_CKPSHORT 15061 not all tasks could checkpoint
PBSE_UNKNODE 15062 Named node is not in the list
PBSE_UNKNODEATR 15063 node-attribute not recognized
PBSE_NONODES 15064 Server has no node list
PBSE_NODENBIG 15065 Node name is too big
PBSE_NODEEXIST 15066 Node name already exists
PBSE_BADNDATVAL 15067 Bad node-attribute value
PBSE_MUTUALEX 15068 State values are mutually exclusive
PBSE_GMODERR 15069 Error(s) during global modification of nodes
PBSE_NORELYMOM 15070 could not contact Mom
PBSE_NOTSNODE 15071 no time-shared nodes
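
When only a numeric code is at hand, a small lookup can translate it back to the symbolic name in the table above. The sketch below is illustrative only: pbse_name is a hypothetical helper, not a TORQUE command, and it covers just a handful of rows copied from the table. Any code it does not list is reported as unlisted rather than guessed.

```shell
# Hypothetical helper (not a TORQUE command): map a numeric error code to
# its symbolic name. Only a few rows of the table are included here.
pbse_name() {
    case "$1" in
        15000) echo "PBSE_NONE" ;;
        15001) echo "PBSE_UNKJOBID" ;;
        15007) echo "PBSE_PERM" ;;
        15018) echo "PBSE_UNKQUE" ;;
        15034) echo "PBSE_NOSERVER" ;;
        15041) echo "PBSE_MOMREJECT" ;;
        *)
            # Codes outside this excerpt are not guessed at.
            echo "unlisted code: $1" >&2
            return 1
            ;;
    esac
}

# Example: pbse_name 15001
```

Extending the case statement with the remaining rows follows the same pattern, one line per code.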
