Appendix D: Error Codes and Diagnostics
D.1 TORQUE Diagnostics
TORQUE includes a diagnostic script that gathers the files TORQUE Support needs to troubleshoot an issue. Run it as a user who can execute all TORQUE commands and access all TORQUE directories (usually root).
The script (contrib/diag/tdiag.sh) is available in TORQUE 2.3.8, TORQUE 2.4.3, and later. It collects the node file, the server and MOM log files, and the output of qmgr -c 'p s', and packages them in a tar file.
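To make the collection step concrete, here is a rough Python approximation of what ends up in the archive. It is a sketch, not the logic of tdiag.sh itself. It assumes TORQUE's conventional layout under TORQUE_HOME (the node file at server_priv/nodes, daily logs named YYYYmmdd under server_logs/ and mom_logs/). The function name collect_diagnostics and the archive member name qmgr_print_server.txt are hypothetical, chosen only for illustration. The qmgr output is passed in as a string so the sketch does not need a running server.

```python
import io
import tarfile
from pathlib import Path


def collect_diagnostics(torque_home: str, date: str, qmgr_output: str, out_path: str) -> list:
    """Package TORQUE diagnostic files into a gzipped tar archive.

    Hypothetical helper. Assumes the conventional TORQUE layout: the node
    file at server_priv/nodes and daily logs named YYYYmmdd under
    server_logs/ and mom_logs/. Missing files are skipped, not treated
    as errors. Returns the archive member names in the order written.
    """
    home = Path(torque_home)
    candidates = [
        home / "server_priv" / "nodes",
        home / "server_logs" / date,
        home / "mom_logs" / date,
    ]
    added = []
    with tarfile.open(out_path, "w:gz") as tar:
        for path in candidates:
            if path.is_file():
                # Store paths relative to TORQUE_HOME so the archive is portable.
                arcname = path.relative_to(home).as_posix()
                tar.add(str(path), arcname=arcname)
                added.append(arcname)
        # Store the captured `qmgr -c 'p s'` output as an in-memory text member.
        data = qmgr_output.encode()
        info = tarfile.TarInfo(name="qmgr_print_server.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        added.append(info.name)
    return added
```

On a live system the qmgr text could be captured with subprocess.run(["qmgr", "-c", "p s"], capture_output=True, text=True, check=True).stdout before calling the helper. Skipping missing files mirrors the fact that a node running only a server (or only a MOM) will not have both log directories.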
The script accepts the following options (run ./tdiag.sh -h to display them):
USAGE: ./torque_diag [-d DATE] [-h] [-o OUTPUT_FILE] [-t TORQUE_HOME]
-d DATE: the date in YYYYmmdd format. For example, 20091130 is November 30, 2009. If no date is specified, today's date is used.
-o OUTPUT_FILE: the optional name of the output file. The default is torque_diag<today's_date>.tar.gz.
-t TORQUE_HOME: the path to your TORQUE directory. The default is /var/spool/torque.
-h: display the usage message.
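If you drive the script from another tool, a small Python wrapper can assemble the command line from the documented options and reject a malformed DATE before the script runs. This is a sketch under stated assumptions: build_tdiag_command is a hypothetical helper, the default script path ./tdiag.sh assumes you run it from contrib/diag, and only the -d, -o, and -t options listed above are used. Options left as None are omitted so the script applies its own defaults.

```python
from datetime import datetime


def build_tdiag_command(date=None, output_file=None, torque_home=None, script="./tdiag.sh"):
    """Build an argument list for tdiag.sh from its documented options.

    Hypothetical helper. Options left as None are omitted, so the script
    falls back to its defaults (today's date, torque_diag<today's_date>.tar.gz,
    /var/spool/torque).
    """
    cmd = [script]
    if date is not None:
        # strptime alone accepts one-digit months/days, so require exactly 8 ASCII digits.
        if len(date) != 8 or not (date.isascii() and date.isdigit()):
            raise ValueError(f"DATE must be YYYYmmdd, got {date!r}")
        # Rejects impossible calendar dates such as 20090231.
        datetime.strptime(date, "%Y%m%d")
        cmd += ["-d", date]
    if output_file is not None:
        cmd += ["-o", output_file]
    if torque_home is not None:
        cmd += ["-t", torque_home]
    return cmd
```

For example, build_tdiag_command(date="20091130", torque_home="/var/spool/torque") yields ["./tdiag.sh", "-d", "20091130", "-t", "/var/spool/torque"], which can be passed to subprocess.run(cmd, check=True) as root. Returning a list rather than a shell string avoids quoting problems with unusual paths.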
D.2 TORQUE Error Codes
| Error code name | Number | Description |
|---|---|---|
| PBSE_NONE | 15000 | No error |
| PBSE_UNKJOBID | 15001 | Unknown job identifier |
| PBSE_NOATTR | 15002 | Undefined attribute |
| PBSE_ATTRRO | 15003 | Attempt to set read-only attribute |
| PBSE_IVALREQ | 15004 | Invalid request |
| PBSE_UNKREQ | 15005 | Unknown batch request |
| PBSE_TOOMANY | 15006 | Too many submit retries |
| PBSE_PERM | 15007 | No permission |
| PBSE_BADHOST | 15008 | Access from host not allowed |
| PBSE_JOBEXIST | 15009 | Job already exists |
| PBSE_SYSTEM | 15010 | System error occurred |
| PBSE_INTERNAL | 15011 | Internal server error occurred |
| PBSE_REGROUTE | 15012 | Parent job of dependent in route queue |
| PBSE_UNKSIG | 15013 | Unknown signal name |
| PBSE_BADATVAL | 15014 | Bad attribute value |
| PBSE_MODATRRUN | 15015 | Cannot modify attribute in run state |
| PBSE_BADSTATE | 15016 | Request invalid for job state |
| PBSE_UNKQUE | 15018 | Unknown queue name |
| PBSE_BADCRED | 15019 | Invalid credential in request |
| PBSE_EXPIRED | 15020 | Expired credential in request |
| PBSE_QUNOENB | 15021 | Queue not enabled |
| PBSE_QACESS | 15022 | No access permission for queue |
| PBSE_BADUSER | 15023 | Bad user: no password entry |
| PBSE_HOPCOUNT | 15024 | Maximum hop count exceeded |
| PBSE_QUEEXIST | 15025 | Queue already exists |
| PBSE_ATTRTYPE | 15026 | Incompatible queue attribute type |
| PBSE_QUEBUSY | 15027 | Queue busy (not empty) |
| PBSE_QUENBIG | 15028 | Queue name too long |
| PBSE_NOSUP | 15029 | Feature/function not supported |
| PBSE_QUENOEN | 15030 | Cannot enable queue, needs add def |
| PBSE_PROTOCOL | 15031 | Protocol (ASN.1) error |
| PBSE_BADATLST | 15032 | Bad attribute list structure |
| PBSE_NOCONNECTS | 15033 | No free connections |
| PBSE_NOSERVER | 15034 | No server to connect to |
| PBSE_UNKRESC | 15035 | Unknown resource |
| PBSE_EXCQRESC | 15036 | Job exceeds queue resource limits |
| PBSE_QUENODFLT | 15037 | No default queue defined |
| PBSE_NORERUN | 15038 | Job not rerunnable |
| PBSE_ROUTEREJ | 15039 | Route rejected by all destinations |
| PBSE_ROUTEEXPD | 15040 | Time in route queue expired |
| PBSE_MOMREJECT | 15041 | Request to MOM failed |
| PBSE_BADSCRIPT | 15042 | (qsub) Cannot access script file |
| PBSE_STAGEIN | 15043 | Stage-in of files failed |
| PBSE_RESCUNAV | 15044 | Resources temporarily unavailable |
| PBSE_BADGRP | 15045 | Bad group specified |
| PBSE_MAXQUED | 15046 | Maximum number of jobs in queue |
| PBSE_CKPBSY | 15047 | Checkpoint busy, may be retried |
| PBSE_EXLIMIT | 15048 | Limit exceeds allowable |
| PBSE_BADACCT | 15049 | Bad account attribute value |
| PBSE_ALRDYEXIT | 15050 | Job already in exit state |
| PBSE_NOCOPYFILE | 15051 | Job files not copied |
| PBSE_CLEANEDOUT | 15052 | Unknown job ID after clean init |
| PBSE_NOSYNCMSTR | 15053 | No master in sync set |
| PBSE_BADDEPEND | 15054 | Invalid dependency |
| PBSE_DUPLIST | 15055 | Duplicate entry in list |
| PBSE_DISPROTO | 15056 | Bad DIS-based request protocol |
| PBSE_EXECTHERE | 15057 | Cannot execute there |
| PBSE_SISREJECT | 15058 | Sister rejected |
| PBSE_SISCOMM | 15059 | Sister could not communicate |
| PBSE_SVRDOWN | 15060 | Requirement rejected; server shutting down |
| PBSE_CKPSHORT | 15061 | Not all tasks could checkpoint |
| PBSE_UNKNODE | 15062 | Named node is not in the list |
| PBSE_UNKNODEATR | 15063 | Node attribute not recognized |
| PBSE_NONODES | 15064 | Server has no node list |
| PBSE_NODENBIG | 15065 | Node name is too long |
| PBSE_NODEEXIST | 15066 | Node name already exists |
| PBSE_BADNDATVAL | 15067 | Bad node attribute value |
| PBSE_MUTUALEX | 15068 | State values are mutually exclusive |
| PBSE_GMODERR | 15069 | Error(s) during global modification of nodes |
| PBSE_NORELYMOM | 15070 | Could not contact MOM |
| PBSE_NOTSNODE | 15071 | No time-shared nodes |
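When one of these numbers turns up in command output or a log, a small lookup table keyed by number makes it quick to translate. The sketch below copies a handful of rows from the table above into a dictionary; PBS_ERRORS and describe_pbs_error are hypothetical names, and the subset is illustrative, so extend it with the remaining rows as needed. Note that the numbering has a gap: 15017 does not appear in the table, so a lookup must handle unassigned numbers as well as rows that were simply not copied.

```python
# Hypothetical helper: a partial copy of the error-code table, keyed by number.
PBS_ERRORS = {
    15004: ("PBSE_IVALREQ", "Invalid request"),
    15005: ("PBSE_UNKREQ", "Unknown batch request"),
    15007: ("PBSE_PERM", "No permission"),
    15015: ("PBSE_MODATRRUN", "Cannot modify attribute in run state"),
    15018: ("PBSE_UNKQUE", "Unknown queue name"),
    15021: ("PBSE_QUNOENB", "Queue not enabled"),
    15035: ("PBSE_UNKRESC", "Unknown resource"),
    15041: ("PBSE_MOMREJECT", "Request to MOM failed"),
    15054: ("PBSE_BADDEPEND", "Invalid dependency"),
}


def describe_pbs_error(code: int) -> str:
    """Return 'NAME (code): description', or a fallback for unlisted codes."""
    entry = PBS_ERRORS.get(code)
    if entry is None:
        # Either the number is unassigned (e.g. 15017) or the row was not copied above.
        return f"{code}: not in lookup table"
    name, text = entry
    return f"{name} ({code}): {text}"
```

For example, describe_pbs_error(15007) returns "PBSE_PERM (15007): No permission". A plain dictionary keeps the mapping easy to regenerate from the table if codes are added in later releases.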