[torqueusers] RM Failure - MOM rejected

Gaurav Chopra gauravchopra at gmail.com
Tue Mar 14 08:49:04 MST 2006


I submitted a job to the cluster and it was deferred. Running tracejob
gives:

03/14/2006 05:06:17  S    unable to run job, MOM rejected/rc=1

Running checkjob $PBS_ID_ gives:

StartDate: -00:06:36  Tue Mar 14 05:06:18
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]

IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 2
PartitionMask: [ALL]
Flags:       RESTARTABLE

job is deferred.  Reason:  RMFailure  (cannot start job - RM failure, rc: 15041, msg: 'Execution server rejected request MSG=send failed, STARTING')
Holds:    Defer  (hold reason:  RMFailure)
PE:  1.00  StartPriority:  1
cannot select job 99950 for partition DEFAULT (job hold active)
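
If you are scripting around this, the rc and msg fields of the deferral reason can be pulled out of the checkjob output with a small regex. This is a minimal sketch. `parse_rm_failure` is a hypothetical helper, not part of TORQUE or the scheduler, and it assumes the reason text has exactly the format shown above.

```python
import re

# Hypothetical pattern, assuming the checkjob deferral reason looks like:
#   RM failure, rc: <number>, msg: '<text>'
RM_FAILURE = re.compile(r"RM failure, rc:\s*(\d+), msg:\s*'([^']*)'")


def parse_rm_failure(text):
    """Return (rc, msg) from checkjob output, or None if no RM failure is found."""
    # Mail clients and terminals may wrap the line, so collapse all
    # whitespace runs (including newlines) into single spaces first.
    flat = " ".join(text.split())
    match = RM_FAILURE.search(flat)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    sample = (
        "job is deferred.  Reason:  RMFailure  (cannot start job - RM failure, rc:\n"
        "15041, msg: 'Execution server rejected request MSG=send failed, STARTING')"
    )
    print(parse_rm_failure(sample))
```

Collapsing whitespace before matching lets the same pattern handle both the single-line form and a mail-wrapped copy of the output.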

Please advise.
