[torqueusers] RM Failure - MOM rejected
Garrick Staples
garrick at usc.edu
Mon Mar 20 16:43:54 MST 2006
On Tue, Mar 14, 2006 at 07:49:04AM -0800, Gaurav Chopra alleged:
> Hi
>
> I submitted this job on the cluster and the job is deferred. Using tracejob
> I get:
>
> 03/14/2006 05:06:17 S unable to run job, MOM rejected/rc=1
>
> Using checkjob $PBS_ID
>
> job is deferred. Reason: RMFailure (cannot start job - RM failure, rc:
> 15041, msg: 'Execution server rejected request MSG=send failed, STARTING')
> Holds: Defer (hold reason: RMFailure)
> PE: 1.00 StartPriority: 1
> cannot select job 99950 for partition DEFAULT (job hold active)
Check the MOM log and syslog on the execution node for the cause of the
job start error. Common causes include a username or group name missing
on that node, rcp/scp misconfiguration, and system date mismatches
between the server and the node.
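
If it helps, here is a rough sketch of how you might test for two of those
causes (missing user/group, clock mismatch) on the node that rejected the
job. It is not part of TORQUE: the function names and the 300-second
tolerance are my own assumptions, and it only uses Python's standard pwd
and grp modules.

```python
import grp
import pwd


def account_problems(username, groupname):
    """Return a list of problems found when looking up the account locally."""
    problems = []
    try:
        pwd.getpwnam(username)  # raises KeyError if the user is unknown here
    except KeyError:
        problems.append("unknown user: " + username)
    try:
        grp.getgrnam(groupname)  # raises KeyError if the group is unknown here
    except KeyError:
        problems.append("unknown group: " + groupname)
    return problems


def clock_skew_problem(server_epoch, node_epoch, tolerance=300):
    """Return a message if the two clocks differ by more than `tolerance`
    seconds (an arbitrary illustrative threshold), else None."""
    skew = abs(server_epoch - node_epoch)
    if skew > tolerance:
        return "clock skew of %d seconds" % skew
    return None
```

Run account_problems() on the compute node with the job owner's user and
group names. For the clock check, take `date +%s` on the server and on the
node at about the same moment and pass in both values. Neither check covers
rcp/scp, and the MOM log is still the place to confirm the actual cause.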
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California