[torqueusers] RM Failure - MOM rejected

Garrick Staples garrick at usc.edu
Mon Mar 20 16:43:54 MST 2006


On Tue, Mar 14, 2006 at 07:49:04AM -0800, Gaurav Chopra alleged:
>  Hi
> 
> I submitted this job on the cluster and the job is deferred. Using tracejob
> I get:
> 
> 03/14/2006 05:06:17  S    unable to run job, MOM rejected/rc=1
> 
> Using checkjob $PBS_ID
> 
> job is deferred.  Reason:  RMFailure  (cannot start job - RM failure, rc:
> 15041, msg: 'Execution server rejected request MSG=send failed, STARTING')
> Holds:    Defer  (hold reason:  RMFailure)
> PE:  1.00  StartPriority:  1
> cannot select job 99950 for partition DEFAULT (job hold active)

Check the MOM logs and/or syslog for the cause of the job start error.

Common causes include missing usernames or group names, rcp/scp
misconfiguration, and system date mismatches.
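
As a rough sketch of the first step, something like the following could pull
the lines for one job out of a MOM log on the execution node. The function
names are hypothetical, and the log path in the usage note is an assumption
based on a default install (MOM logs under $PBS_HOME/mom_logs, one file per
day); adjust it to wherever your MOM actually writes its logs.

```python
import re
from pathlib import Path


def job_lines(log_text, job_id):
    """Return the lines of log_text that mention job_id.

    The job id must not be adjacent to other digits, so "99950" matches
    "99950.server" but not "199950".
    """
    pattern = re.compile(r"(?<!\d)" + re.escape(job_id) + r"(?!\d)")
    return [line for line in log_text.splitlines() if pattern.search(line)]


def scan_mom_log(log_path, job_id):
    """Read one MOM log file and return the lines that mention job_id."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    text = Path(log_path).read_text(errors="replace")
    return job_lines(text, job_id)
```

For the job above, scan_mom_log("/var/spool/torque/mom_logs/20060314", "99950")
would be the call to try, assuming that path exists on the node that rejected
the job. The same filter can be pointed at syslog if the MOM log shows nothing.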

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California