[torqueusers] RM Failure

David Jackson jacksond at clusterresources.com
Fri Mar 4 13:46:43 MST 2005


Bobby,

  What TORQUE release are you on?  This indicates that TORQUE is
attempting to start the job, but when it does so, the MOM reports that
the job already exists and is running locally.  If this is in fact the
issue, it may already be resolved in the latest release of TORQUE.

Dave

On Fri, 2005-03-04 at 13:45 -0600, Bobby Brown wrote:
> We started seeing jobs that are blocked when there are plenty of free 
> nodes and a checkjob reveals:
> 
> Messages:  cannot start job - RM failure, rc: 15041, msg: 'MSG=send 
> failed, JOB_SUBSTATE_RUNNING' PE:  1.00 StartPriority: 6234
> 
> Any ideas?
> 
> Thanks
> Bobby


