[torqueusers] RM Failure

Bobby Brown bobby.brown at vanderbilt.edu
Fri Mar 4 14:13:18 MST 2005


Dave,

We are using torque-1.1p5.  I am compiling 1.2p3 now.  Hopefully this 
will correct the problem with the MOM restarts.

Thanks
Bobby

David Jackson wrote:
> Bobby,
> 
>   What TORQUE release are you on?  This indicates that TORQUE is
> attempting to start the job, but when it does so, the MOM reports that
> the job already exists and is running locally.  If this is in fact the
> issue, it may already be resolved in the latest release of TORQUE.
> 
> Dave
> 
> On Fri, 2005-03-04 at 13:45 -0600, Bobby Brown wrote:
> 
>>We started seeing jobs that are blocked even though there are plenty of free 
>>nodes, and checkjob reveals:
>>
>>Messages:  cannot start job - RM failure, rc: 15041, msg: 'MSG=send 
>>failed, JOB_SUBSTATE_RUNNING' PE:  1.00 StartPriority: 6234
>>
>>Any ideas?
>>
>>Thanks
>>Bobby
>>
>>_______________________________________________
>>torqueusers mailing list
>>torqueusers at supercluster.org
>>http://supercluster.org/mailman/listinfo/torqueusers
> 
> 


