[torqueusers] RM Failure

Bobby Brown bobby.brown at vanderbilt.edu
Fri Mar 4 14:34:57 MST 2005


Oops, that should be 1.2.0p1.

Bobby Brown wrote:
> Dave,
> 
> We are using torque-1.1p5.  I am compiling 1.2p3 now.  Hopefully this 
> will correct the problem with the MOM restarts.
> 
> Thanks
> Bobby
> 
> David Jackson wrote:
> 
>> Bobby,
>>
>>   What TORQUE release are you on?  This indicates that TORQUE is
>> attempting to start the job, but when it does so, the MOM reports that
>> the job already exists and is running locally.  If this is in fact the
>> issue, it may already be resolved in the latest release of TORQUE.
>>
>> Dave
>>
>> On Fri, 2005-03-04 at 13:45 -0600, Bobby Brown wrote:
>>
>>> We started seeing jobs that are blocked when there are plenty of free 
>>> nodes and a checkjob reveals:
>>>
>>> Messages:  cannot start job - RM failure, rc: 15041, msg: 'MSG=send 
>>> failed, JOB_SUBSTATE_RUNNING' PE:  1.00 StartPriority: 6234
>>>
>>> Any ideas?
>>>
>>> Thanks
>>> Bobby
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://supercluster.org/mailman/listinfo/torqueusers
>>
>>
>>


