[torqueusers] Torque/Maui kills jobs running on the same node

Josh Bernstein jbernstein at penguincomputing.com
Fri Feb 19 10:06:40 MST 2010


Hi Evgeni,

Is it possible you're running Maui and PBS_sched at the same time by mistake?
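
One rough way to check for this on the head node is sketched below in Python. The daemon names "maui" and "pbs_sched" are assumptions about how the schedulers appear in the process table on your system, and both helper functions are hypothetical names made up for this illustration.

```python
import subprocess

# Assumed process names of the two scheduler daemons; adjust for your site.
SCHEDULERS = ("maui", "pbs_sched")


def running_schedulers(process_names, schedulers=SCHEDULERS):
    """Return the scheduler daemons that appear in a list of process names."""
    names = set(process_names)
    return [s for s in schedulers if s in names]


def list_process_names():
    """List the command names of all processes using `ps -e -o comm=`."""
    out = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

Calling running_schedulers(list_process_names()) on the head node and getting back more than one name would suggest that both schedulers are active at once.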

-Josh

On Feb 19, 2010, at 7:49 AM, "Jerry Smith" <jdsmit at sandia.gov> wrote:

> Evgeni,
>
> Are you doing any process cleanup in the epilogue?  If so you may be
> killing all of that user's jobs when the first job exits.
>
> --Jerry
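
To illustrate the epilogue point above: a cleanup step that kills every process owned by the job's user is only safe when that user has no other job left on the node. The Python sketch below shows just that decision. The function name and the (job_id, owner) list are hypothetical, and how the list of jobs on the node is obtained is site-specific and not shown.

```python
def safe_to_kill_user_processes(finished_job_id, user, jobs_on_node):
    """Return True only if `user` has no other job left on this node.

    jobs_on_node: iterable of (job_id, owner) pairs for the jobs currently
    assigned to the node (hypothetical input; gathering it is site-specific).
    """
    return not any(
        owner == user and job_id != finished_job_id
        for job_id, owner in jobs_on_node
    )


# Example: the same user still has job 102 on the node, so a user-wide
# kill in the epilogue of job 101 would also take down job 102.
jobs = [("101.head", "evgeni"), ("102.head", "evgeni")]
print(safe_to_kill_user_processes("101.head", "evgeni", jobs))  # False
```

An epilogue that skips the user-wide kill whenever this returns False would leave the user's other jobs on the node untouched.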
>
>
> Evgeni Bezus wrote:
>> Hi all,
>>
>> We are running Maui and Torque on a 14-node cluster. Each node has 8 cores
>> (two 4-core processors). When running two (or more) jobs from a single
>> user on the same node, Maui (or Torque?) stops all the jobs when one of
>> them is finished. The finished job has Exit_status=0; the killed jobs have
>> Exit_status=271. The value of the NODEACCESSPOLICY parameter in
>> maui.cfg is SHARED. This problem does not occur when running jobs from
>> a single user on different nodes or when running jobs from different
>> users on the same node.
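
A note on the exit codes quoted above: Exit_status=271 is commonly read as 256 + 15, meaning the job was terminated by signal 15 (SIGTERM) rather than exiting on its own. The Python sketch below decodes a status under that assumption. The "256 + signal" rule is an assumption to check against the Torque documentation for your version, and the function name is made up for this illustration.

```python
import signal


def describe_exit_status(status):
    """Describe an Exit_status value, assuming values >= 256 mean 256 + signal."""
    if status >= 256:
        signum = status - 256
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    if status < 0:
        # Negative values are assumed to be set by the batch system itself.
        return f"batch system status {status}"
    return f"job script exited with code {status}"


print(describe_exit_status(0))    # job script exited with code 0
print(describe_exit_status(271))  # killed by signal 15 (SIGTERM)
```

Read this way, the killed jobs were sent SIGTERM by something external, which fits a cleanup script or a scheduler action better than a failure inside the jobs themselves.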
>>
>> Does anyone know how to resolve the problem?
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
>>

