[Mauiusers] Re: [torqueusers] Maui and Torque not agreeing on jobstate

Philip Peartree P.Peartree at postgrad.manchester.ac.uk
Wed Sep 24 09:15:02 MDT 2008

As an addition to my last email: it seems that only multinode jobs
are getting stuck. Multinode jobs are allowed in Maui, though, and I
can't see a setting in Torque that controls them!

Quoting "Craig West" <cwest at astro.umass.edu>:

> Philip,
> I think you will find the job or a node has an error. It is being
> continuously restarted. Notice the start_count variable is high, also
> there is the exit_status variable which doesn't usually appear until
> the job has exited (at least once).
> I think the job is being continuously re-queued. You may want to put a
> hold on it, or delete it until you can understand why.
> I would check the logs on the server to see which nodes it is trying to
> run on, then check that node to see if there is a problem.
> "tracejob <jobid>" should show some useful information, but needs to be
> run by root to get detailed information. The "exec_host" variable will
> tell you which nodes it is trying to run on.
> Craig.
>>    etime = Wed Sep 24 14:08:27 2008
>>    exit_status = -3
>>    submit_args = qsubtest.com
>>    start_time = Wed Sep 24 14:08:28 2008
>>    start_count = 1756
>> I don't understand the priority being zero, as maui lists the
>> startpriority as 60. Something appears to be not communicating
>> somewhere. Could someone shed some light on it?
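Craig's check (a high start_count together with an exit_status already set) could be scripted roughly as below. This is only a sketch: it assumes the usual "attribute = value" layout of "qstat -f" output, it ignores wrapped continuation lines, and the job id, the helper names and the threshold of 5 starts are made up for illustration. The attribute values are the ones quoted above.

```python
import re

# Matches indented "name = value" attribute lines as printed by `qstat -f`.
ATTR_RE = re.compile(r"^\s+([\w.]+) = (.*)$")


def parse_qstat_full(text):
    """Parse `qstat -f`-style text into {job_id: {attribute: value}}.

    Simplification: wrapped continuation lines are ignored.
    """
    jobs = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            current = jobs.setdefault(job_id, {})
            continue
        match = ATTR_RE.match(line)
        if match and current is not None:
            current[match.group(1)] = match.group(2).strip()
    return jobs


def looks_requeued(attrs, max_starts=5):
    """Heuristic: many starts plus an exit_status suggests a restart loop.

    max_starts is an arbitrary illustrative threshold, not a Torque setting.
    """
    starts = int(attrs.get("start_count", 0))
    return starts > max_starts and "exit_status" in attrs


# Hypothetical job id; the attribute lines are taken from the post above.
SAMPLE = """\
Job Id: 1234.example-server
    etime = Wed Sep 24 14:08:27 2008
    exit_status = -3
    submit_args = qsubtest.com
    start_time = Wed Sep 24 14:08:28 2008
    start_count = 1756
"""

if __name__ == "__main__":
    for job_id, attrs in parse_qstat_full(SAMPLE).items():
        if looks_requeued(attrs):
            print(job_id, "looks stuck in a restart loop:",
                  "start_count =", attrs["start_count"],
                  "exit_status =", attrs["exit_status"])
```

For any job it flags, the next steps are the ones Craig lists: put a hold on it or delete it, then look at exec_host and run "tracejob <jobid>" as root to see which node it keeps failing on.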
