[Mauiusers] maui not scheduling when no resources available

Jan Ploski Jan.Ploski at offis.de
Thu Dec 13 05:59:59 MST 2007


mauiusers-bounces at supercluster.org wrote on 12/13/2007 01:05:06 PM:

... 
> Something similar happened when requesting hosts with "slc3 && slc4":
> no nodes fit that condition and maui hung....
> 
> So, is it a bug? Is anyone having the same problem? Any workaround?

I once had the same kind of problem: a job stuck at the front of the 
queue prevented other jobs from executing even though checkjob reported 
"can run" for them. In my case, it was due to an inconsistency between 
Maui's and TORQUE's view of the available resources. Maui kept trying to 
assign a job to an already occupied resource because it thought the jobs 
running there each used 0 processors, and TORQUE rejected these attempts.

Maybe the output of diagnose -n <name of the offline node>, diagnose -j 
<job id>, diagnose -r will provide additional clues?
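
If it helps to capture all three outputs in one go, here is a minimal
Python sketch. The helper names and the node/job values in the usage note
are hypothetical placeholders, not part of Maui; it only assumes that the
diagnose command is on the PATH of the machine where you run it.

```python
import shutil
import subprocess


def diagnostic_commands(node, job_id):
    """Build the three diagnose invocations mentioned above as argv lists."""
    return [
        ["diagnose", "-n", node],    # Maui's view of the given node
        ["diagnose", "-j", job_id],  # Maui's view of the given job
        ["diagnose", "-r"],          # Maui's reservations
    ]


def collect_diagnostics(node, job_id):
    """Run each command and return a dict mapping command line to its output.

    Returns an empty dict if diagnose is not installed on this machine.
    """
    results = {}
    if shutil.which("diagnose") is None:
        return results
    for argv in diagnostic_commands(node, job_id):
        proc = subprocess.run(argv, capture_output=True, text=True)
        # Keep stderr as well, since error messages may be the useful part.
        results[" ".join(argv)] = proc.stdout + proc.stderr
    return results
```

For example, collect_diagnostics("node01", "1234") (both values made up)
would return the three outputs keyed by command line, ready to paste into
a reply to the list.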

Another thing that you might try is setting the node 'down' (kill pbs_mom 
on it) rather than 'offline' to see if it changes anything.

Best regards,
Jan Ploski

