[Mauiusers] Re: maui not schedulling jobs in avaliable resources

Arnau Bria arnaubria at pic.es
Sat Feb 21 11:05:34 MST 2009


On Sat, 21 Feb 2009 00:51:31 +0100
Roy Dragseth wrote:

> Hi.
Hi Roy,
 
> Your posting gave me the clue on how to reproduce this problem on my
> test cluster, I've been ripping hairs out of my head trying to figure
> out why I could not reproduce the same behaviour as we see on our
> production cluster.
I'm glad to hear that I'm not the only one trying to understand this
problem. I don't feel alone :-)

> It seems like maui stops parsing through the remaining idle jobs as
> soon as it finds a job that cannot run, possibly on a per user and/or
> class basis, I don't know.

Well, I can have one idle job at the top of the queue, and maui still
parses the other jobs... but if I have 2 jobs from the same class at the
top, maui stops there (just play with MAXIJOB=1 or 2).
But if I have 2 jobs from different classes that cannot run, maui
still parses the queue. So it seems that it stops after finding 2 jobs
from the same class that cannot run.
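
The behaviour described above can be written down as a toy scan loop.
This is only a sketch in Python of the hypothesis, not code taken from
Maui's MQueue.c: the function name, the job tuples and the threshold of
two blocked jobs per class are all assumptions made for illustration.

```python
from collections import defaultdict


def scan_idle_queue(jobs, blocked_limit=2):
    """Toy model only, not Maui's code.

    jobs: list of (name, job_class, can_run) tuples in priority order.
    Hypothetical rule: scanning stops once `blocked_limit` jobs of the
    same class have been found that cannot run.
    Returns the names of the jobs that would be started.
    """
    blocked_per_class = defaultdict(int)
    started = []
    for name, job_class, can_run in jobs:
        if can_run:
            started.append(name)
            continue
        blocked_per_class[job_class] += 1
        if blocked_per_class[job_class] >= blocked_limit:
            break  # the rest of the queue is never examined
    return started


# Two blocked jobs of the same class at the top: the runnable job is never reached.
same_class = [("j1", "long", False), ("j2", "long", False), ("j3", "short", True)]
# Two blocked jobs of different classes: the scan continues.
mixed_class = [("j1", "long", False), ("j2", "medium", False), ("j3", "short", True)]
# A single blocked job at the top: the scan continues.
one_blocked = [("j1", "long", False), ("j3", "short", True)]

print(scan_idle_queue(same_class))   # []
print(scan_idle_queue(mixed_class))  # ['j3']
print(scan_idle_queue(one_blocked))  # ['j3']
```

If the real scheduler follows a different rule (per user instead of per
class, for example), only the key used for counting would change in this
model.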

 
> If you comment out line 905 in src/moab/MQueue.c that reads
> 
>  IdleJobFound = TRUE;
> 
> it seems to continue parsing the idle jobs until the end.  The
> IdleJobFound = TRUE is at another place too, but I cannot tell when
> that kicks in.

So, it could happen in other circumstances... If I comment out the
line, and I use it in production, I'm sure that I'll find it quickly...
Murphy lives in our hosts :-)))
  

> Maybe we could make this configurable?  One can envision scenarios
> where the current behaviour is reasonable, for instance where you
> have thousands of jobs in the queue and they all have the same
> specs.  If you want to continue parsing the idle queue you should
> probably  set the MAXIJOB to limit the jobs to parse per scheduling
> cycle.
Is there any dev here who could confirm this and explain what the
other scenario is where this problem could appear? Is there any other
way of solving this without commenting out code lines and without
setting an idle jobs limit? Setting that limit is dangerous because we
would need many sched cycles to fill an empty queue.
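
A rough way to see the cost of an idle jobs limit is the arithmetic
below. It is a sketch under stated assumptions, not a measurement: it
assumes each job takes one slot, every user always has jobs waiting, and
at most `max_idle_jobs` jobs per user can be started in one scheduling
cycle. The function name and the numbers are hypothetical.

```python
import math


def cycles_to_fill(free_slots, users, max_idle_jobs):
    """Lower bound on scheduling cycles needed to fill `free_slots`.

    Assumption: at most `max_idle_jobs` jobs per user start per cycle,
    and each job occupies exactly one slot.
    """
    started_per_cycle = users * max_idle_jobs
    return math.ceil(free_slots / started_per_cycle)


# Illustrative numbers only: 400 free slots shared by 4 users.
print(cycles_to_fill(400, 4, 2))    # 50
print(cycles_to_fill(400, 4, 100))  # 1
```

With a small limit the number of cycles grows quickly, which is the
concern with using the limit as a workaround.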
  
> r.
Thanks Roy, we're close to the final solution.
Cheers,
Arnau

