[Mauiusers] Two queues/classes, with one blocking all jobs

Bill Wichser bill at Princeton.EDU
Tue Jan 16 07:40:51 MST 2007


I have a cluster of Linux nodes with two queues/classes.  One is the ib 
queue and the other is the default.  Users must specify #PBS -q ib to 
get into the ib queue; otherwise their jobs go into the default queue.

For the sake of example, let's say that there are 64 nodes of IB and 64 
nodes of GigE (default).

In the server_priv/nodes file I give each host the property ib or 
noib.  In qmgr I define

set queue default resources_default.neednodes = noib
set queue ib resources_default.neednodes = ib
set queue ib resources_max.nodect = 64
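
For context, the ib/noib properties that neednodes refers to live in 
server_priv/nodes.  A minimal sketch of that file follows.  The host 
names are hypothetical, and np=2 is only inferred from MAXPROC=128 over 
64 nodes; neither is stated in this message.

```
# server_priv/nodes (host names and np values are assumed, for illustration)
node001 np=2 ib
node002 np=2 ib
node065 np=2 noib
node066 np=2 noib
```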

In maui.cfg I have set

CLASSCFG[ib]    MAXPROC=128 MAXNODE=64

Now the problem.

As the ib queue starts running and jobs fill it up, I find that wider 
jobs move into the blocked state.  That much is expected, as resources 
are not available.  But narrower jobs that still fit are then scheduled 
ahead of the wider jobs in the blocked state, since the resources for 
them are available.  The wide jobs just sit there, never moving to the 
idle state, and so never run until the point when maybe two of the 
narrower jobs finish at the same time.  Only then are there enough 
resources available to run the wider jobs.

Queue time is included in job priority, and it seems to be the case 
that these blocked wide jobs have higher priority than the narrow jobs 
which are scheduled to run.

Removing the CLASSCFG[ib] line and restarting maui moves these jobs to 
the idle queue, and scheduling within ib works as expected.  Except 
that these ib jobs, sitting high atop the idle queue, block all the 
other jobs waiting for the noib nodes.  One can do a "checkjob" and see 
that the noib jobs, in the default class/queue, are in a state where 
they can run, but they are never scheduled; they wait for the 
higher-priority ib jobs to start first.
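
The symptom can be reproduced with a toy model.  This is only a sketch 
of the behaviour described above, not Maui's actual algorithm: the job 
names, node counts, priorities and the skip_blocked switch are all made 
up for illustration.  With strict ordering, the first job that cannot 
start stops everything behind it, even jobs that need a different kind 
of node.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    node_class: str  # "ib" or "noib", mirroring the node properties
    nodes: int       # number of nodes requested
    priority: int    # higher value is considered first


def schedule(jobs, free_nodes, skip_blocked):
    """Start jobs in priority order and return the names of started jobs.

    skip_blocked=False: stop at the first job that cannot start
                        (assumed model of the blocking seen here).
    skip_blocked=True:  pass over a job that cannot start and keep going.
    """
    free = dict(free_nodes)  # copy so the caller's dict is not modified
    started = []
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.nodes <= free[job.node_class]:
            free[job.node_class] -= job.nodes
            started.append(job.name)
        elif not skip_blocked:
            break
    return started


# Hypothetical snapshot: ib is nearly full, the GigE side is empty.
free_nodes = {"ib": 8, "noib": 64}
jobs = [
    Job("wide_ib", "ib", 32, priority=100),
    Job("gige_a", "noib", 16, priority=50),
    Job("gige_b", "noib", 16, priority=40),
]

strict = schedule(jobs, free_nodes, skip_blocked=False)
relaxed = schedule(jobs, free_nodes, skip_blocked=True)
print(strict)   # []
print(relaxed)  # ['gige_a', 'gige_b']
```

In the strict case nothing starts although all 64 noib nodes are free, 
which matches what checkjob reports for the default-queue jobs.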

I'm not really sure how to fix the situation.  I'd like the scheduler to 
know that the waiting IB jobs should not block other jobs from running 
when those other jobs have all the resources they need, but I don't 
know how to tell maui that it's okay to let those jobs go.  I imagine, 
but have not tested, that this would work the other way around too: 
default jobs would block the ib jobs from running if the default jobs 
sitting in the idle queue had the higher priority.

Thanks,
Bill

