[Mauiusers] blocked jobs never run
bill at Princeton.EDU
Wed Nov 24 14:51:27 MST 2010
I seem to be having deja vu all over again.
The environment is torque/maui. There are a number of classes (queues)
set up, each with a limit for MAXPROCS. Let's take a class "A" and set a
maxprocs limit of 100. The class is full, and new jobs are blocked because
starting them would exceed this limit.
So far so good.
Further, there is a 50 core job in there waiting with what would be the
highest priority if it were in the idle state. But it remains blocked.
Other jobs are also blocked but only want 8 cores. As jobs complete,
let's say they are 10 core and 8 core jobs, only these blocked 8 core
jobs ever get scheduled. The 50 core job always remains blocked because
50 cores are never free under the limit at the same time.
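
To make the starvation concrete, here is a toy Python model of the
behaviour described above. It is only a sketch under stated assumptions,
not Maui's actual scheduling logic: the function name simulate, the
"oldest job finishes first" rule, and the steady arrival of one new 8 core
job per completion are all made up for illustration. It assumes that after
each completion the scheduler simply starts every blocked job that fits
under the class limit, in priority order.

```python
from collections import deque

LIMIT = 100  # per-class processor cap, as in the class "A" example


def simulate(running, blocked, completions, refill_size=8):
    """Toy model: after each completion, start every blocked job that fits.

    running     -- core counts of the jobs currently running
    blocked     -- core counts of blocked jobs, highest priority first
    completions -- number of job-completion events to simulate
    refill_size -- size of the new job that arrives after each event
                   (hypothetical workload assumption)
    Returns the sizes of the jobs that were started, in order.
    """
    running = deque(running)
    blocked = list(blocked)
    started = []
    for _ in range(completions):
        running.popleft()                 # oldest running job finishes
        free = LIMIT - sum(running)       # cores now free under the cap
        for size in list(blocked):        # walk blocked jobs by priority
            if size <= free:              # greedy: start anything that fits
                blocked.remove(size)
                running.append(size)
                started.append(size)
                free -= size
        blocked.append(refill_size)       # another small job is submitted
    return started


if __name__ == "__main__":
    # Two 10 core jobs and ten 8 core jobs fill the 100 core limit.
    started = simulate([10, 10] + [8] * 10, [50, 8, 8, 8], completions=20)
    print(started)
    print("50 core job started:", 50 in started)
```

In this model the free count never gets anywhere near 50, because every
completion is immediately backfilled by an 8 core job, so the 50 core job
is never started no matter how high its priority is.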
Yes, one could use JOBAGGREGATIONTIME to try to wait for more jobs to
finish in a timely manner, but when this takes maybe 16 hours, it just
isn't feasible. Basically I have a blocked job which gets starved and
no way to get it scheduled with the current mix of jobs coming in and
moving through without manually disabling the "started" in qmgr or
placing jobs on hold until resources become available.
Is there some way to handle this that I just don't know about or haven't
figured out? I hope that I'm just missing something simple.
Thanks. And for all the USofA folks, Happy Thanksgiving.