[Mauiusers] blocked jobs never run
roy.dragseth at cc.uit.no
Wed Nov 24 15:56:24 MST 2010
On Wednesday, November 24, 2010 22:51:27 Bill Wichser wrote:
> I seem to be having deja vu all over again.
> The environment is torque/maui. There are a number of classes (queues)
> set up, each with a limit for MAXPROCS. Let's take a class "A" and set a
> maxprocs limit of 100. The queue is full and new jobs block as they now
> exceed this limit.
> So far so good.
> Further, there is a 50 core job in there waiting with what would be the
> highest priority if it were in the idle state. But it remains blocked.
> Other jobs are also blocked but only want 8 cores. As jobs complete,
> lets say they are 10 core and 8 core jobs, only these blocked 8 core
> jobs ever get scheduled. The 50 core job always remains blocked because
> resources are never available.
> Yes one could use the JOBAGGREGATION time to try and wait for more to
> finish in a timely manner but when this takes maybe 16 hours, it just
> isn't feasible. Basically I have a blocked job which gets starved and
> no way to get it scheduled with the current mix of jobs coming in and
> moving through without manually disabling the "started" in qmgr or
> placing jobs on hold until resources become available.
> Is there some way to handle this that I just don't know or haven't figured
> out? I hope that I'm just missing something simple.
> Thanks. And for all the USofA folks, Happy Thanksgiving.
Have you looked at the prioritization? Are big jobs given higher priority
than small jobs? Have you allowed the highest priority jobs to have
reservations on nodes?
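
As a rough sketch of what those two questions refer to, a maui.cfg fragment
along these lines would weight priority toward jobs that request more
processors and let the top-priority idle job hold a reservation. The numeric
values are placeholders I made up for illustration, not settings from any
real site, and I have not verified that a reservation is created for a job
that is blocked by a class MAXPROC limit rather than sitting idle.

```
# maui.cfg fragment (illustrative values only, not tuned for any site)

# Resource component of priority: jobs asking for more processors
# rank higher than small jobs.
RESWEIGHT          10
PROCWEIGHT         100

# Queue time component: waiting jobs keep gaining priority.
QUEUETIMEWEIGHT    1

# Let the highest-priority idle job hold a reservation that backfilled
# jobs are not allowed to delay.
RESERVATIONPOLICY  CURRENTHIGHEST
RESERVATIONDEPTH   1
BACKFILLPOLICY     FIRSTFIT
```

Running "diagnose -p" shows how each priority component contributes for the
queued jobs, and "checkjob" on the 50 core job should report why it is being
held back.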