[torqueusers] Job Allocation problem
Mark Meenan
mjm-www at dcs.gla.ac.uk
Wed Mar 22 02:46:38 MST 2006
Garrick Staples wrote:
> On Tue, Mar 21, 2006 at 10:26:01AM +0000, Mark Meenan alleged:
>> I have come across an interesting problem. A user of the cluster
>> submitted a couple of hundred jobs to the queue - by default they went
>> into the feed queue and the resources requested were such that they were
>> moved to the long queue, which then filled up to the max_queuable limit.
>> The jobs that remain then were moved to the parallel queue (which is
>> logical even if I did not anticipate it). However the jobs when they
>
> This is handled by using 2 routing queues. The second acts as a sink
> to hold overflow from the execution queue.
> feed
>   -> short_route (has short's resources_min/max)
>        -> short (has max_queuable)
>   -> long_route
>        -> long
>   ...etc.
>
>
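For illustration, the two-stage layout described above might be written in qmgr roughly as follows. This is a sketch only: the queue names come from the diagram, while the walltime bound and the max_queuable value are placeholder assumptions, not values from this cluster.

```
# Sketch only: the limits below are placeholder assumptions.
create queue feed
set queue feed queue_type = Route
set queue feed route_destinations = short_route
set queue feed route_destinations += long_route
set queue feed enabled = True
set queue feed started = True

# Routing queue that carries short's resource bounds and holds overflow.
create queue short_route
set queue short_route queue_type = Route
set queue short_route resources_max.walltime = 01:00:00
set queue short_route route_destinations = short
set queue short_route enabled = True
set queue short_route started = True

# Execution queue with the cap on queued jobs.
create queue short
set queue short queue_type = Execution
set queue short resources_max.walltime = 01:00:00
set queue short max_queuable = 50
set queue short enabled = True
set queue short started = True
```

Because short_route accepts only jobs that fit short's limits and has short as its sole destination, a job that cannot enter short while it is full should wait in short_route rather than fall through to feed's next destination. long_route and long would follow the same pattern.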
>> went into the parallel queue were allocated 8 nodes - which is not the
>> behaviour I would have expected.
>
> Are you sure they were actually allocated 8 nodes? Or was that just the
> nodect reported by qstat. Double check the actual nodes assigned with
> 'qstat -f'.
>
I had checked this and it was 8 nodes in qstat -f; however, your comment
below regarding the max resources being used as the default answers the
question.
> http://www.clusterresources.com/bugzilla/show_bug.cgi?id=77
>
> The patch was committed to 2.1.0, but should apply fine to 1.2 or 2.0.
>
>
>>> set queue long resources_max.nodect = 1
>>> set queue long resources_max.nodes = 1
>>> set queue long resources_max.walltime = 50:00:00
>>> set queue long resources_default.cput = 24:00:00
>>> set queue long resources_default.nodes = 1
>>> set queue long resources_default.walltime = 50:00:00
>> I have since added the following lines of configuration and expect that
>> this will solve the particular problem, but I would like to know the
>> reason why 8 nodes were allocated.
>>
>>> set queue parallel resources_min.nodect = 2
>>> set queue parallel resources_min.nodes = 2
>>> set queue parallel resources_default.nodect = 2
>>> set queue parallel resources_default.nodes = 2
>
> "max" resources are used as defaults if the resource isn't already set
> and has no default. While this may seem confusing, it allows for things
> like "resources_max.vmem" to become limits on MOM.
>
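A minimal sketch of that fallback, assuming the parallel queue carried a limit of 8 nodes and no default (the thread never shows that limit, so the value is a guess that matches the 8 nodes observed):

```
# Assumption: parallel had a limit like this and no default, so jobs
# arriving without a node count picked up 8 as their request.
set queue parallel resources_max.nodect = 8

# With an explicit default in place, the max is no longer used as one.
set queue parallel resources_default.nodect = 2
```

The same reasoning would apply to any other resource that has a resources_max entry but no resources_default entry on the queue.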
This seems to explain it, thanks Garrick.
> With the patch above, nodect should be properly computed from "nodes" so
> you don't have to worry about it. Without the patch, you need
> default.nodect to do everything properly.
>
> Also, "nodes" is a string, so min/max.nodes doesn't mean anything.
>
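Applied to the configuration quoted earlier, the remark about "nodes" being a string might translate into something like the following. This is one reading of the advice, not a tested change; the queue names and values are those already shown in the thread.

```
# nodes is a string, so min/max on it is not meaningful; drop those entries.
unset queue long resources_max.nodes
unset queue parallel resources_min.nodes

# nodect is numeric, so these are the bounds that take effect.
set queue long resources_max.nodect = 1
set queue parallel resources_min.nodect = 2
```

The resources_default.nodes entries can stay, since a default is simply copied into the job rather than compared against it.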