[torqueusers] Job Allocation problem

Mark Meenan mjm-www at dcs.gla.ac.uk
Wed Mar 22 02:46:38 MST 2006


Garrick Staples wrote:
> On Tue, Mar 21, 2006 at 10:26:01AM +0000, Mark Meenan alleged:
>> I have come across an interesting problem. A user of the cluster 
>> submitted a couple of hundred jobs to the queue - by default they went 
>> into the feed queue and the resources requested were such that they were 
>> moved to the long queue, which then filled up to the max_queuable limit. 
>> The remaining jobs were then moved to the parallel queue (which is 
>> logical even if I did not anticipate it). However the jobs when they 
> 
> This is handled by using 2 routing queues.  The second acts as a sink
> to hold overflow from the execution queue.
>   feed
>     -> short_route (has short's resources_min/max)
>          -> short (has max_queuable)
>     -> long_route
>          -> long
>     ...etc.
> 
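For reference, here is a rough sketch of how that layout might look in
qmgr for the long queue alone. The queue name long_route follows
Garrick's diagram, the resource limits are copied from the existing
long queue settings quoted further down, and the max_queuable value of
100 is only a placeholder since the real limit is not given in this
thread. Treat it as an illustration, not a tested configuration.

    # Routing queue that accepts only jobs fitting long's limits
    create queue long_route
    set queue long_route queue_type = Route
    set queue long_route resources_max.nodect = 1
    set queue long_route resources_max.walltime = 50:00:00
    # Its only destination is long, so jobs wait here while long is full
    set queue long_route route_destinations = long
    set queue long_route enabled = True
    set queue long_route started = True
    # Execution queue keeps its cap (100 is a placeholder value)
    set queue long max_queuable = 100
    # feed now routes to long_route instead of directly to long
    set queue feed route_destinations = long_route

The other execution queues would get their own routing queue in the
same way, each appended to feed with "route_destinations +=".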
> 
>> went into the parallel queue were allocated 8 nodes - which is not the 
>> behaviour I would have expected. 
> 
> Are you sure they were actually allocated 8 nodes?  Or was that just the
> nodect reported by qstat?  Double check the actual nodes assigned with
> 'qstat -f'.
> 
I had checked this, and 'qstat -f' did show 8 nodes. However, your
comment below about the max resources being used as defaults answers
the question.


> http://www.clusterresources.com/bugzilla/show_bug.cgi?id=77
> 
> The patch was committed to 2.1.0, but should apply fine to 1.2 or 2.0.
> 
> 
>>> set queue long resources_max.nodect = 1
>>> set queue long resources_max.nodes = 1
>>> set queue long resources_max.walltime = 50:00:00
>>> set queue long resources_default.cput = 24:00:00
>>> set queue long resources_default.nodes = 1
>>> set queue long resources_default.walltime = 50:00:00
>> I have since added the following lines of configuration and expect that 
>> they will solve this particular problem, but I would like to know why 
>> 8 nodes were allocated.
>>
>>> set queue parallel resources_min.nodect = 2
>>> set queue parallel resources_min.nodes = 2
>>> set queue parallel resources_default.nodect = 2
>>> set queue parallel resources_default.nodes = 2
> 
> "max" resources are used as defaults if the resource isn't already set
> and has no default of its own.  While this may seem confusing, it allows
> for things like "resources_max.vmem" to become limits on MOM.
> 

This seems to explain it. Thanks, Garrick.


> With the patch above, nodect should be properly computed from "nodes" so
> you don't have to worry about it.  Without the patch, you need
> default.nodect to do everything properly.
> 
> Also, "nodes" is a string, so min/max.nodes doesn't mean anything.
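Putting those last two points together, the queue limits might be
reduced to something like the sketch below. It assumes a server
without the patch, so default.nodect is still set explicitly, and it
drops the min/max.nodes entries that Garrick says have no effect. The
resources_default.nodect line for long is my own addition by analogy
with the parallel queue and does not appear in the original
configuration.

    # long: single-node jobs only
    unset queue long resources_max.nodes
    set queue long resources_max.nodect = 1
    set queue long resources_default.nodes = 1
    set queue long resources_default.nodect = 1
    # parallel: at least 2 nodes, with explicit defaults so that
    # the queue's max is not picked up as the default
    unset queue parallel resources_min.nodes
    set queue parallel resources_min.nodect = 2
    set queue parallel resources_default.nodes = 2
    set queue parallel resources_default.nodect = 2

The result can be checked with "list queue parallel" inside qmgr.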