[torqueusers] qsub and resource limits problem.
siegert at sfu.ca
Tue Nov 5 17:18:52 MST 2013
Sorry there is a typo in my email below ...
On Tue, Nov 05, 2013 at 04:13:28PM -0800, Martin Siegert wrote:
> Hi Daniel,
> On Tue, Nov 05, 2013 at 11:01:43AM -0200, Daniel Lopes de Carvalho wrote:
> > Hello Guys,
> > Last week I configured a TORQUE queue with the following
> > characteristics:
> > create queue longas
> > set queue longas queue_type = Execution
> > set queue longas Priority = 10000
> > set queue longas resources_max.nodes = 1:ppn=12
> > set queue longas resources_default.nodes = 1:ppn=12
> > set queue longas max_user_run = 4
> > set queue longas enabled = True
> > set queue longas started = True
> > Since then, TORQUE has not worked properly when I submit a job
> > to this queue.
> > If I use the command 'echo "sleep 300" | qsub -q long -l nodes=1:ppn=8'
> > to submit a job, the following message appears:
> > qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max
> > nodes requirement
> > However, if I use the same command with a 0 in front of the 8, the
> > submission succeeds: 'echo "sleep 300" | qsub -q long -l
> > nodes=1:ppn=08'
> > Is there a way to fix this and make TORQUE accept the
> > first command ('echo "sleep 300" | qsub -q long -l nodes=1:ppn=8')?
> > Thanks and best regards
> The nodes resource is tricky. What is larger: 1:ppn=12 or 2:ppn=5, or ...?
> As far as I remember the nodes resource is stored as a string and
> a lexical string comparison is used as the metric. As a consequence
> 1:ppn=12 is actually smaller than 1:ppn=8, whereas 1:ppn=12 is larger
> than 1:ppn=08. Basically resources_max.nodes and resources_min.nodes
> should not be used at all - the results are almost unpredictable.
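The lexical ordering described above can be reproduced with ordinary string comparison. The following is only an illustrative sketch in Python, not TORQUE's actual code; the helper name fits_lexically is hypothetical and simply assumes that a request is accepted when it does not sort after the queue limit.

```python
# Illustration only: plain character-by-character string comparison,
# which is how the nodes resource is recalled to be compared.
# fits_lexically is a hypothetical helper, not a TORQUE function.

def fits_lexically(requested: str, limit: str) -> bool:
    """Return True if the request does not sort after the limit."""
    return requested <= limit


queue_max = "1:ppn=12"

# '8' sorts after '1', so "1:ppn=8" looks larger than "1:ppn=12".
print(fits_lexically("1:ppn=8", queue_max))   # False
# '0' sorts before '1', so "1:ppn=08" looks smaller than "1:ppn=12".
print(fits_lexically("1:ppn=08", queue_max))  # True
```

The two strings first differ at the character after "ppn=", so only that one character decides the outcome, regardless of the numeric value.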
> There are two other resources that are derived from the nodes
> resource: nodect and procct. I believe that you can accomplish
> what you want with setting:
> set queue longas resources_max.nodect = 1
> set queue longas resources_max.procct = 12
> nodect counts the number of nodes allocated to the job whereas
> procct counts the number of cores allocated to a job, i.e.,
> for a specification nodes=n:ppn=m, nodect=m and procct=m*n.
This should be:
for a specification nodes=n:ppn=m, nodect=n and procct=m*n.
> For jobs that are submitted with -l procs=x instead of -l nodes=...
> procct is set to x.
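The nodect/procct arithmetic for a specification nodes=n:ppn=m (nodect=n, procct=m*n) can be sketched as follows. This is a minimal illustration that assumes only the simple form n:ppn=m; the function names and the limit check are hypothetical and are not part of TORQUE.

```python
import re
from typing import Tuple


def counts_from_nodes(spec: str) -> Tuple[int, int]:
    """Return (nodect, procct) for a spec of the simple form 'n:ppn=m'."""
    match = re.fullmatch(r"(\d+):ppn=(\d+)", spec)
    if match is None:
        raise ValueError(f"unsupported nodes specification: {spec!r}")
    n, m = int(match.group(1)), int(match.group(2))
    return n, n * m  # nodect = n, procct = m * n


def within_limits(spec: str, max_nodect: int = 1, max_procct: int = 12) -> bool:
    """Compare the derived integer counts against the suggested queue limits."""
    nodect, procct = counts_from_nodes(spec)
    return nodect <= max_nodect and procct <= max_procct


print(counts_from_nodes("1:ppn=8"))  # (1, 8)
print(within_limits("1:ppn=8"))      # True
print(within_limits("2:ppn=5"))      # False
```

Because the counts are compared as integers, ppn=8 and ppn=08 give the same result in this sketch.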
> Martin Siegert
> WestGrid/ComputeCanada Site Lead
> IT Services
> Simon Fraser University
> Burnaby, British Columbia, Canada