[torqueusers] qsub on several nodes
ff.werner at gmail.com
Tue Jun 1 12:56:37 MDT 2010
2010/6/1 Glen Beane <glen.beane at gmail.com>
> On Tue, Jun 1, 2010 at 12:16 PM, Ken Nielson
> <knielson at adaptivecomputing.com> wrote:
> > On 06/01/2010 10:08 AM, Felix Werner wrote:
> >> Dear all,
> >> Suppose I want to run a job on 40 CPUs (with MPI),
> >> and there are
> >> 10 CPUs available on the node "node1"
> >> 10 on "node2"
> >> 20 on "node3".
> >> What I do is:
> >> qsub -l nodes=node1:ppn=10+node2:ppn=10+node3:ppn=20 shell_name.sh
> >> This is tedious because I need to look up manually how many CPUs are
> >> available on which node.
> >> So is there a way to just tell the queuing system "I want to run on 40
> >> CPUs, on whatever nodes"?
> >> Many thanks!
> >> Felix Werner
> > Felix,
> > If you use a scheduler like Moab you can simply use qsub -l nodes=40 and
> > it will take care of where they are going to run. But if you are going
> > to run jobs manually this is how it has to be done.
> one thing to note is that if you have fewer than 40 nodes you need to
> trick TORQUE into thinking it has more nodes than it really has so
> that it doesn't reject a request like -l nodes=40. Moab treats a
> nodes=X request without a ppn=Y component as a request stating "I just
> need X processors".
> You can also do -l procs=40, which doesn't require configuring TORQUE
> to think it has more nodes than it actually has. This is only
> supported with Moab.
Many thanks guys!
I am not sure yet that it works perfectly though:
[werner at cm64N TEST]$ qsub -l procs=3 shell_mc.sh
[werner at cm64N TEST]$ qstat -n1
                                                      Req'd  Req'd   Elap
Job ID           Username Queue    Jobname     SessID  NDS TSK Memory Time  S Time
---------------- -------- -------- ----------- ------ ---- --- ------ ----- - -----
4229.cm64n.physi werner   batch    shell_mc.sh  21133    9  --     --    -- R 02:51
4237.cm64n.physi werner   batch    shell_mc.sh  21516    1  --     --    -- R    --   cm47/0
So I guess this means that our sysadmin did indeed install Moab.
Now, as you can see, the job which I submitted on 3 processors using your
trick shows up (on the last line above) as if it were running on only one
processor. According to our "cluster report" website, it is actually running on 3
processors, but I am wondering whether everything is OK (e.g., whether the
queuing system will prevent other jobs from running on the same processors, as it
should).
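For the original per-node request style, the tedious part (looking up free CPU counts and typing node1:ppn=10+node2:ppn=10+... by hand) can be scripted. Below is a minimal sketch: `build_spec` is a hypothetical helper that reads "node count" pairs and joins them into the spec string qsub expects; in practice the pairs could be derived from `pbsnodes` output, but the sample data here is made up for illustration.

```shell
# Hypothetical helper: join "node ppn" pairs into a qsub node spec.
build_spec() {
  local spec=""
  while read -r node ppn; do
    # skip blank lines
    [ -z "$node" ] && continue
    # append "+node:ppn=N", omitting the leading "+" for the first entry
    spec="${spec:+$spec+}${node}:ppn=${ppn}"
  done
  printf '%s\n' "$spec"
}

# Sample (made-up) availability data, matching the example in the thread:
build_spec <<'EOF'
node1 10
node2 10
node3 20
EOF
# -> node1:ppn=10+node2:ppn=10+node3:ppn=20
```

One could then submit with something like `qsub -l nodes=$(build_spec < free_cpus.txt) shell_name.sh`, where free_cpus.txt is an assumed file listing each node and its free CPU count.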