[torqueusers] qsub on several nodes
glen.beane at gmail.com
Tue Jun 1 13:50:21 MDT 2010
On Tue, Jun 1, 2010 at 2:56 PM, Felix Werner <ff.werner at gmail.com> wrote:
> 2010/6/1 Glen Beane <glen.beane at gmail.com>
>> On Tue, Jun 1, 2010 at 12:16 PM, Ken Nielson
>> <knielson at adaptivecomputing.com> wrote:
>> > On 06/01/2010 10:08 AM, Felix Werner wrote:
>> >> Dear all,
>> >> Suppose I want to run a job on 40 CPUs (with MPI),
>> >> and there are
>> >> 10 CPUs available on the node "node1"
>> >> 10 on "node2"
>> >> 20 on "node3".
>> >> What I do is:
>> >> qsub -l nodes=node1:ppn=10+node2:ppn=10+node3:ppn=20 shell_name.sh
>> >> This is tedious because I have to check manually how many CPUs are
>> >> available on which node.
>> >> So is there a way to just tell the queuing system "I want to run on 40
>> >> CPUs, on whatever nodes"?
>> >> Many thanks!
>> >> Felix Werner
>> > Felix,
>> > If you use a scheduler like Moab you can simply use qsub -l nodes=40 and
>> > it will take care of where they are going to run. But if you are going
>> > to run jobs manually this is how it has to be done.
>> One thing to note: if you have fewer than 40 nodes, you need to
>> trick TORQUE into thinking it has more nodes than it really has, so
>> that it doesn't reject a request like -l nodes=40. Moab treats a
>> nodes=X request without a ppn=Y component as a request stating "I just
>> need X processors".
>> You can also use -l procs=40, which doesn't require configuring TORQUE
>> to think it has more nodes than it actually has. This is only
>> supported with Moab.
> Many thanks guys!
> I am not sure yet that it works perfectly though:
> After executing:
> [werner@cm64N TEST]$ qsub -l procs=3 shell_mc.sh
> I get:
> [werner@cm64N TEST]$ qstat -n1
>                                                                    Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
> 4229.cm64n.physi     werner   batch    shell_mc.sh       21133     9  --     --    -- R 02:51
> 4237.cm64n.physi     werner   batch    shell_mc.sh       21516     1  --     --    -- R   --   cm47/0
> So I guess this means that our sys admin indeed installed Moab.
To find out whether your system has Moab, try running one of the Moab
commands, such as "mdiag -n".
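
A quick way to do that check without guessing is to see whether the
command is on your PATH at all. This is only a rough sketch: the helper
name have_command is made up for illustration, and finding mdiag only
suggests that the Moab client tools are installed on that host, not that
Moab is actually the scheduler in charge of your queue.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command mdiag; then
    echo "mdiag found: Moab client tools appear to be installed"
    # mdiag -n    # uncomment to list the nodes as Moab sees them
else
    echo "mdiag not found on PATH: Moab is probably not installed on this host"
fi
```

If mdiag is missing on the login node, it is still worth asking your
admin, since the tools may simply live outside your PATH.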
For your job 4237 I would expect to see something like
cm47/0+cm47/1+cm47/2. I think you may be using a scheduler that
does not support -l procs, and whatever scheduler you are using is
running your job on a single processor.
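
If you want to check this for other jobs, you can count the slots in
the host list that qstat -n1 prints in its last column. The sketch
below assumes the list has the host/slot+host/slot form shown above;
the function name slots_per_host is made up for illustration.

```shell
#!/bin/sh
# Count the processor slots per host in a TORQUE host list such as
# "cm47/0+cm47/1+cm47/2" (the last column of `qstat -n1`).
# Prints one "host count" pair per line, sorted by host name.
slots_per_host() {
    printf '%s\n' "$1" |
        tr '+' '\n' |        # one host/slot entry per line
        cut -d/ -f1 |        # keep only the host name
        sort | uniq -c |     # count entries per host
        awk '{ print $2, $1 }'
}

slots_per_host 'cm47/0+cm47/1+cm47/2'   # what a 3-processor job should show
slots_per_host 'cm47/0'                 # what job 4237 actually shows
```

A job submitted with -l procs=3 that was really given three processors
should add up to 3 across all hosts; your job 4237 adds up to 1.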