[torqueusers] specific nodes

Gustavo Correa gus at ldeo.columbia.edu
Wed Nov 30 13:37:12 MST 2011


Hi Ricardo

We do something along these lines here with Maui and Torque.

In the Torque $Torque/server_priv/nodes file, add a distinctive 'property' to
each type of node.  Something like this; I call them PS3, CPUGPU, and CPUONLY here
[adapt these to your setup]:

an_ibm_node np=8 PS3
...
a_gpu_node np=16 CPUGPU  
...
a_cpu_only_node np=4 CPUONLY

[There may be an additional item on the CPUGPU line for Torque GPU control,
perhaps 'gpus=2' or something like that.
I don't have GPU nodes here,
so check the Torque Admin Guide.]
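
For illustration, the CPUGPU line with GPU tracking might then read like this
[the 'gpus=2' token is my guess at the syntax; verify it in the Admin Guide
for your Torque version]:

a_gpu_node np=16 gpus=2 CPUGPU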

With Torque's qmgr, create the queues you want (IBMCELL, TESLA, XEON) and set
the default node property on each:

set queue IBMCELL resources_default.neednodes = PS3
set queue TESLA resources_default.neednodes = CPUGPU
set queue XEON resources_default.neednodes = CPUONLY
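
If the queues don't exist yet, the full qmgr stanza for one of them could look
like this [a minimal sketch; add resource limits and defaults as needed]:

create queue TESLA
set queue TESLA queue_type = Execution
set queue TESLA resources_default.neednodes = CPUGPU
set queue TESLA enabled = True
set queue TESLA started = True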

Add this line to your $Maui/maui.cfg:

ENABLEMULTIREQJOBS   TRUE

Restart pbs_server and Maui.
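
How you restart depends on the installation; on many systems something along
these lines works [commands are assumptions, adjust to your own init scripts]:

qterm -t quick            # shut down pbs_server gracefully
pbs_server                # start it again
/etc/init.d/maui restart  # restart the Maui daemon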

We use this to separate development and production nodes [not for gpu]
on a per-queue basis.
The user only needs to indicate the queue, the number of nodes, and the ppn
[and possibly the number of GPUs, in your case] in the Torque submission script.
There is no need to mention the node properties in the submission script.
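
For example, a script aimed at the TESLA queue could be as simple as this
[job name, resource counts, and executable are placeholders]:

#PBS -N my_gpu_job
#PBS -q TESLA
#PBS -l nodes=2:ppn=16
cd $PBS_O_WORKDIR
/usr/local/bin/mpiexec -n 32 ./a.out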

It works for me.

Documentation is here:
http://www.adaptivecomputing.com/resources/docs/

I hope this helps,
Gus Correa

On Nov 30, 2011, at 3:08 PM, Ricardo Román Brenes wrote:

> Well, I am using Torque+Maui, but even so I can't get Maui to assign the nodes correctly; a job just runs on all nodes, not just the ones I want ...
> 
> On Wed, Nov 30, 2011 at 2:01 PM, Lloyd Brown <lloyd_brown at byu.edu> wrote:
> Not so much the wrong mailing list, but the wrong product.  In the end,
> Torque is really about resource management, launching jobs, etc., but
> not the decision making.  It happens to include a scheduler
> ("pbs_sched"), but it's very, very basic.  If you want anything more,
> you're going to have to look at Moab or Maui to use with Torque.  Or
> there are other scheduling systems out there that don't use Torque.
> 
> For such a small/simple cluster, I'd recommend Torque with Maui, but
> you'll have to do some investigation.
> 
> 
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> http://marylou.byu.edu
> 
> 
> 
> On 11/30/2011 12:56 PM, Ricardo Román Brenes wrote:
> > so wrong mailing list huh?
> >
> > sorry to bother
> >
> > thanks for your time
> >
> > On Wed, Nov 30, 2011 at 1:52 PM, Lloyd Brown <lloyd_brown at byu.edu> wrote:
> >
> >     Ricardo,
> >
> >     Have you seen section 4.1.4 ("Mapping a Queue to a Subset of Resources")
> >     in the Torque documentation?  It might give you some ideas.  However,
> >     the short answer to your question, as seen in that section is this:
> >
> >     > TORQUE does not currently provide a simple mechanism for mapping
> >     queues to nodes. However, schedulers such as Moab and Maui can
> >     provide this functionality.
> >
> >
> >     Lloyd Brown
> >     Systems Administrator
> >     Fulton Supercomputing Lab
> >     Brigham Young University
> >     http://marylou.byu.edu
> >
> >
> >
> >     On 11/30/2011 12:37 PM, Ricardo Román Brenes wrote:
> >     > Hello everyone, thanks for taking the time to read this long post :P
> >     >
> >     >
> >     > The question is about multiple queues with Torque:
> >     >
> >     >
> >     > We have here different cluster nodes with different architectures:
> >     > 4 PS3
> >     > 3 CPU+GPU
> >     > 2 CPU-only
> >     >
> >     > and I want to be able to send jobs to each type of node independently
> >     > (using Torque). I'm guessing that having several queues, with each
> >     > node belonging to a particular queue, and then submitting jobs to that
> >     > queue will do the trick:
> >     >
> >     > say I've got 3 queues:
> >     > IBMCELL with the 4 PS3s
> >     > TESLA with the 3 nodes that have GPUs
> >     > XEON with the 5 nodes that have Xeons (3 of which also have
> >     > Teslas :P)
> >     >
> >     > and when I submit a job:
> >     > qsub -q IBMCELL a.pbs
> >     > it should run on the PS3s only, but I'm not able to make it work
> >     > like that.
> >     >
> >     > As a test I made 2 queues on the PS3 pbs_server ("uno" and "dos"):
> >     >
> >     >     #
> >     >     # Create queues and set their attributes.
> >     >     #
> >     >     #
> >     >     # Create and define queue uno
> >     >     #
> >     >     create queue uno
> >     >     set queue uno queue_type = Execution
> >     >     set queue uno acl_host_enable = False
> >     >     set queue uno acl_hosts = zarate-0+zarate-1
> >     >     set queue uno enabled = True
> >     >     set queue uno started = True
> >     >     #
> >     >     # Create and define queue dos
> >     >     #
> >     >     create queue dos
> >     >     set queue dos queue_type = Execution
> >     >     set queue dos acl_host_enable = False
> >     >     set queue dos acl_hosts = zarate-2+zarate-3
> >     >     set queue dos enabled = True
> >     >     set queue dos started = True
> >     >     #
> >     >     # Set server attributes.
> >     >     #
> >     >     set server scheduling = True
> >     >     set server acl_hosts = zarate-0
> >     >     set server log_events = 511
> >     >     set server mail_from = adm
> >     >     set server scheduler_iteration = 600
> >     >     set server node_check_rate = 150
> >     >     set server tcp_timeout = 6
> >     >     set server next_job_number = 22
> >     >
> >     >
> >     > and I changed the _nodes_ file in the server_priv directory so it
> >     > looks like this (zarate-N are just the hostnames :P):
> >     >
> >     >
> >     >     zarate-0 np=2 uno
> >     >     zarate-1 np=2 uno
> >     >     zarate-2 np=2 dos
> >     >     zarate-3 np=2 dos
> >     >
> >     >
> >     >
> >     > but it's not working... when I launch a job with this script:
> >     >
> >     >     #PBS -N mpi_hello
> >     >     /usr/local/bin/mpiexec -n 8 /home/rroman/a.out
> >     >
> >     >
> >     > the output file is:
> >     >
> >     >     zarate-1: hello world from process 2 of 8
> >     >     zarate-2: hello world from process 5 of 8
> >     >     zarate-2: hello world from process 6 of 8
> >     >     zarate-3: hello world from process 0 of 8
> >     >     zarate-3: hello world from process 7 of 8
> >     >     zarate-1: hello world from process 3 of 8
> >     >     zarate-0: hello world from process 4 of 8
> >     >     zarate-3: hello world from process 1 of 8
> >     >
> >     >
> >     >
> >     > And that shows the job running on ALL the nodes instead of
> >     > running only on zarate-0 and zarate-1 as the queue should
> >     > dictate (according to me :P)
> >     >
> >     >
> >     >
> >     >
> >     > SO! The question is: is it possible to do what I want like this?
> >     > And if so, what am I doing wrong? :P
> >     >
> >     > Thank you Kay!
> >     >
> >     > -ricardo
> >     >
> >     >
> >     >
> >     > _______________________________________________
> >     > torqueusers mailing list
> >     > torqueusers at supercluster.org
> >     > http://www.supercluster.org/mailman/listinfo/torqueusers
> >
> >
> >
> >
> 


