[torqueusers] Torque configuration questions for multi-core nodes

Coyle, James J [ITACD] jjc at iastate.edu
Thu Apr 1 16:14:29 MDT 2010


  Our new machine has 16-core nodes, up from 4-core nodes on our older
machines, so my situation is similar to yours.

  I have the serial queue defined as an execution queue, separate from the
default routing queue.  Users either run qsub -q serial scriptname
or put the line

#PBS -q serial

in their job scripts, and the job then goes straight into the serial queue.

  Users are asked not to submit jobs with ppn > 1 to the serial queue,
and so far none have done so.

Best of luck,
 - Jim C.

 James Coyle, PhD
 High Performance Computing Group     
 115 Durham Center            
 Iowa State Univ.           phone: (515)-294-2099
 Ames, Iowa 50011           web: http://www.public.iastate.edu/~jjc

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Woods, David M. Dr.
Sent: Thursday, April 01, 2010 1:41 PM
To: Gareth.Williams at csiro.au; torqueusers at supercluster.org
Subject: Re: [torqueusers] Torque configuration questions for multi-core nodes

What I'm trying to do is find a queue setup that meets users' needs in terms of access, wait times, etc.  I started with the config on our current cluster, since the users have some understanding of how that is set up and how it works.

At this point I think I do need to move the scheduling into Maui, but I wanted to make sure I wasn't missing something obvious in Torque.  As I work on understanding Maui, I'm sure I will have questions for the Maui list!


-----Original Message-----
From: Gareth.Williams at csiro.au [mailto:Gareth.Williams at csiro.au] 
Sent: Wednesday, March 31, 2010 6:27 PM
To: Woods, David M. Dr.; torqueusers at supercluster.org
Subject: RE: [torqueusers] Torque configuration questions for multi-core nodes

> __________________________________
> From: Woods, David M. Dr. [mailto:woodsdm2 at muohio.edu] 
> Sent: Thursday, 1 April 2010 3:40 AM

> We are moving from a cluster with dual CPU nodes to one that has 8 cores per node and are having trouble getting queues to work the way we would like.

> Our user load is a mix of some medium to large (8 - 64 core) parallel jobs, some long running serial jobs (50+ hours) and a lot of short (under 10 hour) serial jobs.

> My initial plan was to have a routing queue that sent jobs to either a parallel queue (anything requesting more than one core) or a serial queue.  I was then going to set the maximum number of running jobs per queue so that many serial jobs, but only a few parallel jobs, could run at once.
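The routing plan above can be modeled roughly as follows. This is a hypothetical Python sketch, not Torque configuration: the queue names come from the message, but the `Job` type, the `route` and `can_start` helpers and the cap values in `MAX_RUNNING` are illustrative assumptions. In Torque itself the same idea would be expressed with a routing queue and per-queue limits such as `max_running`.

```python
from dataclasses import dataclass

# Hypothetical caps (assumed values): room for many serial jobs,
# but only a few parallel ones at a time.
MAX_RUNNING = {"serial": 100, "parallel": 4}


@dataclass
class Job:
    name: str
    cores: int  # total cores the job asks for


def route(job: Job) -> str:
    """Routing rule from the plan: more than one core goes to 'parallel'."""
    return "parallel" if job.cores > 1 else "serial"


def can_start(queue: str, running: dict) -> bool:
    """True while the queue is still below its cap on running jobs."""
    return running.get(queue, 0) < MAX_RUNNING[queue]
```

With `running = {"serial": 37, "parallel": 4}`, a 1-core job routes to `serial` and may start, while a 16-core job routes to `parallel` and has to wait. Note that the sketch assumes `cores` is already known as a single number for every job.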

Hi David,

There was a recent extended discussion on ncpus - the first posting from this month is here:

In summary, ncpus and nodes/ppn are alternative ways of specifying how many cores you want: one user may ask for -l nodes=1:ppn=8 and another for -l ncpus=8.  This means that Torque does not get a uniform view of jobs, so a routing queue setup that looks at nodes and/or ncpus would be hard to get right.
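To illustrate why the two request styles complicate routing, here is a minimal Python sketch, assuming simplified Torque-style resource strings such as `nodes=2:ppn=8` and `ncpus=16`. The helper names are hypothetical and the parser ignores most of the real nodes syntax; the point is only that a rule keyed on nodes/ppn alone treats an `ncpus=16` job as a one-core job.

```python
def parse_resources(resources: str) -> dict:
    """Split 'nodes=2:ppn=8,walltime=10:00:00' into a key -> value dict."""
    out = {}
    for item in resources.split(","):
        key, _, value = item.partition("=")  # split on the first '=' only
        out[key.strip()] = value.strip()
    return out


def cores_from_nodes(spec: str) -> int:
    """Cores in a simplified nodes spec such as '2:ppn=8' or '1:ppn=4+2:ppn=8'."""
    total = 0
    for group in spec.split("+"):
        fields = group.split(":")
        # A leading number is a node count; a hostname means one node.
        count = int(fields[0]) if fields[0].isdigit() else 1
        ppn = 1
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        total += count * ppn
    return total


def cores_requested(resources: str) -> int:
    """Normalize either request style to one core count (default 1).
    If both appear, this sketch arbitrarily prefers nodes/ppn."""
    res = parse_resources(resources)
    if "nodes" in res:
        return cores_from_nodes(res["nodes"])
    if "ncpus" in res:
        return int(res["ncpus"])
    return 1


def naive_cores(resources: str) -> int:
    """The pitfall: only nodes/ppn is inspected, so ncpus requests count as 1."""
    res = parse_resources(resources)
    return cores_from_nodes(res["nodes"]) if "nodes" in res else 1
```

Here `cores_requested("ncpus=16")` and `cores_requested("nodes=2:ppn=8")` both give 16, whereas `naive_cores("ncpus=16")` gives 1, so a routing rule built on it would send that job down the serial path.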

Stepping back for a moment, what are you actually trying to achieve? It's the scheduler that makes decisions about what jobs to start and where to start them.  Having separate queues is not necessary.  Maybe you can get where you want without worrying about queues.  Maybe you have a question for the maui list.


torqueusers mailing list
torqueusers at supercluster.org
