[torqueusers] Torque configuration questions for multi-core nodes

Woods, David M. Dr. woodsdm2 at muohio.edu
Wed Mar 31 10:39:30 MDT 2010


We are moving from a cluster with dual-CPU nodes to one that has 8 cores per node, and we are having trouble getting queues to work the way we would like.

Our user load is a mix of some medium to large (8-64 core) parallel jobs, some long-running serial jobs (50+ hours), and a lot of short (under 10 hours) serial jobs.

My initial plan was to have a routing queue that sent jobs to either a parallel queue (anything requesting more than one core) or a serial queue. I was then going to set the maximum number of running jobs to allow a lot of serial jobs and a small number of parallel jobs.

To do this, I set the serial queue with:
resources_max.nodect=1
resources_max.ncpus=1
And no resources_min settings

For the parallel queue, I set:
resources_min.ncpus=2

In the routing queue, the serial queue is listed before the parallel queue.

What I see is that job requests like "-l nodes=1:ppn=2" are routed to the serial queue and execute successfully. If I reverse the order of the queues in the routing queue, all jobs are sent to the parallel queue. The problem with routing single-node but multi-core jobs to the serial queue is the per-user limit: if a user is allowed to have 50 jobs running, that is OK when each job uses a single core, but not when each job uses 8 cores.
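
The routing behavior described above can be modeled with a small Python sketch. This is a toy model, not Torque's actual code. It assumes that a "nodes=x:ppn=y" request sets nodect but leaves ncpus unset, and that a queue limit is only checked when the job carries a value for that resource. Both assumptions are consistent with the observations here but are not verified against the Torque source. The queue names "serial" and "parallel" and the function names are made up for illustration.

```python
# Toy model of routing-queue selection; NOT Torque's actual implementation.
# Assumption: a queue limit is only enforced when the job carries a value
# for that resource; a resource the job does not set passes the check.

def accepts(queue, job):
    """Return True if the job satisfies every limit the queue defines."""
    for res, limit in queue.get("max", {}).items():
        if res in job and job[res] > limit:
            return False
    for res, limit in queue.get("min", {}).items():
        if res in job and job[res] < limit:
            return False
    return True


def route(destinations, job):
    """Return the name of the first destination queue that accepts the job."""
    for name, queue in destinations:
        if accepts(queue, job):
            return name
    return None


# Queue limits as described in this message.
serial = {"max": {"nodect": 1, "ncpus": 1}}
parallel = {"min": {"ncpus": 2}}

# "-l nodes=1:ppn=2": assumed to set nodect only, leaving ncpus unset.
nodes_job = {"nodect": 1}
# "-l ncpus=2": sets ncpus explicitly.
ncpus_job = {"ncpus": 2}

serial_first = [("serial", serial), ("parallel", parallel)]
parallel_first = [("parallel", parallel), ("serial", serial)]

print(route(serial_first, nodes_job))    # serial
print(route(parallel_first, nodes_job))  # parallel
print(route(serial_first, ncpus_job))    # parallel
```

Under these assumptions the model reproduces all three observations: the ppn=2 job lands in whichever queue is listed first, while the ncpus=2 job is rejected by the serial queue and falls through to the parallel queue.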

Job requests like "-l ncpus=2" route correctly, but that format only works for jobs needing 8 or fewer cores. Asking users to use "ncpus" for small jobs and switch to the "nodes=x:ppn=y" format for larger jobs would probably be confusing, especially since they are used to the "nodes=x:ppn=y" format on our current cluster.

From what I can tell, the "nodect" value in Torque is the nodes part of "nodes=x:ppn=y", and when this request format is used, no value is set for "ncpus". I don't see that anything like nodes*ppn is calculated for use in resource decisions.
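
For reference, the nodes*ppn value mentioned above could be computed outside Torque along the following lines. This is only a sketch: total_cores is a hypothetical helper, not part of Torque or Maui, and it assumes the request uses the "count:ppn=N" form, optionally joined with "+" and optionally naming a host instead of a count.

```python
def total_cores(nodes_spec):
    """Sum nodes*ppn over each '+'-separated chunk of a nodes request.

    Hypothetical helper for illustration; Torque itself is not known to
    compute this value for queue resource checks.
    """
    total = 0
    for chunk in nodes_spec.split("+"):
        parts = chunk.split(":")
        # The first field is a node count if numeric; otherwise treat it
        # as a single named host.
        count = int(parts[0]) if parts[0].isdigit() else 1
        ppn = 1  # a chunk without ppn is assumed to use one core per node
        for part in parts[1:]:
            if part.startswith("ppn="):
                ppn = int(part[len("ppn="):])
        total += count * ppn
    return total


print(total_cores("1:ppn=2"))        # 2
print(total_cores("8:ppn=8"))        # 64
print(50 * total_cores("1:ppn=8"))   # 400 cores for 50 single-node, 8-core jobs
```

The last line shows the concern with the per-user job limit: 50 running jobs occupy 50 cores when each is single-core, but 400 cores when each requests a full 8-core node.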

I'm about to conclude that Torque can't handle this, and have started looking at what I can do with Maui, but thought I'd see if anyone had suggestions on how to do this (or something similar) with Torque or Maui.

Dave Woods
