[torqueusers] Torque configuration questions for multi-core nodes
Woods, David M. Dr.
woodsdm2 at muohio.edu
Wed Mar 31 10:39:30 MDT 2010
We are moving from a cluster with dual CPU nodes to one that has 8 cores per node and are having trouble getting queues to work the way we would like.
Our user load is a mix of some medium to large (8 - 64 core) parallel jobs, some long-running serial jobs (50+ hours), and a lot of short (under 10 hour) serial jobs.
My initial plan was to have a routing queue that sent jobs to either a parallel queue (anything requesting more than one core) or a serial queue. I was then going to set the max number of running jobs to allow a lot of serial jobs and a small number of parallel jobs.
To do this, I set the serial queue with:
And no resources_min settings
For the parallel queue, I set
In the routing queue, the serial queue is listed before the parallel queue.
What I see is that job requests like "-l nodes=1:ppn=2" are routed to the serial queue and execute successfully. If I reverse the order of the queues in the routing queue, all jobs are sent to the parallel queue. The problem with routing single-node but multicore jobs to the serial queue is that if a user is allowed to have 50 jobs running, that is OK if they are all single core, but not if they all use 8 cores.
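To put numbers on that concern, here is a minimal Python sketch of the worst case under a per-user running-job cap. The cap of 50 and the 8 cores per node come from the description above; the function name worst_case_cores is purely illustrative and not part of Torque.

```python
def worst_case_cores(max_running_jobs, cores_per_job):
    """Cores one user can occupy if every running job uses cores_per_job cores."""
    return max_running_jobs * cores_per_job


# 50 single-core jobs, the case the serial queue limit was meant for.
serial_load = worst_case_cores(50, 1)

# 50 jobs that each request a full 8-core node ("-l nodes=1:ppn=8").
packed_load = worst_case_cores(50, 8)

print(serial_load, packed_load)
```

The same job-count limit therefore allows an eightfold difference in cores held by one user, which is why a limit on job count alone does not bound core usage.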
Job requests like "-l ncpus=2" route correctly, but this form only works for jobs needing 8 or fewer cores. Asking users to use "ncpus" for small jobs and then switch to the "nodes=x:ppn=y" format for larger ones would probably be confusing, especially since they are used to the "nodes=x:ppn=y" format on our current cluster.
From what I can tell, the "nodect" value in Torque is the nodes part of "nodes=x:ppn=y", and when this request format is used, no value is set for "ncpus". I don't see that anything like nodes*ppn is calculated for use in resource decisions.
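For illustration, the nodes*ppn figure that seems to be missing could be computed outside Torque, for example in a wrapper around job submission. The following is a minimal Python sketch under stated assumptions: the function total_cores is hypothetical, a chunk without ppn is counted as one core per node, and a chunk that names a host instead of a count is treated as a single node. It is not how Torque itself evaluates requests.

```python
import re


def total_cores(nodes_spec):
    """Return (node_count, core_count) for the value of a "nodes=" request.

    nodes_spec is the text after "nodes=", such as "4:ppn=8".
    Chunks joined with "+" (for example "2:ppn=8+1:ppn=4") are summed.
    """
    node_count = 0
    core_count = 0
    for chunk in nodes_spec.split("+"):
        parts = chunk.split(":")
        # Assumption: a non-numeric first field is a host name, i.e. one node.
        count = int(parts[0]) if parts[0].isdigit() else 1
        ppn = 1  # Assumption: no ppn given means one core per node.
        for prop in parts[1:]:
            match = re.fullmatch(r"ppn=(\d+)", prop)
            if match:
                ppn = int(match.group(1))
        node_count += count
        core_count += count * ppn
    return node_count, core_count


# "-l nodes=1:ppn=2" is one node (nodect 1) but two cores.
print(total_cores("1:ppn=2"))
# A 64-core job spread over eight 8-core nodes.
print(total_cores("8:ppn=8"))
```

A routing decision based on the second value rather than the first would separate "nodes=1:ppn=2" from a true single-core request, which the node count alone cannot do.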
I'm about to conclude that Torque can't handle this, and have started looking at what I can do with Maui, but thought I'd see if anyone had suggestions on how to do this (or something similar) with Torque or Maui.