[torquedev] Core Scheduling Enhancements
dbeer at adaptivecomputing.com
Fri Feb 12 11:22:09 MST 2010
As you all know, in version 2.4 we added support for requesting a specific geometry for a job within a node using -l procs_bitmap. This feature is a step forward, but it doesn't let people take full advantage of the latest hardware and layouts in new supercomputers and clusters. Some systems no longer use many separate nodes, but instead have one enormous node with lots of shared memory and a very large number of processors. Other systems have nodes made up of groups of processors that can communicate quickly within a group, but less quickly between groups. We are also seeing, and will likely see more, jobs that benefit from running on processors that are close together, as well as jobs where some of the processors need to be grouped and others don't.
We want TORQUE to be able to handle all of these needs, and to be in a position to adapt well as computing needs change. To meet that goal, we want to move to an updated method of specifying what resources a job needs. Our main goals in doing so are to remove ambiguity and increase flexibility. I'm sure that some of you have experience with alternatives to TORQUE and have good insight into how we can address these needs, and we're interested in your input. One thing of special interest is assigning specific meaning to procs, cores, nodes, sockets, or other relevant terms for defining job resources. Please let us know your thoughts on how best to solve this problem.
Thanks in advance,
David Beer | Senior Software Engineer