[torqueusers] TORQUE 4.0 Officially Announced

David Beer dbeer at adaptivecomputing.com
Mon Mar 19 09:54:04 MDT 2012


Hwloc is now required for running cpusets in TORQUE, and it helps out a lot
both in immediate use and in groundwork for future features.

The immediate benefit is a better cpuset: hwloc gives you the next physical
core instead of the next indexed core. For example, many eight-core systems
have processors 0, 2, 4, and 6 next to each other and processors 1, 3, 5,
and 7 next to each other. If you're running a pre-4.0 TORQUE and you have
two jobs on the node, each with 4 cores, job 1 will have 0-3 and job 2 will
have 4-7. In TORQUE 4.0, job 1 will have 0, 2, 4, and 6, and job 2 will
have 1, 3, 5, and 7. This should help speed up processing times for jobs
(NOTE: only if you have this kind of system and a comparable job layout;
I'm not promising a general speed-up to everyone using cpusets). This
should also allow us to properly handle hyperthreading for anyone who has
it turned on and wishes to use it.
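
To make the two-job example concrete, here is a minimal Python sketch of
the two placement policies. It does not call hwloc and is not TORQUE code;
the SOCKET_OF table and both function names are made up for illustration,
and the table simply assumes the eight-core layout described above (even
processors on one socket, odd processors on the other).

```python
# Hypothetical topology: OS processor index -> socket it sits on.
# Assumes the layout from the example: 0, 2, 4, 6 together and 1, 3, 5, 7 together.
SOCKET_OF = {cpu: cpu % 2 for cpu in range(8)}


def assign_by_index(job_sizes):
    """Pre-4.0 style: hand out processors in plain index order."""
    free = sorted(SOCKET_OF)
    jobs = []
    for size in job_sizes:
        jobs.append(free[:size])
        free = free[size:]
    return jobs


def assign_by_topology(job_sizes):
    """4.0 style (illustrative only): hand out processors in physical order,
    sorted by socket first and by index within a socket."""
    free = sorted(SOCKET_OF, key=lambda cpu: (SOCKET_OF[cpu], cpu))
    jobs = []
    for size in job_sizes:
        jobs.append(free[:size])
        free = free[size:]
    return jobs


if __name__ == "__main__":
    print("by index:   ", assign_by_index([4, 4]))
    print("by topology:", assign_by_topology([4, 4]))
```

With two 4-core jobs, the index policy splits each job across both groups
of processors, while the topology policy keeps each job on neighbouring
cores.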

The last immediate feature applies if you have SMT (simultaneous
multi-threading) hardware. The mom config variable $use_smt was added. By
default, the use of SMT is enabled, but you can tell your pbs_mom to ignore
the SMT threads (not place them in the cpuset) by adding

$use_smt false

to your mom config file.
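
As a rough illustration of what leaving the SMT threads out of the cpuset
means, the Python sketch below filters a made-up node description. The
CORE_THREADS table, the cpuset_members() helper, and the choice to keep the
first hardware thread of each core are all assumptions for the example, not
a description of how pbs_mom implements $use_smt.

```python
# Hypothetical node: physical core -> hardware threads (OS processor indices).
# Two hardware threads per core is an assumption for this example.
CORE_THREADS = {
    0: [0, 4],
    1: [1, 5],
    2: [2, 6],
    3: [3, 7],
}


def cpuset_members(core_threads, use_smt=True):
    """Return the processor indices eligible for a cpuset.

    With use_smt=True every hardware thread is eligible; with
    use_smt=False only the first thread of each core is kept
    (which thread is kept is an assumption of this sketch).
    """
    members = []
    for core in sorted(core_threads):
        threads = sorted(core_threads[core])
        members.extend(threads if use_smt else threads[:1])
    return members


if __name__ == "__main__":
    print("use_smt true: ", cpuset_members(CORE_THREADS, use_smt=True))
    print("use_smt false:", cpuset_members(CORE_THREADS, use_smt=False))
```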

For the future, the hwloc threads make it really easy for us to handle
hardware specific requests. One of the coming features for TORQUE is to
allow requests roughly similar to:

socket=2:numa=2 --with-hyperthreads

which would say to spread the job over 2 sockets, and across the 2 NUMA
nodes on each socket. This is a feature we plan to add to improve support
for Magny-Cours and Opteron type processors that have multiple sockets
and/or multiple NUMA nodes on the processor chip. Using hwloc means we
don't have to parse system files and map the indices to the sockets and/or
NUMA nodes ourselves; we can simply use easy hwloc functions
like hwloc_get_next_obj_inside_cpuset_by_type() that allow you to just move
on to the next physical core or virtual core, or skip to the next socket or
NUMA node as the case may be.
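
Since the request syntax above is only a rough idea of a future feature,
the following Python sketch is purely speculative: it parses the
socket=N:numa=M part of such a string and deals cores out across a
hand-written topology. TOPOLOGY, parse_request(), select_numa_nodes() and
spread_cores() are hypothetical names; none of this uses hwloc or reflects
TORQUE internals, and the --with-hyperthreads option is left out entirely.

```python
def parse_request(text):
    """Parse a string such as 'socket=2:numa=2' into {'socket': 2, 'numa': 2}."""
    request = {}
    for field in text.split(":"):
        key, _, value = field.partition("=")
        request[key.strip()] = int(value)
    return request


# Hypothetical machine: socket -> NUMA node -> core indices.
TOPOLOGY = {
    0: {0: [0, 1, 2], 1: [3, 4, 5]},
    1: {2: [6, 7, 8], 3: [9, 10, 11]},
}


def select_numa_nodes(topology, request):
    """Pick the first `socket` sockets and the first `numa` nodes on each."""
    chosen = []
    for socket in sorted(topology)[: request["socket"]]:
        for node in sorted(topology[socket])[: request["numa"]]:
            chosen.append((socket, node))
    return chosen


def spread_cores(topology, request, ncores):
    """Deal ncores cores round-robin over the selected NUMA nodes."""
    nodes = select_numa_nodes(topology, request)
    pools = [list(topology[s][n]) for s, n in nodes]
    picked = []
    i = 0
    # Stop when enough cores are picked or every pool is empty.
    while len(picked) < ncores and any(pools):
        pool = pools[i % len(pools)]
        if pool:
            picked.append(pool.pop(0))
        i += 1
    return picked


if __name__ == "__main__":
    req = parse_request("socket=2:numa=2")
    print(spread_cores(TOPOLOGY, req, 4))
```

In this toy model, a 4-core job requested with socket=2:numa=2 gets one
core from each of the four NUMA nodes.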


On Mon, Mar 19, 2012 at 8:47 AM, DuChene, StevenX A <
stevenx.a.duchene at intel.com> wrote:

> Also a better (more complete) explanation of what features are enabled
> when hwloc is used would be helpful as well.
> BTW, I built torque on my server without hwloc installed and then
> installed the resulting mom packages on my nodes. The mom daemons in that
> case did seem to start up just fine.
> --
> Steven DuChene
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:
> torqueusers-bounces at supercluster.org] On Behalf Of Craig West
> Sent: Sunday, March 18, 2012 10:40 PM
> To: Torque Users mailing list; Torque Developers mailing list
> Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced
> Hi Steven,
> I have just begun testing Torque 4.0, as hwloc has been a long awaited
> feature for me.
> > It is unclear from this announcement text where hwloc has to be
> > installed.
> > Is it just on the server or on the nodes only?
> It needs to be available on the BUILD server and the nodes. I tried to
> run pbs_mom on a node without the hwloc I had installed and it failed.
> Note: I am running hwloc 1.4 from a directory in /usr/local
> This was not automatically found by the TORQUE configure script, but you
> can specify the location using HWLOC_CFLAGS & HWLOC_LIBS.
> It embeds the locations that you specify in the pbs_mom (and other
> files) but it seems you can set the LD_LIBRARY_PATH variable if it is
> not in the same location on the BUILD server as the compute nodes.
> For simplicity installing them in the same location makes sense.
> > More documentation about this would be greatly appreciated.
> I agree, clearer and more detailed documentation would be useful.
> Cheers,
> Craig.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

David Beer | Software Engineer
Adaptive Computing