[torqueusers] TORQUE 4.0 and hwloc

David Beer dbeer at adaptivecomputing.com
Wed Apr 4 09:52:42 MDT 2012


On Wed, Apr 4, 2012 at 9:50 AM, Gus Correa <gus at ldeo.columbia.edu> wrote:

> Hi David
>
> Not to hijack Steven's thread ...
> ... but just taking a quick ride on it ... :)
>
> Does the hwloc 1.1 requirement apply only to Torque 4.0?
> How about the older Torque series [2.X.Y, 3.X.Y]
> that use cpuset?
> [I am in the process of building 2.4.16 with cpuset.]
>
>
This only applies to 4.0 and higher.



> Thank you,
> Gus Correa
>
> On 04/04/2012 10:59 AM, David Beer wrote:
> > Steven,
> >
> > I was supposed to add that note and I forgot - my mistake and thanks for
> > catching it. I have now added:
> >
> > *** For admins that use cpusets in any form ***
> > hwloc version 1.1 or greater is now required for building TORQUE with
> > cpusets, as pbs_mom now uses the
> > hwloc API to create the cpusets instead of creating them manually.
> >
> > to README.building_40.
> >
> > As far as checking for the existence of the library, this does happen at
> > configure time once the configure script determines that the user is
> > going to be using cpusets in any way, which a few different configure
> > options can trigger.
> >
> > David
> >
> > On Tue, Apr 3, 2012 at 8:15 PM, DuChene, StevenX A
> > <stevenx.a.duchene at intel.com> wrote:
> >
> >     I installed the hwloc-1.4.1 and hwloc-devel-1.4.1 rpms on the server
> >     where I am building torque-4.X, and in looking through the output
> >     from the configure script during the build I do not see anywhere
> >     that the existence of any hwloc component is checked. In fact, in
> >     grepping through the output from the whole torque rpm build process,
> >     I do not see ANY mention of hwloc at all.
> >
> >
> >     I see the compile-time flags HWLOC_CFLAGS and HWLOC_LIBS mentioned
> >     in the --help output from configure, but according to the
> >     description text these are just supposed to override the pkg-config
> >     results. However, I do not see any evidence that the pkg-config
> >     system is being queried at all for the existence of hwloc on the
> >     build server.
> >
> >     Is there some step I am missing?
> >
> >
> >     I thought someone mentioned that there would be better documentation
> >     of the hwloc business in the torque-4.0.1 release?
> >
> >     If so, where is it?
> >
> >     --
> >     Steven DuChene
> >
> >
> >     *From:* torqueusers-bounces at supercluster.org
> >     [mailto:torqueusers-bounces at supercluster.org] *On Behalf Of *David
> >     Beer
> >     *Sent:* Monday, March 19, 2012 8:54 AM
> >     *To:* Torque Users Mailing List
> >     *Subject:* Re: [torqueusers] TORQUE 4.0 Officially Announced
> >
> >     Steve,
> >
> >
> >     Hwloc is now required for running cpusets in TORQUE, and it helps
> >     a lot, both in immediate use and as groundwork for future
> >     features.
> >
> >
> >     Immediately, hwloc gives you a better cpuset because it gives you
> >     the next physically adjacent core instead of the next indexed core.
> >     For example, many eight-core systems have processors 0, 2, 4, and 6
> >     next to each other and processors 1, 3, 5, and 7 next to each other.
> >     If you're running a pre-4.0 TORQUE and you have two jobs on the
> >     node, each with 4 cores, job 1 will have cores 0-3 and job 2 will
> >     have cores 4-7. In TORQUE 4.0, job 1 will have 0, 2, 4, and 6, and
> >     job 2 will have 1, 3, 5, and 7. This should help speed up processing
> >     times for jobs (NOTE: only if you have this kind of system and a
> >     comparable job layout; I'm not promising a general speed-up to
> >     everyone using cpusets). This should also allow us to properly
> >     handle hyperthreading for anyone that has it turned on and wishes
> >     to use it.
> >
> >
> >     The last immediate feature is for SMT (simultaneous
> >     multi-threading) hardware. The mom config variable $use_smt was
> >     added. By default, the use of SMT is enabled, but you can tell your
> >     pbs_mom to ignore the SMT threads (not place them in the cpuset) by
> >     adding
> >
> >     $use_smt false
> >
> >     to your mom config file.
> >
> >
> >     For the future, hwloc makes it really easy for us to
> >     handle hardware-specific requests. One of the coming features for
> >     TORQUE is to allow requests roughly similar to:
> >
> >     socket=2:numa=2 --with-hyperthreads
> >
> >     which would say to spread the job over 2 sockets, and across the 2
> >     NUMA nodes on each socket. This is a feature we plan to add to
> >     improve support for Magny-Cours and Opteron-type processors that
> >     have multiple sockets and/or multiple NUMA nodes on the processor
> >     chip. Using hwloc means we don't have to parse system files
> >     and map the indices to the sockets and/or NUMA nodes ourselves; we
> >     can simply use hwloc helper functions
> >     like hwloc_get_next_obj_inside_cpuset_by_type(), which let you
> >     move on to the next physical or virtual core, or skip to
> >     the next socket or NUMA node, as the case may be.
> >
> >
> >     David
> >
> >     On Mon, Mar 19, 2012 at 8:47 AM, DuChene, StevenX A
> >     <stevenx.a.duchene at intel.com> wrote:
> >
> >     Also, a better (more complete) explanation of which features are
> >     enabled when hwloc is used would be helpful.
> >
> >     BTW, I built torque on my server without hwloc installed and then
> >     installed the resulting mom packages on my nodes. The mom daemons in
> >     that case did seem to start up just fine.
> >     --
> >     Steven DuChene
> >
> >
> >     -----Original Message-----
> >     From: torqueusers-bounces at supercluster.org
> >     [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Craig
> >     West
> >     Sent: Sunday, March 18, 2012 10:40 PM
> >     To: Torque Users mailing list; Torque Developers mailing list
> >     Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced
> >
> >
> >     Hi Steven,
> >
> >     I have just begun testing Torque 4.0, as hwloc has been a
> >     long-awaited feature for me.
> >
> >      > It is unclear from this announcement text where hwloc has to be
> >      > installed. Is it just on the server or on the nodes only?
> >
> >     It needs to be available on the BUILD server and on the nodes. I
> >     tried to run pbs_mom on a node without the hwloc libraries
> >     installed, and it failed.
> >
> >     Note: I am running hwloc 1.4 from a directory under /usr/local.
> >     This was not automatically found by the TORQUE configure script,
> >     but you can specify the location using HWLOC_CFLAGS & HWLOC_LIBS.
> >     The locations you specify are embedded in pbs_mom (and other
> >     binaries), but it seems you can set the LD_LIBRARY_PATH variable if
> >     hwloc is not in the same location on the BUILD server as on the
> >     compute nodes. For simplicity, installing it in the same location
> >     everywhere makes sense.
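Craig's workaround can be made concrete. A hypothetical sketch, assuming hwloc 1.4 was installed under /usr/local/hwloc-1.4 and that cpuset support is switched on with --enable-cpuset (adjust paths and options to your site; HWLOC_CFLAGS and HWLOC_LIBS are the variables configure --help describes):

```shell
# Point TORQUE's configure at a non-default hwloc instead of pkg-config:
./configure --enable-cpuset \
    HWLOC_CFLAGS="-I/usr/local/hwloc-1.4/include" \
    HWLOC_LIBS="-L/usr/local/hwloc-1.4/lib -lhwloc"

# If the library sits elsewhere on the compute nodes, let pbs_mom find
# it at run time (e.g. in the init script or environment):
export LD_LIBRARY_PATH=/usr/local/hwloc-1.4/lib:$LD_LIBRARY_PATH
```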
> >
> >      > More documentation about this would be greatly appreciated.
> >
> >     I agree, clearer and more detailed documentation would be useful.
> >
> >     Cheers,
> >     Craig.
> >     _______________________________________________
> >     torqueusers mailing list
> >     torqueusers at supercluster.org
> >     http://www.supercluster.org/mailman/listinfo/torqueusers
> >
> >     --
> >     David Beer | Software Engineer
> >     Adaptive Computing
> >
> >
> >
> > --
> > David Beer | Software Engineer
> > Adaptive Computing
> >
> >
> >
>
>



-- 
David Beer | Software Engineer
Adaptive Computing