[torqueusers] TORQUE 4.0 and hwloc
Gus Correa
gus at ldeo.columbia.edu
Wed Apr 4 09:50:43 MDT 2012
Hi David
Not to hijack Steven's thread ...
... but just taking a quick ride on it ... :)
Does the hwloc 1.1 requirement apply only to Torque 4.0?
How about the older Torque series [2.X.Y, 3.X.Y]
that use cpuset?
[I am in the process of building 2.4.16 with cpuset.]
Thank you,
Gus Correa
On 04/04/2012 10:59 AM, David Beer wrote:
> Steven,
>
> I was supposed to add that note and I forgot - my mistake and thanks for
> catching it. I have now added:
>
> *** For admins that use cpusets in any form ***
> hwloc version 1.1 or greater is now required for building TORQUE with
> cpusets, as pbs_mom now uses the hwloc API to create the cpusets
> instead of creating them manually.
>
> to README.building_40.
>
> As far as checking for the existence of the library, this does happen at
> configure time once the configure script determines that the user is
> going to be using cpusets in any way, which a few different configure
> options can trigger.
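
To confirm by hand that a build host satisfies the 1.1 minimum mentioned above, a minimal Python sketch follows. It assumes that pkg-config is installed and that hwloc registers itself under the module name "hwloc"; the helper names are hypothetical and are not part of TORQUE or its configure script. Pre-release suffixes in version strings are not handled.

```python
import re
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '1.4.1' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def version_at_least(found, required="1.1"):
    """Return True if 'found' is greater than or equal to 'required'."""
    found_v, required_v = parse_version(found), parse_version(required)
    width = max(len(found_v), len(required_v))
    # Pad the shorter tuple with zeros so that '1.1' compares like '1.1.0'.
    found_v += (0,) * (width - len(found_v))
    required_v += (0,) * (width - len(required_v))
    return found_v >= required_v


def installed_hwloc_version():
    """Ask pkg-config for the hwloc version; return None if it is not found.

    Assumption: the library is registered with pkg-config as 'hwloc'.
    """
    try:
        result = subprocess.run(
            ["pkg-config", "--modversion", "hwloc"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # pkg-config itself is not installed
        return None
    return result.stdout.strip() if result.returncode == 0 else None
```

Calling installed_hwloc_version() and passing a non-None result to version_at_least() gives a quick yes/no answer before running configure.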
>
> David
>
> On Tue, Apr 3, 2012 at 8:15 PM, DuChene, StevenX A
> <stevenx.a.duchene at intel.com> wrote:
>
> I installed hwloc-1.4.1 and hwloc-devel-1.4.1 rpms on the server
> where I am building torque-4.X and in looking through the output
> from the configure script during the build I do not see anywhere
> that the existence of any hwloc stuff is checked. In fact in
> grepping through the output from the whole torque rpm build process
> I do not see ANY mention of hwloc at all.
>
> I see compile time flags of HWLOC_CFLAGS and HWLOC_LIBS mentioned in
> the --help output from configure, but according to the description
> text these are just supposed to override the pkg-config results.
> However, I do not see any evidence that the pkg-config system is
> being quizzed at all for the existence of hwloc on the build server.
>
> Is there some step I am missing?
>
> I thought someone mentioned that there would be better documentation
> of the hwloc business in the torque-4.0.1 release?
>
> If so, where is it?
> --
> Steven DuChene
>
> From: torqueusers-bounces at supercluster.org On Behalf Of David Beer
> Sent: Monday, March 19, 2012 8:54 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced
>
> Steve,
>
> Hwloc is now required for running cpusets in TORQUE, and it helps
> out a lot both in immediate use and in groundwork for future
> features.
>
> Immediately, hwloc gives you a better cpuset because it gives you the
> next physical core instead of the next indexed core. For example: many
> eight core systems have processors 0, 2, 4, and 6 next to each other
> and processors 1, 3, 5, and 7 next to each other. If you're running a
> pre-4.0 TORQUE, and you have two jobs on the node, each with 4
> cores, job 1 will have 0-3 and job 2 will have 4-7. In TORQUE 4.0,
> job 1 will have 0, 2, 4, and 6, and job 2 will have 1, 3, 5, and 7.
> This should help speed up processing times for jobs (NOTE: only if
> you have this kind of system and a comparable job layout; I'm not
> promising a general speed-up to everyone using cpusets). This should
> also allow us to properly handle hyperthreading for anyone that has
> it turned on and wishes to use it.
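
The eight-core example above can be modelled in a few lines of Python. This is only an illustration of the two orderings, not pbs_mom's actual allocation code; the NEIGHBOR_GROUPS layout and the function names are hypothetical.

```python
# Hypothetical eight-core node from the example above: even-numbered
# processors sit next to each other, and so do odd-numbered ones.
NEIGHBOR_GROUPS = [[0, 2, 4, 6], [1, 3, 5, 7]]


def _slice(ordered, cores_per_job, jobs):
    """Cut an ordered processor list into one chunk per job."""
    return [ordered[i * cores_per_job:(i + 1) * cores_per_job]
            for i in range(jobs)]


def allocate_by_index(cores_per_job, jobs):
    """Pre-4.0 style: hand out processors in plain index order."""
    ordered = sorted(cpu for group in NEIGHBOR_GROUPS for cpu in group)
    return _slice(ordered, cores_per_job, jobs)


def allocate_by_topology(cores_per_job, jobs):
    """4.0 style: hand out processors in physical-neighbor order."""
    ordered = [cpu for group in NEIGHBOR_GROUPS for cpu in group]
    return _slice(ordered, cores_per_job, jobs)
```

With two jobs of four cores each, allocate_by_index(4, 2) yields the 0-3 and 4-7 split, while allocate_by_topology(4, 2) keeps each job on processors that sit next to each other.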
>
> The last immediate feature applies if you have SMT (simultaneous
> multi-threading) hardware: the mom config variable $use_smt was
> added. By default, the use of SMT is enabled, but you can tell your
> pbs_mom to ignore the SMT hardware threads (not place them in the
> cpuset) by adding
>
> $use_smt false
>
> to your mom config file.
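
As a rough illustration of what leaving the SMT threads out of a cpuset means, here is a small Python model. The node layout, the function name, and the choice to keep only the first hardware thread of each core are all assumptions made for the sketch; it does not describe how pbs_mom implements $use_smt.

```python
# Hypothetical node: two physical cores, each exposing two hardware threads.
CORE_THREADS = {0: [0, 4], 1: [1, 5]}


def cpuset_for(cores, use_smt=True):
    """Return the sorted processor ids placed in the cpuset for 'cores'."""
    cpus = []
    for core in cores:
        threads = CORE_THREADS[core]
        # Assumption: with SMT ignored, only the first thread of each
        # core is kept and the sibling threads are left out.
        cpus.extend(threads if use_smt else threads[:1])
    return sorted(cpus)
```

In this model, a job given both cores gets processors 0, 1, 4, and 5 by default, and only 0 and 1 when use_smt is False.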
>
> For the future, the hwloc threads make it really easy for us to
> handle hardware specific requests. One of the coming features for
> TORQUE is to allow requests roughly similar to:
>
> socket=2:numa=2 --with-hyperthreads
>
> which would say to spread the job over 2 sockets, and across the 2
> numa nodes on each socket. This is a feature we plan to add to
> improve support for Magny-Cours and Opteron type processors that
> have multiple sockets and/or multiple numa nodes on the processor
> chip. Using hwloc makes it so we don't have to parse system files
> and map the indices to the sockets and/or numa nodes ourselves; we
> can simply use easy hwloc functions
> like hwloc_get_next_obj_inside_cpuset_by_type() that allow you to
> just move on to the next physical core or virtual core, or skip to
> the next socket or numa node as the case may be.
>
> David
>
> On Mon, Mar 19, 2012 at 8:47 AM, DuChene, StevenX A
> <stevenx.a.duchene at intel.com> wrote:
>
> Also a better (more complete) explanation of what features are
> enabled when hwloc is used would be helpful as well.
>
> BTW, I built torque on my server without hwloc installed and then
> installed the resulting mom packages on my nodes. The mom daemons in
> that case did seem to start up just fine.
> --
> Steven DuChene
>
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org On Behalf Of Craig West
> Sent: Sunday, March 18, 2012 10:40 PM
> To: Torque Users mailing list; Torque Developers mailing list
> Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced
>
> Hi Steven,
>
> I have just begun testing Torque 4.0, as hwloc has been a long awaited
> feature for me.
>
> > It is unclear from this announcement text where hwloc has to be
> > installed.
> > Is it just on the server or on the nodes only?
>
> It needs to be available on the BUILD server and the nodes. I tried to
> run pbs_mom on a node without the hwloc I had installed and it failed.
>
> Note: I am running hwloc 1.4 from a directory in /usr/local
> This was not automatically found by the TORQUE configure script, but you
> can specify the location using HWLOC_CFLAGS & HWLOC_LIBS.
> It embeds the locations that you specify in the pbs_mom (and other
> files) but it seems you can set the LD_LIBRARY_PATH variable if it is
> not in the same location on the BUILD server as the compute nodes.
> For simplicity installing them in the same location makes sense.
>
> > More documentation about this would be greatly appreciated.
>
> I agree, clearer and more detailed documentation would be useful.
>
> Cheers,
> Craig.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
> --
> David Beer | Software Engineer
> Adaptive Computing
>
> --
> David Beer | Software Engineer
> Adaptive Computing