[torqueusers] TORQUE 4.0 and hwloc

David Beer dbeer at adaptivecomputing.com
Wed Apr 4 11:11:20 MDT 2012


On Wed, Apr 4, 2012 at 11:02 AM, DuChene, StevenX A <
stevenx.a.duchene at intel.com> wrote:

>  Hmmm, ok so there are certain configure options that have an effect on
> whether the configure script looks for hwloc.
>
> Do those include all or only some of the following?
>
> --enable-geometry-requests
> --enable-cpuset
> --enable-libcpuset
> --enable-numa-support

It includes all of those.


>
> I am trying to see if this gets enabled correctly when I build the rpms,
> but the torque.spec file is a little confusing to read through. I see the
> following in the spec file:
>
> # bcond_without defaults to WITH, and vice versa.
>
> But then a little further on I see:
>
> ### Features disabled by default
> %bcond_with    blcr
> %bcond_with    cpuset
>
> And on the line that actually calls configure from within the spec file
> I see:
>
> %configure --includedir=%{_includedir}/%{name} \
>     --with-default-server=%{torque_server} \
>     --with-server-home=%{torque_home} --with-sendmail=%{sendmail_path} \
>     --disable-dependency-tracking %{ac_with_gui} %{ac_with_scp} %{ac_with_syslog} \
>     --disable-gcc-warnings %{ac_with_munge} %{ac_with_pam} %{ac_with_drmaa} \
>     --disable-qsub-keep-override %{ac_with_blcr} %{ac_with_cpuset} %{ac_with_spool} %{?acflags}
>
> So is "%bcond_with    cpuset" supposed to turn it off or on? If it is
> supposed to turn it on, then as I said before, it is not working.
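
To answer the logic question: %bcond_with means "off by default." "%bcond_with cpuset" defines a build conditional that stays disabled unless you pass --with cpuset to rpmbuild, while %bcond_without does the opposite (enabled unless you pass --without). So with the spec as shipped, %{ac_with_cpuset} should expand to --disable-cpuset, and the intended way to turn cpusets on is at rpmbuild time rather than by editing the spec. A minimal sketch of the usual pattern (the exact macro plumbing in torque.spec may differ):

    # disabled by default; enable with: rpmbuild --with cpuset ...
    %bcond_with cpuset
    %if %{with cpuset}
    %define ac_with_cpuset --enable-cpuset
    %else
    %define ac_with_cpuset --disable-cpuset
    %endif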
>
> Now I know I can just alter the spec file to hard-code it on with
> "--enable-cpuset" or "--enable-libcpuset" or possibly
> "--enable-geometry-requests", but I am trying to understand the logic of
> what someone cleverly added to the torque spec file as distributed with
> the torque-4.0 sources.
>
> --
> Steven DuChene
>
>
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of David Beer
> Sent: Wednesday, April 04, 2012 8:00 AM
> To: Torque Users Mailing List
> Cc: Torque Developers mailing list
> Subject: Re: [torqueusers] TORQUE 4.0 and hwloc
>
>
> Steven,
>
> I was supposed to add that note and I forgot - my mistake and thanks for
> catching it. I have now added:
>
> *** For admins that use cpusets in any form ***
> hwloc version 1.1 or greater is now required for building TORQUE with
> cpusets, as pbs_mom now uses the hwloc API to create the cpusets instead
> of creating them manually.
>
> to README.building_40.
>
> As far as checking for the existence of the library: this does happen at
> configure time, once the configure script determines that the user is
> going to be using cpusets in any way, which any of several configure
> options can trigger.
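>
> For example (illustrative invocations, not an exhaustive list), any of
> these will trigger the hwloc check at configure time:
>
>     ./configure --enable-cpuset
>     ./configure --enable-numa-support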
>
> David
>
> On Tue, Apr 3, 2012 at 8:15 PM, DuChene, StevenX A <
> stevenx.a.duchene at intel.com> wrote:
>
> I installed the hwloc-1.4.1 and hwloc-devel-1.4.1 rpms on the server
> where I am building torque-4.X, and in looking through the output from
> the configure script during the build I do not see anywhere that the
> existence of any hwloc stuff is checked. In fact, in grepping through the
> output from the whole torque rpm build process I do not see ANY mention
> of hwloc at all.
>
>
> I see the compile-time flags HWLOC_CFLAGS and HWLOC_LIBS mentioned in the
> --help output from configure, but according to the description text these
> are just supposed to override the pkg-config results. However, I do not
> see any evidence that the pkg-config system is being queried at all for
> the existence of hwloc on the build server.
>
> Is there some step I am missing?
>
>
> I thought someone mentioned that there would be better documentation of
> the hwloc business in the torque-4.0.1 release?
>
> If so, where is it?
>
> --
> Steven DuChene
>
>
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of David Beer
> Sent: Monday, March 19, 2012 8:54 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced
>
>
> Steve,
>
> Hwloc is now required for running cpusets in TORQUE, and it helps out a
> lot, both in immediate use and as groundwork for future features.
>
>
> Immediately, hwloc gives you a better cpuset because it gives you the
> next physical core instead of the next indexed core. For example, many
> eight-core systems have processors 0, 2, 4, and 6 next to each other and
> processors 1, 3, 5, and 7 next to each other. If you're running a pre-4.0
> TORQUE and you have two jobs on the node, each with 4 cores, job 1 will
> have 0-3 and job 2 will have 4-7. In TORQUE 4.0, job 1 will have 0, 2, 4,
> and 6, and job 2 will have 1, 3, 5, and 7. This should help speed up
> processing times for jobs (NOTE: only if you have this kind of system and
> a comparable job layout; I'm not promising a general speed-up to everyone
> using cpusets). This should also allow us to properly handle
> hyperthreading for anyone who has it turned on and wishes to use it.
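>
> Schematically, for those two four-core jobs on such an eight-core node:
>
>     pre-4.0:    job 1 -> cpus 0,1,2,3    job 2 -> cpus 4,5,6,7
>     TORQUE 4.0: job 1 -> cpus 0,2,4,6    job 2 -> cpus 1,3,5,7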
>
> The last immediate feature applies if you have SMT (simultaneous
> multi-threading) hardware. The mom config variable $use_smt was added. By
> default the use of SMT is enabled, but you can tell your pbs_mom to
> ignore SMT threads (that is, not place them in the cpuset) by adding
>
>     $use_smt false
>
> to your mom config file.
>
> For the future, hwloc's topology tree makes it really easy for us to
> handle hardware-specific requests. One of the coming features for TORQUE
> is to allow requests roughly similar to:
>
>     socket=2:numa=2 --with-hyperthreads
>
> which would say to spread the job over 2 sockets, and across the 2 NUMA
> nodes on each socket. This is a feature we plan to add to improve support
> for Magny-Cours and other Opteron-type processors that have multiple
> sockets and/or multiple NUMA nodes on the processor chip. Using hwloc
> means we don't have to parse system files and map the indices to the
> sockets and/or NUMA nodes ourselves; we can simply use hwloc functions
> like hwloc_get_next_obj_inside_cpuset_by_type(), which lets you move on
> to the next physical or virtual core, or skip to the next socket or NUMA
> node, as the case may be.
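>
> As an illustration only (a standalone sketch, not TORQUE's actual code),
> that style of traversal looks roughly like this:
>
>     #include <hwloc.h>
>     #include <stdio.h>
>
>     int main(void)
>     {
>         hwloc_topology_t topo;
>         hwloc_topology_init(&topo);
>         hwloc_topology_load(topo);
>
>         /* Walk every core inside the machine's cpuset; pass NULL to
>          * start, then pass the previous object back in to advance. */
>         hwloc_obj_t root = hwloc_get_root_obj(topo);
>         hwloc_obj_t core = NULL;
>         while ((core = hwloc_get_next_obj_inside_cpuset_by_type(
>                     topo, root->cpuset, HWLOC_OBJ_CORE, core)) != NULL)
>             printf("core L#%u (OS index %u)\n",
>                    core->logical_index, core->os_index);
>
>         hwloc_topology_destroy(topo);
>         return 0;
>     }
>
> (Build with something like: gcc walk.c $(pkg-config --cflags --libs hwloc).)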
>
>
> David
>
> On Mon, Mar 19, 2012 at 8:47 AM, DuChene, StevenX A <
> stevenx.a.duchene at intel.com> wrote:
>
> Also, a better (more complete) explanation of what features are enabled
> when hwloc is used would be helpful.
>
> BTW, I built torque on my server without hwloc installed and then
> installed the resulting mom packages on my nodes. The mom daemons in that
> case did seem to start up just fine.
> --
> Steven DuChene
>
>
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:
> torqueusers-bounces at supercluster.org] On Behalf Of Craig West
> Sent: Sunday, March 18, 2012 10:40 PM
> To: Torque Users mailing list; Torque Developers mailing list
>
> Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced
>
>
> Hi Steven,
>
> I have just begun testing Torque 4.0, as hwloc has been a long-awaited
> feature for me.
>
> > It is unclear from this announcement text where hwloc has to be
> > installed.
> > Is it just on the server or on the nodes only?
>
> It needs to be available on the BUILD server and on the nodes. I tried
> to run pbs_mom on a node that did not have hwloc installed and it failed.
>
> Note: I am running hwloc 1.4 from a directory under /usr/local.
> This was not automatically found by the TORQUE configure script, but you
> can specify the location using HWLOC_CFLAGS & HWLOC_LIBS.
> That embeds the locations you specify in pbs_mom (and other binaries),
> but it seems you can set the LD_LIBRARY_PATH variable if the library is
> not in the same location on the BUILD server as on the compute nodes.
> For simplicity, installing it in the same location makes sense.
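>
> For example (paths illustrative, not necessarily Craig's layout):
>
>     ./configure --enable-cpuset \
>         HWLOC_CFLAGS="-I/usr/local/hwloc-1.4/include" \
>         HWLOC_LIBS="-L/usr/local/hwloc-1.4/lib -lhwloc"
>     export LD_LIBRARY_PATH=/usr/local/hwloc-1.4/lib:$LD_LIBRARY_PATH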
>
> > More documentation about this would be greatly appreciated.
>
> I agree, clearer and more detailed documentation would be useful.
>
> Cheers,
> Craig.
>
> --
> David Beer | Software Engineer
> Adaptive Computing
>
>
>
> --
> David Beer | Software Engineer
> Adaptive Computing
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>


-- 
David Beer | Software Engineer
Adaptive Computing