[torqueusers] TORQUE 4.0 and hwloc

DuChene, StevenX A stevenx.a.duchene at intel.com
Wed Apr 4 11:02:19 MDT 2012


Hmmm, OK, so there are certain configure options that affect whether the configure script looks for hwloc.

Do those include all or only some of the following?

--enable-geometry-requests
--enable-cpuset
--enable-libcpuset
--enable-numa-support

I am trying to see whether this gets enabled correctly when I build the RPMs, but the torque.spec file is a little confusing on this point. I see the following in the spec file:

# bcond_without defaults to WITH, and vice versa.

But then I see a little further:

### Features disabled by default
%bcond_with    blcr
%bcond_with    cpuset

And on the line that actually calls configure from within the spec file I see:

%configure --includedir=%{_includedir}/%{name} --with-default-server=%{torque_server} \
    --with-server-home=%{torque_home} --with-sendmail=%{sendmail_path} \
    --disable-dependency-tracking %{ac_with_gui} %{ac_with_scp} %{ac_with_syslog} \
    --disable-gcc-warnings %{ac_with_munge} %{ac_with_pam} %{ac_with_drmaa} \
    --disable-qsub-keep-override %{ac_with_blcr} %{ac_with_cpuset} %{ac_with_spool} %{?acflags}

So is "%bcond_with    cpuset" supposed to turn cpuset support off or on? If it is supposed to turn it on, then, as I said before, it is not working.
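
From the stock RPM macro documentation, "%bcond_with cpuset" should mean cpuset support is OFF by default and only enabled when rpmbuild is invoked with "--with cpuset". A plain-shell sketch of that logic (illustrative only, not the actual spec macros; WITH_CPUSET=1 stands in for passing --with cpuset to rpmbuild):

```shell
# Toy model of RPM's bcond logic, not the actual torque.spec macros.
# %bcond_with cpuset => the feature is OFF unless rpmbuild is run
# with "--with cpuset"; WITH_CPUSET=1 stands in for that flag here.
cpuset_flag() {
    if [ "${WITH_CPUSET:-0}" = "1" ]; then
        echo "--enable-cpuset"     # presumably what %{ac_with_cpuset} expands to
    else
        echo "--disable-cpuset"    # the %bcond_with default
    fi
}
cpuset_flag                        # with nothing set, prints the disable flag
```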

Now I know I can just alter the spec file to hard-code "--enable-cpuset", "--enable-libcpuset", or possibly "--enable-geometry-requests", but I am trying to understand the logic that someone cleverly added to the torque.spec file as distributed with the torque-4.0 sources.
--
Steven DuChene

From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of David Beer
Sent: Wednesday, April 04, 2012 8:00 AM
To: Torque Users Mailing List
Cc: Torque Developers mailing list
Subject: Re: [torqueusers] TORQUE 4.0 and hwloc

Steven,

I was supposed to add that note and I forgot - my mistake and thanks for catching it. I have now added:

*** For admins that use cpusets in any form ***
hwloc version 1.1 or greater is now required for building TORQUE with cpusets, as pbs_mom now uses the
hwloc API to create the cpusets instead of creating them manually.

to README.building_40.

As far as checking for the existence of the library goes, this does happen at configure time, once the configure script determines that the user is going to be using cpusets in any way. A few different configure options can trigger that.
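
(Presumably the trigger options are the cpuset-related ones mentioned earlier in this thread; treat the list below as a sketch, not a definitive set, and verify against your configure --help output:)

```shell
# Sketch: any of these should make configure run its hwloc check
# (exact set may vary by TORQUE version):
./configure --enable-cpuset
./configure --enable-numa-support
./configure --enable-geometry-requests
```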

David
On Tue, Apr 3, 2012 at 8:15 PM, DuChene, StevenX A <stevenx.a.duchene at intel.com> wrote:
I installed the hwloc-1.4.1 and hwloc-devel-1.4.1 RPMs on the server where I am building torque-4.X, and in looking through the output from the configure script during the build, I do not see the existence of any hwloc component being checked anywhere. In fact, grepping through the output of the whole torque RPM build process, I do not see ANY mention of hwloc at all.

I see the compile-time flags HWLOC_CFLAGS and HWLOC_LIBS mentioned in the --help output from configure, but according to the description text these are just supposed to override the pkg-config results. However, I do not see any evidence that pkg-config is being queried at all for the existence of hwloc on the build server.
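
As a quick sanity check outside of configure, you can ask pkg-config directly the same question the probe would ask (assuming configure uses the standard PKG_CHECK_MODULES mechanism its --help text implies):

```shell
# Probe for hwloc the way a PKG_CHECK_MODULES(HWLOC, hwloc) check would,
# printing a result instead of failing if pkg-config or hwloc is absent.
if pkg-config --exists hwloc 2>/dev/null; then
    hwloc_status="hwloc $(pkg-config --modversion hwloc) found"
else
    hwloc_status="hwloc not found via pkg-config"
fi
echo "$hwloc_status"
```

If this reports "not found", configure's probe would fail the same way, and PKG_CONFIG_PATH (or the override variables above) is the knob to turn.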

Is there some step I am missing?

I thought someone mentioned that there would be better documentation of the hwloc business in the torque-4.0.1 release?

If so where is it?
--
Steven DuChene

From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of David Beer
Sent: Monday, March 19, 2012 8:54 AM
To: Torque Users Mailing List
Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced

Steve,

Hwloc is now required for running cpusets in TORQUE, and it helps out a lot both in immediate use and in groundwork for future features.

Immediately, hwloc gives you a better cpuset because it gives you the next physical core instead of the next indexed core. For example, many eight-core systems have cores 0, 2, 4, and 6 next to each other and cores 1, 3, 5, and 7 next to each other. If you are running a pre-4.0 TORQUE and you have two four-core jobs on the node, job 1 will get cores 0-3 and job 2 will get cores 4-7. In TORQUE 4.0, job 1 will get cores 0, 2, 4, and 6, and job 2 will get cores 1, 3, 5, and 7. This should help speed up processing times for jobs (NOTE: only if you have this kind of system and a comparable job layout; I am not promising a general speed-up to everyone using cpusets). It should also allow us to properly handle hyperthreading for anyone who has it turned on and wishes to use it.
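
The index arithmetic above, as a toy shell illustration (indices only; this is not TORQUE code):

```shell
# Two four-core jobs on an eight-core box where even and odd indices
# sit on different physical processors.
old_job1=$(seq 0 3 | tr '\n' ' ')     # pre-4.0: consecutive indices
new_job1=$(seq 0 2 7 | tr '\n' ' ')   # 4.0: every other index
echo "pre-4.0 job 1 cpuset: $old_job1"
echo "4.0     job 1 cpuset: $new_job1"
```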

The last immediate feature is for anyone with SMT (simultaneous multi-threading) hardware: the mom config variable $use_smt was added. By default the use of SMT is enabled, but you can tell your pbs_mom to ignore the SMT threads (not place them in the cpuset) by adding

$use_smt false

to your mom config file.

For the future, hwloc makes it really easy for us to handle hardware-specific requests. One of the coming features for TORQUE is to allow requests roughly similar to:

socket=2:numa=2 --with-hyperthreads

which would say to spread the job over 2 sockets, and across the 2 NUMA nodes on each socket. This is a feature we plan to add to improve support for Magny-Cours and Opteron-type processors that have multiple sockets and/or multiple NUMA nodes on the chip. Using hwloc means we don't have to parse system files and map the indices to the sockets and/or NUMA nodes ourselves; we can simply use hwloc functions like hwloc_get_next_obj_inside_cpuset_by_type(), which let us move on to the next physical or virtual core, or skip to the next socket or NUMA node, as the case may be.
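
(As an aside, the topology the API walks can also be inspected from the shell with hwloc's bundled tools; a guarded sketch, since the tools may not be installed on a given box:)

```shell
# Print the socket/NUMA/core hierarchy with hwloc's lstopo tool,
# degrading gracefully when it is not installed.
if command -v lstopo >/dev/null 2>&1; then
    topo=$(lstopo --no-io)          # hardware hierarchy, skipping I/O devices
else
    topo="lstopo not installed"
fi
echo "$topo"
```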

David
On Mon, Mar 19, 2012 at 8:47 AM, DuChene, StevenX A <stevenx.a.duchene at intel.com> wrote:
A better (more complete) explanation of which features are enabled when hwloc is used would be helpful as well.

BTW, I built torque on my server without hwloc installed and then installed the resulting mom packages on my nodes. The mom daemons in that case did seem to start up just fine.
--
Steven DuChene

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Craig West
Sent: Sunday, March 18, 2012 10:40 PM
To: Torque Users mailing list; Torque Developers mailing list
Subject: Re: [torqueusers] TORQUE 4.0 Officially Announced


Hi Steven,

I have just begun testing Torque 4.0, as hwloc has been a long-awaited
feature for me.

> It is unclear from this announcement text where hwloc has to be installed.
> Is it just on the server or on the nodes only?

It needs to be available on the BUILD server and on the nodes. I tried to
run pbs_mom on a node without hwloc installed, and it failed.

Note: I am running hwloc 1.4 from a directory in /usr/local.
It was not automatically found by the TORQUE configure script, but you
can specify the location using HWLOC_CFLAGS & HWLOC_LIBS.
The locations you specify get embedded in pbs_mom (and other
binaries), but it seems you can set the LD_LIBRARY_PATH variable if the
library is not in the same location on the BUILD server as on the
compute nodes. For simplicity, installing it in the same location
everywhere makes sense.
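
A sketch of the recipe above (the /usr/local/hwloc-1.4 paths are examples standing in for wherever you installed it; adjust to your layout):

```shell
# Build against an hwloc that pkg-config cannot see, by passing the
# override variables directly to configure:
./configure --enable-cpuset \
    HWLOC_CFLAGS="-I/usr/local/hwloc-1.4/include" \
    HWLOC_LIBS="-L/usr/local/hwloc-1.4/lib -lhwloc"

# On compute nodes where the library lives somewhere else, point the
# runtime loader at it before starting pbs_mom:
export LD_LIBRARY_PATH=/usr/local/hwloc-1.4/lib:$LD_LIBRARY_PATH
```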

> More documentation about this would be greatly appreciated.

I agree, clearer and more detailed documentation would be useful.

Cheers,
Craig.
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers



--
David Beer | Software Engineer
Adaptive Computing





--
David Beer | Software Engineer
Adaptive Computing


