[torqueusers] Solution: torque-1.2.0p6 build problem using gcc 3.4 (RHEL4 or FC4)

Maestas, Christopher Daniel cdmaest at sandia.gov
Mon Sep 19 22:55:34 MDT 2005


I'm curious whether anyone out there maintains a standard rpm for
torque.  I haven't seen much in the way of 1.2.0pX.  Could we get a
contrib-type spec file or, better yet, an actual working spec file, so
that we can run "rpmbuild -tb torque-1.2.0pX.tar.gz"?  I thought I'd
ask, since this fix seems to refer to rpm building. :-)


-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Ole Holm Nielsen
Sent: Monday, September 19, 2005 9:09 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] Solution: torque-1.2.0p6 build problem using gcc 3.4 (RHEL4 or FC4)

Dear Torque users,

We have previously discussed a problem starting LAM-MPI parallel jobs
with torque-1.2.0p6 in this thread:
http://www.supercluster.org/pipermail/torqueusers/2005-September/002079.html

If you use Torque on Redhat Enterprise Linux 4, Fedora Core 4 or any
other system using gcc 3.4 (or later), you should know about a problem
caused by a new feature in gcc 3.4, as well as the solution to this
problem:

We found that the Torque build process has a problem with gcc 3.4.3,
namely that a "make install" will cause a second, superfluous
recompilation of everything.  If you're building an RPM, this causes
subtle problems in the resulting RPMs because some hardcoded paths may
be incorrect.  This was the problem that made LAM-MPI booting fail
because pbs_mom could not find the pbs_demux executable (see the above
thread).

The quick summary:
------------------

1. With Torque up to and including 1.2.0p6, a workaround is to
    configure Torque with an additional CFLAGS option
    -fno-working-directory, if your system uses gcc 3.4 or newer.
2. Torque 1.2.0p7 (current snapshot and later) has a patch in
    buildutils/makedepend-sh which is the permanent solution,
    so the -fno-working-directory workaround is not needed here.
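
A minimal sketch of workaround 1, assuming a POSIX shell.  The function
names and the BASE_CFLAGS variable are hypothetical helpers invented for
this illustration, and "-g" is only a placeholder for whatever CFLAGS you
normally use:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given gcc version is 3.4 or newer.
gcc_needs_workaround() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 4 ]; }
}

# Hypothetical helper: print the CFLAGS to hand to configure.
# BASE_CFLAGS is an assumed variable; it falls back to "-g".
torque_cflags() {
    base=${BASE_CFLAGS:--g}
    if gcc_needs_workaround "$1"; then
        echo "$base -fno-working-directory"
    else
        echo "$base"
    fi
}

# Usage sketch (not executed here), run from the Torque source directory:
#   CFLAGS="$(torque_cflags "$(gcc -dumpversion)")" ./configure
```

With Torque 1.2.0p7 or later the extra flag should not be necessary,
so the helper is only of interest for 1.2.0p6 and older.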

Additional details:
-------------------

The gcc 3.4 man-page describes a new feature:
       -fworking-directory
            Enable generation of linemarkers in the preprocessor
            output that will let the compiler know the current
            working directory at the time of preprocessing.  When
            this option is enabled, the preprocessor will emit,
            after the initial linemarker, a second linemarker with
            the current working directory followed by two slashes.
            ...

This new default feature causes Torque's buildutils/makedepend-sh script
to make every .o file in the Makefile depend upon the timestamp of the
current working directory whenever the -g flag is in CFLAGS (the
default).  Look for the following pattern in the Makefile:

# DO NOT DELETE THIS LINE -- makedepend-sh depends on it
accounting.o: ./accounting.c
accounting.o: /scratch/Torque/torque-1.2.0p6/src/server//

The line terminated with "//" refers to the current working directory.
This dependency causes all .o files to be rebuilt every time you do a
"make" in any directory, including the case where you do a "make
install".
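
To check an already configured source tree for such lines, something
like the following sketch may help.  The function name is hypothetical,
and the pattern simply assumes that the offending dependency lines look
like the example above (an .o target whose prerequisite ends in "//"):

```shell
#!/bin/sh
# Hypothetical helper: list (with line numbers) the dependency lines in a
# Makefile whose prerequisite ends in "//", i.e. a working directory.
find_cwd_deps() {
    grep -n '^[^:#]*\.o:.*//$' "$1"
}

# Usage sketch (not executed here):
#   find_cwd_deps src/server/Makefile
# No output (and a non-zero exit status) means no such line was found.
```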

In the case of RPM building, this is a real problem because all files
will be installed into a temporary location.  The pbs_mom will now have
an incorrect hardcoded path to pbs_demux and pbs_rcp, for example,
/var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_demux
(check this by "strings /usr/sbin/pbs_mom | grep pbs_demux").
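
The same check can be wrapped up as a small filter.  This is only a
sketch: the function name is made up, and it assumes that a bad path can
be recognised by the substring "buildroot", as in the example path above.
Your RPM build root may be named differently:

```shell
#!/bin/sh
# Hypothetical helper: read paths on stdin, print any that mention a
# build root, and report the result on the last line of output.
check_embedded_paths() {
    if grep 'buildroot'; then
        echo "FAIL: build-root path found"
        return 1
    fi
    echo "OK: no build-root path found"
}

# Usage sketch (not executed here):
#   strings /usr/sbin/pbs_mom | grep pbs_demux | check_embedded_paths
```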

In this scenario all parallel jobs using the "tm" boot interface will
fail because pbs_mom cannot start the pbs_demux process.  A
simple test to perform is to run "pbsdsh hostname"
within a multi-node PBS batch job.  If pbsdsh gives error messages, you
may have the above problem, and other environments such as LAM-MPI using
the "tm" interface are going to fail as well.
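
A sketch of such a test job follows.  The resource request (nodes=2),
the job name and the tm_check function are assumptions for illustration
only; adapt them to your site.  The function runs only when PBS_JOBID is
set, i.e. inside a batch job:

```shell
#!/bin/sh
#PBS -l nodes=2
#PBS -N tm-check

# Hypothetical helper: run "pbsdsh hostname" and report whether it worked.
tm_check() {
    if out=$(pbsdsh hostname 2>&1); then
        echo "tm OK:"
        echo "$out"
    else
        echo "tm FAILED:"
        echo "$out"
        return 1
    fi
}

# Only run the check when started by the batch system.
if [ -n "${PBS_JOBID:-}" ]; then
    tm_check
fi
```

Submit the script with qsub and look at the job's output file.  On a
healthy installation you would expect one hostname per allocated
processor.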

If you want to patch your current Torque installation, here's the diff
(now in the CVS for 1.2.0p7) as provided by Garrick:

--- buildutils/makedepend-sh_orig       2005-09-18 10:04:34.000000000 -0700
+++ buildutils/makedepend-sh    2005-09-18 10:04:05.000000000 -0700
@@ -575,6 +575,7 @@

                  eval $CPP $arg_cc $d/$s $errout | \
                    sed -n -e "s;^\# [0-9][0-9 ]*\"\(.*\)\";$f: \1;p" | \
+                  grep -v "$PWD//\$" | \
                    grep -v "$s\$" | grep -v command | grep -v built-in | \
                    sed -e 's;\([^ :]*: [^ ]*\).*;\1;' \
                    >> $TMP
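
To see what the added grep does, here is a simplified stand-in for that
pipeline stage, fed with made-up preprocessor output.  It is not the
actual makedepend-sh code: the function names, the sample linemarkers and
the header name example.h are invented for illustration, and the working
directory is passed as an argument instead of being taken from $PWD:

```shell
#!/bin/sh
# Hypothetical stand-in for the patched pipeline stage.
# $1 = object file name, $2 = directory to treat as the working directory.
linemarkers_to_deps() {
    obj=$1
    cwd=$2
    sed -n -e "s;^# [0-9][0-9 ]*\"\(.*\)\";$obj: \1;p" |
        grep -v "$cwd//\$" |
        grep -v command | grep -v built-in |
        sed -e 's;\([^ :]*: [^ ]*\).*;\1;'
}

# Made-up preprocessor output in the style of gcc 3.4 with -g.
sample_cpp_output() {
    cat <<'EOF'
# 1 "./accounting.c"
# 1 "/scratch/Torque/torque-1.2.0p6/src/server//"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "../include/example.h" 1
int x;
EOF
}

sample_cpp_output |
    linemarkers_to_deps accounting.o /scratch/Torque/torque-1.2.0p6/src/server
```

The line ending in "//" never reaches the generated dependencies, so the
.o files no longer depend on the directory timestamp.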

Many thanks go to Garrick Staples (USC) for much ping-pong debugging and
for coming up with the patch as well as the -fno-working-directory
workaround.

--
Ole Holm Nielsen
Department of Physics, Technical University of Denmark



