[torqueusers] Solution: torque-1.2.0p6 build problem using gcc 3.4 (RHEL4 or FC4)

Steve Traylen s.traylen at rl.ac.uk
Tue Sep 20 02:44:45 MDT 2005


On Mon, Sep 19, 2005 at 10:55:34PM -0600 or thereabouts, Maestas, Christopher Daniel wrote:
> I'm curious if there is anyone out there maintaining a standard type rpm
> for torque.
> I haven't seen much in the way of 1.2.0pX ... I was wondering if we
> could get a contrib type spec file or better yet an actual working spec
> file to be able to run "rpmbuild -tb torque-1.2.0pX.tar.gz"  I thought
> I'd ask this, since this fix seems to refer to rpm building. :-)

There are some here:

http://quattor.web.lal.in2p3.fr/packages/mpi/

These are built for Scientific Linux 3.

Steve
> 
> 
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Ole Holm
> Nielsen
> Sent: Monday, September 19, 2005 9:09 AM
> To: torqueusers at supercluster.org
> Subject: [torqueusers] Solution: torque-1.2.0p6 build problem using gcc
> 3.4 (RHEL4 or FC4)
> 
> Dear Torque users,
> 
> We have previously discussed a problem starting LAM-MPI parallel jobs
> with torque-1.2.0p6 in this thread:
> http://www.supercluster.org/pipermail/torqueusers/2005-September/002079.html
> 
> If you use Torque on Redhat Enterprise Linux 4, Fedora Core 4 or any
> other system using gcc 3.4 (or later), you should know about a problem
> caused by a new feature in gcc 3.4, as well as the solution to this
> problem:
> 
> We found that the Torque build process has a problem with gcc 3.4.3,
> namely that a "make install" will cause a second, superfluous
> recompilation of everything.  If you're building an RPM, this causes
> subtle problems in the resulting RPMs because some hardcoded paths may
> be incorrect.  This was the problem that made LAM-MPI booting fail
> because pbs_mom could not find the pbs_demux executable (see the above
> thread).
> 
> The quick summary:
> ------------------
> 
> 1. With Torque up to and including 1.2.0p6, a workaround is to
>     configure Torque with an additional CFLAGS option
>     -fno-working-directory, if your system uses gcc 3.4 or newer.
> 2. Torque 1.2.0p7 (current snapshot and later) has a patch in
>     buildutils/makedepend-sh which is the permanent solution,
>     so the -fno-working-directory workaround is not needed here.
> 
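As a minimal sketch of workaround 1: the helpers below pick a CFLAGS value depending on the gcc version. The function names needs_workaround and torque_cflags are hypothetical (they are not part of Torque), and the base flags "-g -O2" are an assumption; substitute whatever flags you normally build with.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given gcc version is 3.4 or newer,
# i.e. a version where -fno-working-directory is needed as a workaround.
needs_workaround() {
    major=${1%%.*}          # "3.4.3" -> "3"
    rest=${1#*.}            # "3.4.3" -> "4.3"
    minor=${rest%%.*}       # "4.3"   -> "4"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 4 ]; }
}

# Hypothetical helper: print the CFLAGS to pass to configure.
# The base flags "-g -O2" are an assumption, not taken from Torque.
torque_cflags() {
    base="-g -O2"
    if needs_workaround "$1"; then
        echo "$base -fno-working-directory"
    else
        echo "$base"
    fi
}

# Possible use in the Torque source directory (not executed here):
#   CFLAGS="$(torque_cflags "$(gcc -dumpversion)")" ./configure
```

With Torque 1.2.0p7 or later the extra flag should not be needed, since the makedepend-sh patch removes the cause.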
> Additional details:
> -------------------
> 
> The gcc 3.4 man-page describes a new feature:
>        -fworking-directory
>             Enable generation of linemarkers in the preprocessor output that
>             will let the compiler know the current working directory at the
>             time of preprocessing.  When this option is enabled, the
>             preprocessor will emit, after the initial linemarker, a second
>             linemarker with the current working directory followed by two
>             slashes.
>             ...
> 
> When the -g flag is in CFLAGS (the default), this new feature is enabled
> implicitly.  It causes Torque's buildutils/makedepend-sh script to add,
> for every .o file in the Makefile, a dependency on the current working
> directory and hence on that directory's timestamp.  Look for the
> following pattern in the Makefile:
> 
> # DO NOT DELETE THIS LINE -- makedepend-sh depends on it
> accounting.o: ./accounting.c
> accounting.o: /scratch/Torque/torque-1.2.0p6/src/server//
> 
> The line terminated with "//" refers to the current working directory.
> This dependency causes all .o files to be rebuilt every time you do a
> "make" in any directory, including the case where you do a "make
> install".
> 
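To check whether an already generated Makefile is affected, something like the following could be used. It is only a sketch: list_cwd_dependencies is a hypothetical helper name, and the pattern assumes the dependency lines look exactly like the example above (an object target, a colon, and a path ending in "//").

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) every object dependency
# in the given Makefile whose prerequisite ends in "//".
# Exits non-zero if no such line is found.
list_cwd_dependencies() {
    grep -n '^[^ :]*\.o: .*//$' "$1"
}

# Possible use (not executed here):
#   list_cwd_dependencies src/server/Makefile
```

If this prints any lines, the .o files in that directory depend on the directory itself and will be rebuilt on every "make".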
> In the case of RPM building, this is a real problem because all files
> will be installed into a temporary location.  The pbs_mom will now have
> an incorrect hardcoded path to pbs_demux and pbs_rcp, for example,
> /var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_demux
> (check this by "strings /usr/sbin/pbs_mom | grep pbs_demux").
> 
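The same check can be scripted. The sketch below uses "grep -a" on the binary instead of "strings", so that it does not depend on binutils being installed; has_buildroot_path is a hypothetical helper name, and it assumes the temporary install directory has "buildroot" in its name, as in the example path above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given binary contains a path that
# passes through a directory whose name ends in "buildroot".
# LC_ALL=C and -a make grep treat the binary as plain bytes.
has_buildroot_path() {
    LC_ALL=C grep -a -q -e 'buildroot/' "$1"
}

# Possible use (not executed here):
#   if has_buildroot_path /usr/sbin/pbs_mom; then
#       echo "pbs_mom has a build-time path compiled in"
#   fi
```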
> In this scenario all parallel jobs using the "tm" boot interface will
> fail because pbs_mom cannot start the pbs_demux process.  A simple test
> is to run "pbsdsh hostname" within a multi-node PBS batch job.  If
> pbsdsh gives error messages, you may have the above problem, and other
> environments that use the "tm" interface, such as LAM-MPI, will fail
> as well.
> 
> If you want to patch your current Torque installation, here's the diff
> (now in the CVS for 1.2.0p7) as provided by Garrick:
> 
> --- buildutils/makedepend-sh_orig       2005-09-18 10:04:34.000000000 -0700
> +++ buildutils/makedepend-sh    2005-09-18 10:04:05.000000000 -0700
> @@ -575,6 +575,7 @@
> 
>                   eval $CPP $arg_cc $d/$s $errout | \
>                     sed -n -e "s;^\# [0-9][0-9 ]*\"\(.*\)\";$f: \1;p" | \
> +                  grep -v "$PWD//\$" | \
>                     grep -v "$s\$" | grep -v command | grep -v built-in | \
>                     sed -e 's;\([^ :]*: [^ ]*\).*;\1;' \
>                     >> $TMP
> 
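To see what the added "grep -v" line does in isolation, here is a small sketch that applies the same filter to dependency lines read from standard input. drop_cwd_dependency is a hypothetical name used only for illustration; the grep expression itself is the one from the patch.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, dropping any line that ends
# in the current working directory followed by "//".
# Inside double quotes, \$ passes a literal "$" to grep (end-of-line anchor).
drop_cwd_dependency() {
    grep -v "$PWD//\$"
}

# Possible use (not executed here):
#   printf '%s\n' "foo.o: /usr/include/stdio.h" "foo.o: $PWD//" | drop_cwd_dependency
```

Note that $PWD is used as a regular expression here, so characters such as "." in the directory name match any character; for this purpose that is harmless.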
> Many thanks go to Garrick Staples (USC) for much ping-pong debugging and
> for coming up with the patch as well as the -fno-working-directory
> workaround.
> 
> --
> Ole Holm Nielsen
> Department of Physics, Technical University of Denmark
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

-- 
Steve Traylen
s.traylen at rl.ac.uk
http://www.gridpp.ac.uk/

