[torqueusers] Solution: torque-1.2.0p6 build problem using gcc 3.4 (RHEL4 or FC4)
Steve Traylen
s.traylen at rl.ac.uk
Tue Sep 20 02:44:45 MDT 2005
On Mon, Sep 19, 2005 at 10:55:34PM -0600 or thereabouts, Maestas, Christopher Daniel wrote:
> I'm curious whether anyone out there maintains a standard RPM for
> Torque.
> I haven't seen much in the way of 1.2.0pX. Could we get a contrib-type
> spec file or, better yet, an actual working spec file, so that
> "rpmbuild -tb torque-1.2.0pX.tar.gz" works? I thought I'd ask, since
> this fix seems to refer to RPM building. :-)
There are some here:
http://quattor.web.lal.in2p3.fr/packages/mpi/
These are built for Scientific Linux 3.
Steve
>
>
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Ole Holm
> Nielsen
> Sent: Monday, September 19, 2005 9:09 AM
> To: torqueusers at supercluster.org
> Subject: [torqueusers] Solution: torque-1.2.0p6 build problem using gcc
> 3.4 (RHEL4 or FC4)
>
> Dear Torque users,
>
> We have previously discussed a problem starting LAM-MPI parallel jobs
> with torque-1.2.0p6 in this thread:
> http://www.supercluster.org/pipermail/torqueusers/2005-September/002079.html
>
> If you use Torque on Redhat Enterprise Linux 4, Fedora Core 4 or any
> other system using gcc 3.4 (or later), you should know about a problem
> caused by a new feature in gcc 3.4, as well as the solution to this
> problem:
>
> We found that the Torque build process has a problem with gcc 3.4.3,
> namely that a "make install" will cause a second, superfluous
> recompilation of everything. If you're building an RPM, this causes
> subtle problems in the resulting RPMs because some hardcoded paths may
> be incorrect. This was the problem that made LAM-MPI booting fail
> because pbs_mom could not find the pbs_demux executable (see the above
> thread).
>
> The quick summary:
> ------------------
>
> 1. With Torque up to and including 1.2.0p6, a workaround is to
> configure Torque with an additional CFLAGS option
> -fno-working-directory, if your system uses gcc 3.4 or newer.
> 2. Torque 1.2.0p7 (current snapshot and later) has a patch in
> buildutils/makedepend-sh which is the permanent solution,
> so the -fno-working-directory workaround is not needed here.
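
As a sketch of item 1 (an illustration, not part of the original post): the check below decides from a gcc version string whether the workaround flag should be added to CFLAGS. The function names and the base flags "-g -O2" are assumptions; only -fno-working-directory and the 3.4 threshold come from the text above.

```shell
#!/bin/sh
# Return 0 (true) if the given gcc version is 3.4 or newer.
# Hypothetical helper; the name is illustrative only.
needs_fno_working_directory() {
    major=${1%%.*}          # "3.4.3" -> "3"
    rest=${1#*.}            # "3.4.3" -> "4.3"
    minor=${rest%%.*}       # "4.3"   -> "4"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 4 ]; }
}

# Build the CFLAGS value for a given gcc version.
torque_cflags() {
    flags="-g -O2"          # assumed base flags, not taken from the post
    if needs_fno_working_directory "$1"; then
        flags="$flags -fno-working-directory"
    fi
    printf '%s\n' "$flags"
}

# Example: print the configure command line for gcc 3.4.3.
echo "CFLAGS=\"$(torque_cflags 3.4.3)\" ./configure"
```

On a real system the version string could come from "gcc -dumpversion", and the printed command would be run from the top of the Torque source tree.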
>
> Additional details:
> -------------------
>
> The gcc 3.4 man-page describes a new feature:
> -fworking-directory
>     Enable generation of linemarkers in the preprocessor output that
>     will let the compiler know the current working directory at the
>     time of preprocessing. When this option is enabled, the
>     preprocessor will emit, after the initial linemarker, a second
>     linemarker with the current working directory followed by two
>     slashes.
>     ...
>
> This new default feature causes Torque's buildutils/makedepend-sh script
> to make every .o file in the Makefile depend on the timestamp of the
> current working directory whenever CFLAGS contains the -g flag (the
> default). Look for the following pattern in the Makefile:
>
> # DO NOT DELETE THIS LINE -- makedepend-sh depends on it
> accounting.o: ./accounting.c
> accounting.o: /scratch/Torque/torque-1.2.0p6/src/server//
>
> The line terminated with "//" refers to the current working directory.
> This dependency causes all .o files to be rebuilt every time you do a
> "make" in any directory, including the case where you do a "make
> install".
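
A quick way to look for this pattern, sketched here as an illustration rather than taken from the original post: count the .o dependency lines that end in "//". The function name count_cwd_deps is hypothetical, and the sample Makefile content is copied from the excerpt above.

```shell
#!/bin/sh
# Count dependency lines of the form "target.o: /some/dir//" in a Makefile.
# Hypothetical helper; prints 0 if the Makefile is unaffected.
count_cwd_deps() {
    grep -c '\.o: .*//$' "$1"
}

# Demonstration on a temporary file holding the excerpt quoted above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# DO NOT DELETE THIS LINE -- makedepend-sh depends on it
accounting.o: ./accounting.c
accounting.o: /scratch/Torque/torque-1.2.0p6/src/server//
EOF
count_cwd_deps "$tmp"
rm -f "$tmp"
```

A non-zero count in a generated Makefile suggests the build will recompile everything on "make install".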
>
> In the case of RPM building, this is a real problem because all files
> will be installed into a temporary location. The pbs_mom will now have
> an incorrect hardcoded path to pbs_demux and pbs_rcp, for example,
> /var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_demux
> (check this by "strings /usr/sbin/pbs_mom | grep pbs_demux").
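
The "strings | grep" check can be wrapped in a small helper, sketched below as an illustration. The function names are hypothetical, the "-buildroot/" pattern is taken from the example path above, and the tr fallback is an assumption for systems without the binutils "strings" tool.

```shell
#!/bin/sh
# Split a binary into printable strings, one per line.
# Prefers "strings" (binutils); falls back to tr if it is unavailable.
printable_strings() {
    if command -v strings >/dev/null 2>&1; then
        strings "$1"
    else
        LC_ALL=C tr -c '[:print:]' '\n' < "$1"
    fi
}

# List embedded paths that point into an RPM build root.
# Exit status is 0 if such a path is found, 1 if the binary looks clean.
find_buildroot_paths() {
    printable_strings "$1" | grep -- '-buildroot/'
}

# Usage (path from the post): find_buildroot_paths /usr/sbin/pbs_mom
```

Any output from the helper means the installed binary still refers to the temporary install location.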
>
> In this scenario all parallel jobs using the "tm" boot interface will
> fail because pbs_mom cannot start the pbs_demux process. A simple test
> is to run "pbsdsh hostname" within a multi-node PBS batch job. If pbsdsh
> gives error messages, you may have the above problem, and other
> environments that use the "tm" interface, such as LAM-MPI, are going to
> fail as well.
>
> If you want to patch your current Torque installation, here's the diff
> (now in the CVS for 1.2.0p7) as provided by Garrick:
>
> --- buildutils/makedepend-sh_orig 2005-09-18 10:04:34.000000000 -0700
> +++ buildutils/makedepend-sh 2005-09-18 10:04:05.000000000 -0700
> @@ -575,6 +575,7 @@
>
>    eval $CPP $arg_cc $d/$s $errout | \
>    sed -n -e "s;^\# [0-9][0-9 ]*\"\(.*\)\";$f: \1;p" | \
> +  grep -v "$PWD//\$" | \
>    grep -v "$s\$" | grep -v command | grep -v built-in | \
>    sed -e 's;\([^ :]*: [^ ]*\).*;\1;' \
>    >> $TMP
>
> Many thanks go to Garrick Staples (USC) for much ping-pong debugging and
> for coming up with the patch as well as the -fno-working-directory
> workaround.
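
To see what the added "grep -v" line in the patch does, here is a small stand-alone sketch (an illustration, not part of the original post). It feeds two dependency lines modeled on the Makefile excerpt through the same filter; the function name filter_cwd_deps is hypothetical. Note that $PWD is used as a regular expression here, as in the patch, so a directory name containing regex metacharacters could match more loosely than intended.

```shell
#!/bin/sh
# Drop dependency lines that end in "<current directory>//", mirroring the
# grep -v line added by the patch. The function name is illustrative.
filter_cwd_deps() {
    grep -v "$PWD//\$"
}

# Sample input modeled on the Makefile excerpt, using the real $PWD.
# Only the dependency on accounting.c should survive the filter.
printf '%s\n' \
    "accounting.o: ./accounting.c" \
    "accounting.o: $PWD//" |
    filter_cwd_deps
```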
>
> --
> Ole Holm Nielsen
> Department of Physics, Technical University of Denmark
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
--
Steve Traylen
s.traylen at rl.ac.uk
http://www.gridpp.ac.uk/