[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Sep 16 03:10:08 MDT 2005


garrick wrote:
>>Speaking of a pbs_demux process, when would that be started ?
>>It's not running on the nodes after I start an interactive PBS job.
> 
> It is supposed to be started at the launch of all multi-node jobs.

OK, something to check for.  Now, I have a funny observation about the
pbs_mom which I've built as an RPM using a torque.spec file adapted
from your version.  On a compute node I look inside pbs_mom:

# strings /usr/sbin/pbs_mom | grep /usr/sbin
/var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_rcp
/var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_demux

Isn't that weird!  The compiled-in path to pbs_demux points into the
build root, which existed only during the RPM build process!
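
The same check can be wrapped in a small helper and pointed at other
binaries.  This is only a sketch: the function name find_embedded_paths
is hypothetical, and it assumes a grep that supports -a and -o (GNU grep
does), used here in place of strings.

```shell
#!/bin/sh
# Hypothetical helper: print every embedded path in a binary that starts
# with a given prefix (for example the RPM build root).
# Usage: find_embedded_paths FILE PREFIX
find_embedded_paths() {
    file=$1
    prefix=$2
    # -a treats the binary as text; -o prints only the matching part.
    # The prefix is used as a regular expression, so a "." in it matches
    # any character, which is harmless for this purpose.
    LC_ALL=C grep -a -o "${prefix}[[:graph:]]*" "$file" | sort -u
}

# Example with the paths from this message:
# find_embedded_paths /usr/sbin/pbs_mom /var/tmp/torque-1.2.0p6-buildroot
```

No output means no path with that prefix was found in the file.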

I've softlinked /var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_demux
to point to /usr/sbin/pbs_demux.  Then I retried the pbsdsh
test command under a PBS job, and now it works correctly:

# pbsdsh hostname
n469.dcsc.fysik.dtu.dk
n478.dcsc.fysik.dtu.dk
n477.dcsc.fysik.dtu.dk
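
For anyone wanting to repeat the softlink workaround on several nodes,
it could be scripted roughly as follows.  This is a sketch, not a
recommended fix: the function name link_stale_path and its optional
ROOT argument are hypothetical, and ROOT exists only so the function
can be tried in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: recreate the stale build-root path as a symlink
# to the real binary.  ROOT is empty by default, so paths are used as
# given; set it to a scratch directory for a dry run.
# Usage: link_stale_path STALE_PATH REAL_PATH [ROOT]
link_stale_path() {
    stale=$1
    real=$2
    root=${3:-}
    # Create the missing parent directories of the stale path.
    mkdir -p "$root$(dirname "$stale")" || return 1
    # -f replaces an existing link so the function can be re-run.
    ln -sf "$real" "$root$stale"
}

# Example with the paths from this message (run as root on each node):
# link_stale_path /var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_demux /usr/sbin/pbs_demux
```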

The big question is: how did /var/tmp/torque-1.2.0p6-buildroot
ever make it into the pbs_mom executable?  It's defined in
torque.spec as:
BuildRoot: %{_tmppath}/%{name}-%{version}-buildroot

The Makefile in the RPM BUILD directory
(/usr/src/redhat/BUILD/torque-1.2.0p6) looks correct (selected lines):
prefix =       /usr
sbindir =      ${exec_prefix}/sbin
DEMUX_PATH      = $(sbindir)/pbs_demux

Is this a problem with torque.spec, or with RPM on my system?
It's a Red Hat Enterprise Linux (RHEL) 4 AS server, and the RPM
version is rpm-build-4.3.3-9_nonptl.  I attach my torque.spec file
for information.
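
One possible mechanism, offered as an assumption and not something
confirmed in this thread: RPM's %makeinstall macro, which the attached
spec uses in %install, runs "make install" with prefix, sbindir and
similar variables overridden on the command line to point into the
build root, and a command-line assignment takes precedence over the
Makefile's own.  If anything were recompiled during the install step,
DEMUX_PATH would then expand to the build-root path.  The sketch below
reproduces only the variable override, using a throwaway Makefile with
a simplified sbindir; it is not Torque's Makefile, and the function
name show_demux_path is hypothetical.

```shell
#!/bin/sh
# Write a throwaway Makefile that mimics the two relevant variables and
# print DEMUX_PATH, passing any extra arguments (such as an sbindir
# override) through to make.
# Usage: show_demux_path [MAKE_ARGS...]
show_demux_path() {
    mf=$(mktemp) || return 1
    # Single quotes keep $(sbindir) unexpanded so that make sees it.
    printf '%s\n' \
        'sbindir = /usr/sbin' \
        'DEMUX_PATH = $(sbindir)/pbs_demux' \
        'show: ; @echo $(DEMUX_PATH)' > "$mf"
    make -s -f "$mf" "$@" show
    status=$?
    rm -f "$mf"
    return $status
}

# Without an override, the Makefile's own sbindir is used.
show_demux_path
# With a command-line override, as %makeinstall would pass it.
show_demux_path sbindir=/var/tmp/torque-1.2.0p6-buildroot/usr/sbin
```

The first call prints /usr/sbin/pbs_demux and the second prints
/var/tmp/torque-1.2.0p6-buildroot/usr/sbin/pbs_demux.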

>>So pbs_demux is actually installed.  It's part of the torque-client
>>RPM, but shouldn't it be part of the torque-mom RPM instead ?
> 
> Guess that solves that.  You don't have pbs_demux on the nodes because
> my spec file is wrong!  I've never noticed because I've always had
> torque-client installed on the nodes.

Actually, pbs_demux *is* installed on all nodes, because I had
a hunch that the torque-client RPM might be required even on
the compute nodes.

So should pbs_demux be moved to the torque-mom RPM, or would other
things break if one doesn't have pbs_demux?  For example, I don't
plan to install torque-mom on our login nodes.  Would you be so kind
as to offer an updated torque.spec which moves pbs_demux to the
appropriate RPM package?

> Unfortunately the error message that should have gone to syslog when
> pbs_demux wasn't exec'd was broken.  Funny thing, I just fixed this in
> CVS right after 1.2.0p6 was released.

So what does a poor cluster administrator do here: download the
latest Torque snapshot and build a new RPM with whatever spec file?

Thanks,
Ole

-- 
Ole Holm Nielsen
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: Ole.H.Nielsen at fysik.dtu.dk
Homepage: http://www.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Fax: (+45) 4593 2399
-------------- next part --------------
%define name torque
%define softversion 1.2.0
%define softrelease p6
%define release 1.fys

%define torque_home %{_localstatedir}/spool/%{name}
%define torquelibdir %{_libdir}/%{name}

# don't strip binaries or make debug packages
%define debug_package %{nil}
%define __os_install_post  /usr/lib/rpm/brp-compress

# Specific customizations at fysik.dtu.dk
%define use_FYS 1

# It's taking too long to figure out why tkx stuff doesn't build on x86_64
# and we don't need it anyway, so just build it conditionally on i386
# update - ydl4.0 works fine, so let's enable it for ppc64!
# ifarch i386 ppc64
%ifarch ppc64
%define use_tcl 1
%define tclconfflags --enable-gui --with-tclx --with-tcl
%else
%define use_tcl 0
%define tclconfflags --disable-gui --without-tclx --without-tcl
%endif

Summary: Tera-scale Open-source Resource and QUEue manager
Name: %{name}
Version: %{softversion}%{softrelease}
Release: %{?snap:snap.%snap.}%{release}

Source0: http://www.clusterresources.com/downloads/torque/%{name}-%{softversion}%{softrelease}%{?snap:-snap.%snap}.tar.gz
Source1: pbs_mom
Source2: pbs_sched
Source3: pbs_server
Source4: pbs_mom.mkconf
Source6: pbs_mom.epilogue
Source9: pbs_mom.health_check_script

License: freely available for download
Group: Applications/System
URL: http://supercluster.org/torque/
BuildRoot: %{_tmppath}/%{name}-%{version}-buildroot
Epoch: 0
Provides: pbs

%if %use_tcl
Requires: tcl, tclx, tk
BuildRequires: tclx, tcl-devel, tk-devel
%endif

Conflicts: pbspro, openpbs, openpbs-oscar



%define shared_description %(echo -e "Torque (Tera-scale Open-source Resource and QUEue manager) is a resource \\nmanager providing control over batch jobs and distributed compute nodes.  \\nTorque is based on OpenPBS version 2.3.12 and incorporates scalability, \\nfault tolerance, and feature extension patches provided by USC, NCSA, OSC, \\nthe U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many \\nother leading edge HPC organizations.\\n")


%description
%shared_description
This package holds just a few shared files and directories.

%prep
%setup -n %{name}-%{softversion}%{softrelease}

%build
CFLAGS="-fPIC %optflags -Wall -std=gnu99  -pedantic -Wno-unused -D_GNU_SOURCE -D_SOCKLEN_T -D__TNANNY"
export CFLAGS

# The config.guess in torque *was* ancient, but let's do it anyways
for i in $(find . -name config.guess -o -name config.sub) ; do
    [ -f /usr/lib/rpm/$(basename $i) ] && \
        %{__rm} -f $i && %{__cp} -fv /usr/lib/rpm/$(basename $i) $i ;
done ;

# I can't get autoconf and friends to work with torque, so we can't use the
# various configure macros
./configure --prefix=%{_prefix} --mandir=%{_mandir} \
  --enable-docs --enable-server --enable-mom --enable-clients --with-scp \
  --enable-syslog --set-server-home=%{torque_home} --set-default-server=localhost \
  --libdir=%{torquelibdir} %{?tclconfflags} --disable-filesync --disable-rpp

%{__make} clean
%{__make} %{_smp_mflags} all


%install
[ "$RPM_BUILD_ROOT" != "/" ] && %{__rm} -rf "$RPM_BUILD_ROOT"

%{makeinstall} libdir=$RPM_BUILD_ROOT%{torquelibdir} PBS_SERVER_HOME=$RPM_BUILD_ROOT%{torque_home}

# Kind of gross, but it's easier to get maui/mpiexec/etc to build with these
%__ln_s . $RPM_BUILD_ROOT%{torquelibdir}/lib
%__ln_s %{_includedir} $RPM_BUILD_ROOT%{torquelibdir}/include


%{__mkdir_p} $RPM_BUILD_ROOT%{_initrddir}
for initscript in pbs_mom pbs_sched pbs_server; do
  %__sed -e 's|^PBS_PREFIX=.*|PBS_PREFIX=%{_prefix}|' \
      -e 's|^PBS_HOME=.*|PBS_HOME=%{torque_home}|' \
      -e 's|^PBS_DAEMON=.*|PBS_DAEMON=%{_sbindir}/'$initscript'|' \
        < %{_sourcedir}/$initscript > $RPM_BUILD_ROOT%{_initrddir}/$initscript
  %__chmod 755 $RPM_BUILD_ROOT%{_initrddir}/$initscript
done

%__install -m755 %{SOURCE4} $RPM_BUILD_ROOT%{torque_home}/mom_priv/mkconf
%__install -m755 %{SOURCE6} $RPM_BUILD_ROOT%{torque_home}/mom_priv/epilogue
%__install -m755 %{SOURCE9} $RPM_BUILD_ROOT%{torque_home}/mom_priv/health_check_script

%post
if %__grep -q PBS /etc/services;then
   : PBS services already installed
else
   cat<<-__EOF__>>/etc/services
	# Standard PBS services
	pbs           15001/tcp           # pbs server (pbs_server)
	pbs           15001/udp           # pbs server (pbs_server)
	pbs_mom       15002/tcp           # mom to/from server
	pbs_mom       15002/udp           # mom to/from server
	pbs_resmom    15003/tcp           # mom resource management requests
	pbs_resmom    15003/udp           # mom resource management requests
	pbs_sched     15004/tcp           # scheduler
	pbs_sched     15004/udp           # scheduler
	__EOF__
fi


%files
%defattr(-, root, root)
%config(noreplace) %{torque_home}/pbs_environment
%config(noreplace) %{torque_home}/server_name
%{torque_home}/aux
%{torque_home}/checkpoint
%{torque_home}/undelivered
%{torque_home}/spool


%package docs
Group: Applications/System
Summary: docs for Torque
Requires: %{name} = %{?epoch:%{epoch}:}%{version}-%{release}
Provides: pbs-docs
%description docs
%shared_description
This package holds the documentation files
%files docs
%defattr(-, root, root)
%{_mandir}/man*/*
%doc doc/admin_guide.ps INSTALL README.torque torque.setup Release_Notes CHANGELOG

%package scheduler
Group: Applications/System
Summary: scheduler part of Torque
Requires: %{name} = %{?epoch:%{epoch}:}%{version}-%{release}
Provides: pbs-scheduler
%description scheduler
%shared_description
This package holds the scheduler

%files scheduler
%defattr(-, root, root)
%{_sbindir}/pbs_sched
%{_initrddir}/pbs_sched
%{torque_home}/sched_priv
%{torque_home}/sched_logs

%post scheduler
/sbin/chkconfig --add pbs_sched

%preun scheduler
[ $1 = 0 ] || exit 0
/sbin/chkconfig --del pbs_sched


%package server
Group: Applications/System
Summary: server part of Torque
Requires: %{name} = %{?epoch:%{epoch}:}%{version}-%{release}
Provides: pbs-server
%description server
%shared_description
This package holds the server

%files server
%defattr(-, root, root)
%{_sbindir}/pbs_server
%{_sbindir}/momctl
%{_initrddir}/pbs_server
%{torque_home}/server_logs
%{torque_home}/server_priv

%post server
/sbin/chkconfig --add pbs_server

%preun server
[ $1 = 0 ] || exit 0
/sbin/chkconfig --del pbs_server


%package mom
Group: Applications/System
Summary: execution part of Torque
Requires: %{name} = %{?epoch:%{epoch}:}%{version}-%{release}
# the prolog/epilog/health scripts use lots of utils
Requires: bc, findutils, perl, psmisc, procps, gawk
Provides: pbs-mom
%description mom
%shared_description
This package holds the execute daemon
%files mom
%defattr(-, root, root)
%{_sbindir}/pbs_mom
%{_initrddir}/pbs_mom
%attr(4755 root root) %{_sbindir}/pbs_rcp
%{torque_home}/mom_priv/*
%{torque_home}/mom_logs

%post mom
/sbin/chkconfig --add pbs_mom

%preun mom
[ $1 = 0 ] || exit 0
/sbin/chkconfig --del pbs_mom


%package client
Group: Applications/System
Summary: client part of Torque
Requires: %{name} = %{?epoch:%{epoch}:}%{version}-%{release}
Provides: pbs-client
%description client
%shared_description
This package holds the command-line client programs
%files client
%defattr(-, root, root)
%{_bindir}/q*
%{_bindir}/chk_tree
%{_bindir}/hostn
%{_bindir}/nqs2pbs
%{_bindir}/pbsdsh
%{_bindir}/pbsnodes
%{_bindir}/printjob
%{_bindir}/tracejob
%attr(4755 root root) %{_sbindir}/pbs_iff
%{_sbindir}/pbs_demux

%package gui
Group: Applications/System
Summary: graphical client part of Torque
Requires: %{name}-client = %{?epoch:%{epoch}:}%{version}-%{release}
Provides: xpbs xpbsmon
%description gui
%shared_description
This package holds the graphical clients
%if %use_tcl
%files gui
%defattr(-, root, root)
%{_bindir}/pbs_wish
%{_bindir}/pbs_tclsh
%{_bindir}/xpbs
%{_bindir}/xpbsmon
%{torquelibdir}/xpbs/*
%{torquelibdir}/xpbsmon/*
%endif


%package localhost
Group: Applications/System
Summary: installs and configures a minimal localhost-only batch queue system
PreReq: pbs-mom pbs-server pbs-client pbs-scheduler
%description localhost
%shared_description
This package installs and configures a minimal localhost-only batch queue system
%files localhost
%defattr(-, root, root)
%post localhost
/sbin/chkconfig pbs_mom on
/sbin/chkconfig pbs_server on
/sbin/chkconfig pbs_sched on
/bin/hostname --long > %{torque_home}/server_priv/nodes
/bin/hostname --long > %{torque_home}/server_name
/bin/hostname --long > %{torque_home}/mom_priv/config
pbs_server -t create
qmgr -c "s s scheduling=true"
qmgr -c "c q batch queue_type=execution"
qmgr -c "s q batch started=true"
qmgr -c "s q batch enabled=true"
qmgr -c "s q batch resources_default.nodes=1"
qmgr -c "s q batch resources_default.walltime=3600"
qmgr -c "s s default_queue=batch"
%{_initrddir}/pbs_mom restart
%{_initrddir}/pbs_sched restart
%{_initrddir}/pbs_server restart
qmgr -c "s n `/bin/hostname --long` state=free" -e

%package devel
Summary: Development tools for programs which will use the %{name} library.
Group: Development/Libraries
Provides: lib%{name}-devel
Requires: %{name} = %{?epoch:%{epoch}:}%{version}-%{release}

%description devel
%shared_description
The %{name}-devel package includes the header files and static libraries
necessary for developing programs which will use the %{name} library.

%files devel
%defattr(-, root, root)
%{torquelibdir}/*.*a
%{torquelibdir}/lib
%{torquelibdir}/include
%{_includedir}/*.h

%clean
[ "$RPM_BUILD_ROOT" != "/" ] && %{__rm} -rf "$RPM_BUILD_ROOT"


%changelog
* Tue Sep 13 2005 Ole.H.Nielsen at fysik.dtu.dk
- Adapted torque.spec from Garrick Staples <garrick at usc.edu> 

