Workaround Re: [torqueusers] Non-cumulative pbsnodes -o command

James J Coyle jjc at iastate.edu
Thu Feb 14 17:04:21 MST 2008


John,

  I downloaded the source from Cluster Resources, then ran
tar -zxf ...
and
./configure; gmake; gmake install.

  I offered the script only as a workaround, understanding that it
would not scale to thousands of nodes. (The largest cluster I manage
has 144 nodes, but I run on clusters with thousands of nodes.)

  Workarounds are expected in my environment. A workaround in no way
implies that a fix is not needed; it just takes the pressure off
while a proper fix is made.


-------------------------------------------------------------

  Here is what I get:

$ /usr/local/bin/pbsnodes -l
node140              offline
node141              offline
node144              offline

$ /usr/local/bin/pbsnodes -o node143

$ /usr/local/bin/pbsnodes -l
node140              offline
node141              offline
node143              offline
node144              offline


$ qmgr -c 'p s' | grep version
set server pbs_version = 2.1.2

$ ls -la /usr/local/bin/pbsnodes
-rwxr-xr-x  1 root root 50261 Oct  3 10:35 /usr/local/bin/pbsnodes
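
One way to run the same check on another installation is to save the
`pbsnodes -l` output before and after the `-o` call and compare the two
listings. The following is only a sketch: the function name
dropped_nodes and the file names are made up for illustration, and it
assumes the two-column "name state" layout shown above.

```shell
# dropped_nodes BEFORE AFTER
# BEFORE and AFTER are two different files holding saved "pbsnodes -l"
# output. Prints, sorted, every node that was offline in BEFORE but is
# no longer offline in AFTER. No output means the -o call was cumulative.
dropped_nodes() {
  awk 'FILENAME == ARGV[1] { if ($2 ~ /offline/) before[$1] = 1; next }
       $2 ~ /offline/      { delete before[$1] }
       END                 { for (n in before) print n }' "$1" "$2" | sort
}

# Intended use (hypothetical file names):
#   pbsnodes -l > before.txt
#   pbsnodes -o node143
#   pbsnodes -l > after.txt
#   dropped_nodes before.txt after.txt
```

With the listings shown above, this would print nothing, since
node140, node141 and node144 are all still offline afterwards.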

 - James Coyle


> Hello James
> 
> Scripting wrappers is probably a suitable solution for the current cluster
> size, but I suspect it wouldn't take long to exceed the command-line length
> limits.   The clusters at this site are only 128 nodes, but at my previous
> job we had clusters of about 6,000 nodes. It is quite a change to go
> from a large production cluster to a small development cluster.   My
> previous employer embedded the scheduling functionality into their own
> application, but my current employer wants a more generic HPC environment.
> 
> Where did you get your Torque 2.1.2 installation?   Binary RPMs? Source?
> Cluster Resources? Or elsewhere?   I compiled from source downloaded
> from Cluster Resources using the defaults from the ./configure script.
> 
> Regards,
> John
> 
> 
> On 2/14/08 12:32 PM, "James J Coyle" <jjc at iastate.edu> wrote:
> 
> > John,
> > 
> >    I don't get this behavior (version 2.1.2), but it would be quite annoying
> > if I did.
> > 
> >    If you'd like a fairly easy workaround, put the following script
> > in a directory ahead of /usr/local/bin in your PATH and name it pbsnodes.
> > E.g. call it /local/bin/pbsnodes,
> > then issue chmod u+x /local/bin/pbsnodes
> > and then (if you're in csh or tcsh)
> > setenv PATH /local/bin:${PATH}
> > rehash
> > 
> > Now pbsnodes -o
> > should work as you want it to, while pbsnodes without -o
> > is passed through unchanged to /usr/local/bin/pbsnodes
> > 
> > An easy mod makes this work with -d
> > once that becomes available.
> > 
> > 
> > 
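> > #!/bin/ksh
> > # Wrapper that makes "pbsnodes -o" cumulative: nodes already marked
> > # offline are appended to the node list so that they stay offline.
> > 
> > PBSDIR=/usr/local/bin
> > 
> > # Set only when -o appears as an argument of its own.
> > OFLAG_PRESENT=""
> > for arg in "$@" ; do
> >   [ "${arg}" = "-o" ] && OFLAG_PRESENT=yes
> > done
> > 
> > if [ -n "${OFLAG_PRESENT}" ] ; then
> >   # Names of the nodes that are currently offline.
> >   ALREADY_OFFLINE="`${PBSDIR}/pbsnodes -l | awk '/offline/ {print $1}'`"
> >   # Left unquoted on purpose so each node name is a separate argument.
> >   ${PBSDIR}/pbsnodes "$@" ${ALREADY_OFFLINE}
> > else
> >   ${PBSDIR}/pbsnodes "$@"
> > fi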
> > 
> > 
> > 
> 
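
For Bourne-style shells (sh, ksh, bash), the csh/tcsh setup steps quoted
above could be sketched roughly as follows. This is an assumption-laden
sketch rather than part of the original instructions: /local/bin is just
the example location from the quoted message, and `hash -r` is used as
the counterpart of csh's `rehash`.

```shell
# Put the wrapper directory ahead of /usr/local/bin in the search path.
# /local/bin is the example location used in the quoted message.
PATH=/local/bin:${PATH}
export PATH

# Forget remembered command locations so that the next "pbsnodes"
# lookup searches the new PATH (counterpart of csh's rehash).
hash -r
```

Afterwards, `command -v pbsnodes` should report /local/bin/pbsnodes,
provided the wrapper has been installed there and made executable.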



