[torquedev] enablemomrestart with pbs_mom symlinks
Craig West
cwest at astro.umass.edu
Mon Jul 7 11:43:09 MDT 2008
I've just been through the process of updating my nodes to 2.3.1 (from
2.3.0). During the process I attempted to use enablemomrestart, but
without success. I think the problem is that pbs_mom is not looking
at the symlink, but rather directly at the executable. I installed the
new torque (in a new directory), then changed the symlinks in
/usr/local/sbin to point to that new directory, but the pbs_mom didn't
seem to notice the new symlink. In the end I needed to restart every
pbs_mom manually (not a big deal with scripts), and I've offlined nodes
that still have jobs waiting to complete so that I can restart the
processes there as well.
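
The suspected behavior can be reproduced outside of TORQUE. The following
Python sketch is an illustration only, not pbs_mom's actual code: it assumes
the daemon records the resolved path and mtime of its executable at launch
(as the log entries further down suggest) and later re-checks only that
resolved file. All function names are hypothetical, and the directory layout
is a temporary stand-in for the real install.

```python
import os
import tempfile


def launch_snapshot(path):
    """Record the resolved executable path and its mtime, as seen at launch."""
    real = os.path.realpath(path)
    return real, int(os.stat(real).st_mtime)


def binary_changed(snapshot):
    """Assumed check: re-stat the resolved path only; the symlink is ignored."""
    real, mtime = snapshot
    return int(os.stat(real).st_mtime) != mtime


def symlink_changed(link, snapshot):
    """Alternative check: re-resolve the symlink and compare the targets."""
    return os.path.realpath(link) != snapshot[0]


def demo():
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "torque-2.3.0", "pbs_mom")
        new = os.path.join(root, "torque-2.3.1", "pbs_mom")
        for target in (old, new):
            os.makedirs(os.path.dirname(target))
            with open(target, "w") as handle:
                handle.write("stub")

        link = os.path.join(root, "pbs_mom")
        os.symlink(old, link)
        snapshot = launch_snapshot(link)

        # "Upgrade": repoint the symlink, leaving the old binary untouched.
        os.remove(link)
        os.symlink(new, link)

        return binary_changed(snapshot), symlink_changed(link, snapshot)
```

Under these assumptions `demo()` returns `(False, True)`: the old binary's
mtime never changes, so a check on the resolved path sees nothing, while
re-resolving the symlink does reveal the upgrade.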
Is there any way that the pbs_mom process can work with
enablemomrestart in the sort of environment I have?
There was a suggestion to allow an admin to send a new path to the
pbs_mom. This sounds like something that could work for me.
http://www.clusterresources.com/pipermail/torquedev/2006-March/000152.html
Did this get implemented and left undocumented? If it was not implemented,
would it be possible to get this added?
The other option I have is to copy the pbs_mom into /usr/local/sbin and
use it from there. I would rather not do it this way.
I have the following entries in the pbs_mom logs when it starts. This is
where I got the hint that it wasn't looking at the symlink, but rather
at what it linked to.
# which pbs_mom
/usr/local/sbin/pbs_mom
<prior to upgrade>
pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /nfs/local/amd/torque-2.3.0/sbin/pbs_mom 1205430201
<after upgrade and manual restart>
pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /nfs/local/amd/torque-2.3.1/sbin/pbs_mom 1215436812
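
As a stopgap, the launch entry can be used to find the nodes whose pbs_mom is
still running an old binary. The Python sketch below is a hypothetical helper,
not part of TORQUE: it assumes the log text contains entries in the format
quoted above, takes the most recent one, and compares its path with whatever
the symlink currently resolves to. The function names and the idea of passing
the log contents as a string are assumptions made for illustration.

```python
import os
import re

# Matches the launch entry quoted above; \s+ tolerates a wrapped line.
LAUNCH_RE = re.compile(
    r"MOM executable path and mtime at\s+launch:\s+(\S+)\s+(\d+)"
)


def parse_launch_entries(log_text):
    """Return (path, mtime) for every launch entry found, oldest first."""
    return [(path, int(mtime)) for path, mtime in LAUNCH_RE.findall(log_text)]


def needs_restart(log_text, link_path):
    """True if the symlink no longer resolves to the binary logged at launch."""
    entries = parse_launch_entries(log_text)
    if not entries:
        raise ValueError("no launch entry found in log text")
    launched_path, _ = entries[-1]  # the most recent launch
    return os.path.realpath(link_path) != launched_path
```

For example, `needs_restart(open(logfile).read(), "/usr/local/sbin/pbs_mom")`
could be run on each node from a script, with `logfile` set to that node's
MOM log; nodes that report `True` are the ones to restart or offline.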
Cheers,
Craig.