[torquedev] enablemomrestart with pbs_mom symlinks

Craig West cwest at astro.umass.edu
Mon Jul 7 11:43:09 MDT 2008


I've just finished updating my nodes from 2.3.0 to 2.3.1. During the 
upgrade I tried to use enablemomrestart, but without success. I think 
the problem is that pbs_mom does not look at the symlink, but rather 
directly at the executable it resolves to. I installed the new torque 
in a new directory, then changed the symlinks in /usr/local/sbin to 
point to that directory, but pbs_mom didn't seem to notice the new 
symlink target. In the end I had to restart every pbs_mom manually (not 
a big deal with scripts), and I've offlined the nodes that still have 
jobs waiting to complete so that I can restart the processes there as 
well.

Is there any way that pbs_mom can work with enablemomrestart in the 
sort of environment I have?
There was a suggestion of allowing an admin to send a new path to 
pbs_mom. This sounds like something that could work for me.
    
http://www.clusterresources.com/pipermail/torquedev/2006-March/000152.html
Did this get implemented and left undocumented? If it was not 
implemented, would it be possible to get it added?

The other option I have is to copy the pbs_mom into /usr/local/sbin and 
use it from there. I would rather not do it this way.

I have the following entries in the pbs_mom logs when it starts. This is 
where I got the hint that it wasn't looking at the symlink, but rather 
at what it linked to.
# which pbs_mom
/usr/local/sbin/pbs_mom

<prior to upgrade>
pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /nfs/local/amd/torque-2.3.0/sbin/pbs_mom 1205430201

<after upgrade and manual restart>
pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /nfs/local/amd/torque-2.3.1/sbin/pbs_mom 1215436812
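
The suspected behaviour can be sketched in Python as follows. This is 
only an illustration of the hypothesis, not code taken from the TORQUE 
source: it assumes pbs_mom records the resolved executable path and its 
mtime at launch (as the log lines suggest) and later checks only that 
resolved file. All function names and the temporary directory layout 
are hypothetical, and a POSIX system with symlink support is assumed.

```python
import os
import tempfile


def launch_snapshot(invoked_path):
    """Record the symlink-resolved path and its mtime at 'launch'."""
    resolved = os.path.realpath(invoked_path)
    return resolved, os.stat(resolved).st_mtime_ns


def changed_by_resolved_path(snapshot):
    """Hypothesised check: look only at the file resolved at launch."""
    resolved, mtime = snapshot
    return os.stat(resolved).st_mtime_ns != mtime


def changed_by_symlink(invoked_path, snapshot):
    """Alternative check: re-resolve the symlink every time."""
    resolved, mtime = snapshot
    current = os.path.realpath(invoked_path)
    return current != resolved or os.stat(current).st_mtime_ns != mtime


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Two side-by-side installs, standing in for torque-2.3.0 and 2.3.1.
        old = os.path.join(root, "torque-2.3.0", "pbs_mom")
        new = os.path.join(root, "torque-2.3.1", "pbs_mom")
        for path in (old, new):
            os.makedirs(os.path.dirname(path))
            with open(path, "w") as fh:
                fh.write("placeholder\n")

        # Stand-in for /usr/local/sbin/pbs_mom pointing at the old install.
        link = os.path.join(root, "pbs_mom")
        os.symlink(old, link)
        snapshot = launch_snapshot(link)

        # The upgrade: retarget the symlink; the old file is left untouched.
        os.remove(link)
        os.symlink(new, link)

        return changed_by_resolved_path(snapshot), changed_by_symlink(link, snapshot)


if __name__ == "__main__":
    print(demo())
```

Under these assumptions the first check never fires, because the old 
executable is left unmodified, while the second notices that the symlink 
now resolves to a different file.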


Cheers,
Craig.

