[torquedev] automatic MOM restarts for easier upgrades

Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Wed Mar 1 09:32:33 MST 2006


Garrick,

You wrote:
> I'd like some feedback on adding a safe and automatic restart of MOM
> when it finds that its binary on disk has changed.  The attached patch
> does the following:
> 
> - Find the pbs_mom binary, stat it, and save the path, initial mtime, and
>   the PATH env variable.
> - Add a config and RM request called "enablemomrestart".  If enabled,
>   then MOM will check the mtime of its binary when no jobs are running.  
>   It defaults to off.
> - If the new mtime doesn't match the initial mtime, it will re-exec
>   itself preserving the command-line args, argv[0], and $PATH.
> 
> That's 4 pre-conditions for a re-exec: the binary was found,
> enablemomrestart must be true, the binary's mtime has changed, and no
> jobs are running.
> 
> The admin can enable $enablemomrestart in mom_priv/config and MOM will
> always re-exec itself whenever the binary has changed.
> 
> Or the admin can leave it disabled in config, and only enable it as
> desired with 'momctl -q enablemomrestart=1 -h nodeXXXX'.
> 
> 
> Some complications arose from trying to find the pbs_mom binary, and
> preserving the process name across re-execs, but I think I have
> everything working as one would expect.
> 
> Some possibilities that I decided not to implement: bypassing the mtime
> check when using momctl (forcing a restart), and allowing the admin to
> specify a new path to pbs_mom.  The former, though obvious, doesn't
> actually seem useful.  I can imagine the latter being useful, but
> probably not for me.
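
Going only by the description above (I have not read the patch), the four
pre-conditions could be sketched roughly as follows. This is Python rather
than MOM's C, and every name in it (RestartState, should_reexec, reexec) is
hypothetical, not taken from the patch.

```python
import os
import sys
from typing import Optional


class RestartState:
    """Facts recorded once at start-up (hypothetical names throughout)."""

    def __init__(self, binary_path: Optional[str], enabled: bool = False):
        self.binary_path = binary_path  # None if the binary was not found
        self.enabled = enabled          # mirrors "enablemomrestart", off by default
        self.initial_mtime = (
            os.stat(binary_path).st_mtime if binary_path else None
        )


def should_reexec(state: RestartState, running_jobs: int) -> bool:
    """Return True only when all four pre-conditions hold."""
    if state.binary_path is None:   # the binary was found
        return False
    if not state.enabled:           # enablemomrestart is true
        return False
    if running_jobs > 0:            # no jobs are running
        return False
    try:
        current = os.stat(state.binary_path).st_mtime
    except OSError:
        return False                # binary missing right now: try again later
    return current != state.initial_mtime  # the mtime has changed


def reexec(state: RestartState) -> None:
    """Replace the current process, keeping argv (including argv[0]) and env."""
    os.execve(state.binary_path, sys.argv, os.environ)
```

In this sketch a caller would run should_reexec() whenever the daemon is
idle and call reexec() only if it returns True.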


It would be very nice just to install new pbs_mom binaries and wait
for the upgrade to progress by itself with your new algorithm!

For me it would be important to know that the upgrade will complete
within, at most, the remaining walltime of my longest job, and that I can
easily check whether it has completed, or even be notified when it has,
i.e. when all pbs_moms have upgraded themselves. It would not be nice
at all to run a mix of two or more versions for a long time, e.g. when I want
to debug some scheduling problem.

A pbs_mom that waits for a moment when no jobs are running
might wait for many weeks in some of our environments. I think that is not good
enough. It would be good enough for me if it also restarted after finishing
the current job. Perhaps it does; I did not read your code.

A notification when all pbs_moms have restarted would be nice,
but I can probably do that myself with a cron script that runs
momctl against all nodes and tells me when all nodes run the same version
(or complains every day as long as not every node runs the same version).
Here it is of course important that CRI continues to include the complete
and unique version string in all snapshot versions!
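
A minimal sketch of the comparison step of such a cron script, in Python.
It assumes the version strings have already been collected into a
node-to-version mapping; how to extract them from momctl output is
deliberately left out, and the function name, node names and version
strings below are made up.

```python
from collections import Counter
from typing import Dict


def version_report(versions: Dict[str, str]) -> str:
    """Summarise whether all nodes report the same pbs_mom version string."""
    counts = Counter(versions.values())
    if len(counts) <= 1:
        return "all nodes run the same version"
    # Take the most common version as the reference and list the other nodes.
    reference, _ = counts.most_common(1)[0]
    differing = sorted(n for n, v in versions.items() if v != reference)
    return "mixed versions, differing nodes: " + ", ".join(differing)


if __name__ == "__main__":
    # Placeholder data standing in for whatever the nodes actually report.
    print(version_report({"n1": "snap-B", "n2": "snap-B", "n3": "snap-A"}))
```

Because the comparison is on the full string, it only works if every
snapshot really does carry a unique version string.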

To force a restart of a pbs_mom, you would probably only need to change
the mtime, e.g. with 'pdsh -a touch /usr/pbs/sbin/pbs_mom'. That might
be so simple that I would prefer to use it for any config file update as well.
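
To illustrate why touching the file should be enough: as described, the
check only compares mtimes, and touch changes the mtime without changing
the contents. A small Python sketch on a temporary file (not the real
pbs_mom binary; the helper names are hypothetical):

```python
import os
import tempfile


def touch(path: str) -> None:
    """Set an existing file's access and modification times to now."""
    os.utime(path, None)


def mtime_changes_on_touch() -> bool:
    """Create a scratch file, backdate it, touch it, and compare mtimes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    try:
        os.utime(path, (1000000000, 1000000000))  # pretend the file is old
        before = os.stat(path).st_mtime
        touch(path)
        return os.stat(path).st_mtime != before
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(mtime_changes_on_touch())
```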

Would you recommend upgrading pbs_server before the pbs_moms,
or the other way round?

Many thanks for your great work with Torque!
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
   National Supercomputer Centre in Linkoping, Sweden
   http://www.nsc.liu.se



