[torquedev] automatic MOM restarts for easier upgrades

Garrick Staples garrick at usc.edu
Wed Mar 1 10:37:45 MST 2006

On Wed, Mar 01, 2006 at 05:32:33PM +0100, Lennart Karlsson alleged:
> It would be very nice just to install new pbs_mom binaries and wait
> for the upgrade to progress by itself with your new algorithm!
> For me it would be important that I know that the upgrade will complete
> within at least the remaining walltime of my longest job and that I easily
> can check if it has completed or even get notified when it has completed,
> i.e. when all pbs_moms have upgraded themselves. It would not be nice
> at all to run a mix of two or more versions for a long time, e.g. when I want
> to debug some scheduling problem.
> A pbs_mom that waits for the moment where there are no jobs to run,
> might wait for many weeks in some of our environments. I think that is not good
> enough. It would be good enough for me if it also restarts after finishing
> the current job. Perhaps it does, I did not read your code?

Yes, assuming dedicated nodes, MOM should restart at the end of the
current job.  It will be the first available "safe" moment.
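
As a rough illustration of that "safe moment" rule (not the actual pbs_mom C code), the decision can be sketched in Python. The function names and the idea of comparing against an mtime recorded at startup are assumptions made for this example:

```python
import os


def binary_mtime(path):
    """Return the modification time of the file at `path` (hypothetical helper)."""
    return os.stat(path).st_mtime


def should_restart(mtime_at_start, mtime_now, running_jobs):
    """Restart only if the binary changed since startup AND no jobs are running.

    mtime_at_start: mtime recorded when the daemon started (assumed bookkeeping)
    mtime_now:      mtime observed on the current check
    running_jobs:   number of jobs currently running on this node
    """
    binary_changed = mtime_now != mtime_at_start
    node_is_idle = running_jobs == 0
    return binary_changed and node_is_idle
```

On a dedicated node, `running_jobs` drops to 0 as soon as the current job ends, so that is the first check at which `should_restart` can return True.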

With shared nodes, this modification doesn't buy you anything, because
you still need to schedule a downtime.  Or maybe it does: submitting a
1-second job that requests all nodes would get MOM restarted at the
earliest possible time.
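
A minimal sketch of building such a job submission, assuming the standard `qsub -l nodes=N,walltime=HH:MM:SS` resource syntax. The helper name and the node count of 64 are made up for the example; use your own cluster size:

```python
def build_allnodes_job_cmd(node_count):
    """Build a qsub argv for a 1-second job spanning `node_count` nodes.

    Hypothetical helper: it only builds the command and does not submit it.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    resources = "nodes=%d,walltime=00:00:01" % node_count
    return ["qsub", "-l", resources]


# Example: a hypothetical 64-node cluster.
cmd = build_allnodes_job_cmd(64)
print(" ".join(cmd))
```

qsub reads the job script from stdin when no script file is given, so the command could be fed something trivial like `sleep 1`.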

> To get a notification when all pbs_moms are restarted would be fine,
> but I can probably do that by myself with some cron script that runs
> momctl for all nodes and tells me when all nodes run the same version.
> (Or complains every day as long as not every node runs the same version.)
> (Here it is of course important that CRI continues to include the complete
> and unique version string within all snapshot versions!)

Yes, I'd say that's a job for cron.
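
The comparison step of such a cron script might look like the sketch below. It assumes you have already collected one version string per node (for example by querying each MOM with momctl; that collection step is left out here because the exact output format is not covered in this thread). The node names and version strings are invented for the example:

```python
from collections import defaultdict


def group_by_version(node_versions):
    """Map each version string to the sorted list of nodes reporting it."""
    groups = defaultdict(list)
    for node, version in sorted(node_versions.items()):
        groups[version].append(node)
    return dict(groups)


def upgrade_complete(node_versions):
    """True only when at least one node reported and all versions match."""
    return len(set(node_versions.values())) == 1


# Hypothetical data: node name -> version string reported by its pbs_mom.
reported = {
    "node01": "2.1.0-snap.1",
    "node02": "2.1.0-snap.1",
    "node03": "2.0.0",
}

if upgrade_complete(reported):
    print("all pbs_moms run the same version")
else:
    for version, nodes in group_by_version(reported).items():
        print("%s: %s" % (version, ", ".join(nodes)))
```

Because cron mails any output to the job's owner, printing only while versions differ would give the "complain every day" behavior.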

> To force a restart of a pbs_mom, you would probably only need to change
> the mtime, e.g. with a 'pdsh -a touch /usr/pbs/sbin/pbs_mom'. It might
> be so simple, that I would prefer to use it also for any config file update.

Config file changes don't need a daemon restart; everything can be
changed directly with momctl.

> Would you propose that pbs_server is upgraded before upgrading the pbs_moms,
> or the other way round?

pbs_server should be restarted first.

Garrick Staples, Linux/HPCC Administrator
University of Southern California