On Tue, Dec 09, 2008 at 09:36:11PM +0100, Bogdan Costescu alleged:
> >Stopping these daemons is a fairly big event that can't always be 
> >done without damaging production environments.
> Removing packages is a big event in a production environment too.
> >Here are two possible scenarios that I want to avoid:
> I'm sorry to be so harsh, but both are really not things that should 
> happen in a production environment. If you want to experiment with 
> different Torque versions from different sources you don't do it on a 
> production cluster, but on a test one (even if it's a virtual one).

It doesn't matter whether you think they should happen.  In the real world,
they do happen.  Most places don't have the time or resources for test
clusters.  Most admins make mistakes at times.

> >During uninstall, there is no way to determine that the running 
> >binary is the same as the one that is being uninstalled.
> There's a simple way to solve this: save the PID of the daemon started 
> by the init.d script; if the PID doesn't exist anymore, don't kill the 
> corresponding pbs_* process because it was not the one started by this 
> script. If an installation manages to use the same PID file or even 
> overwrites the init.d script, then this is the fault of the stupid 
> sysadmin not of the package.

If the PID file or init.d script does get overwritten, then let's help the
admin by not shutting down his or her daemon.
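For illustration, here is a rough sketch of the PID-file check Bogdan describes. It is written in Python rather than as an init.d shell script. It assumes Linux, because it reads /proc/<pid>/comm to catch a PID that has since been reused by an unrelated process. The function name stop_daemon_from_pidfile is made up for this example and is not part of Torque's packaging.

```python
import os
import signal


def stop_daemon_from_pidfile(pidfile, expected_name, sig=signal.SIGTERM):
    """Send sig to the daemon recorded in pidfile, but only if that PID is
    still alive and still belongs to a process named expected_name.

    Returns True if the signal was sent, False if the daemon was left alone.
    Assumes Linux, where /proc/<pid>/comm holds the process name.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # No usable PID file: this init script did not start the daemon.
        return False
    if pid <= 0:
        # 0 or a negative PID would address a whole process group; refuse.
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
    except (ProcessLookupError, PermissionError):
        # Stale PID file, or a process we could not signal anyway.
        return False
    try:
        with open(f"/proc/{pid}/comm") as f:
            name = f.read().strip()
    except OSError:
        return False  # process vanished or /proc unavailable: be conservative
    if name != expected_name:
        # The PID was reused by some other program; do not touch it.
        return False
    os.kill(pid, sig)
    return True
```

A call such as stop_daemon_from_pidfile("/var/run/pbs_server.pid", "pbs_server") (the path is hypothetical) would only signal a pbs_server that still matches the PID file. It still cannot tell whether the running binary belongs to the package being removed, and there is a small window between the check and the kill. At best it narrows the problem; it does not solve it.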

> >The admin could be temporarily uninstalling a package for various 
> >reasons.
> Never heard of this one before. Upon uninstalling the package all 
> Torque utils disappear, leaving users and admins without any way to 
> interact with the daemon left running - this is not something that 
> should happen in a production environment.

You don't need the utils for a few seconds while the admin is futzing around
with packages.  But you do want the daemon to stay up.

And again, it doesn't matter whether you think it should happen; in reality,
it does.

I don't want to shoot those people in the foot.

