[torqueusers] Restarting pbs_mom question
garrick at usc.edu
Tue Jun 24 09:44:42 MDT 2008
On Tue, Jun 24, 2008 at 10:38:58AM -0400, Rob Lines alleged:
> We need to restart the pbs_mom to implement the fix found here
> We have never restarted the pbs_mom process while there were jobs running on
> a node (at least ones that we cared about keeping), so I am wondering what the
> results would be of restarting them on machines with active jobs. We have
> restarted the maui process before with no problem, but its part in the
> process is different.
> Our backup plan was to drain all the nodes, restart pbs_mom on any that
> currently have no jobs, and put those nodes back in service. Once the
> remaining nodes finish their current jobs, we would restart their pbs_mom
> and put them back in service too. I had hoped to avoid that because it
> would mean I have to pay attention to them, and some of the jobs that are
> running currently are multi-day runs.
Are you running 2.1.0p0 or later? If so, just do

    momctl -q enablemomrestart=1 -h $momhost

and pbs_mom will re-exec itself when the last job exits.
If the node has multiple jobs and will never be idle, then set a 5-minute
reservation on it in maui so that it goes idle long enough for the re-exec.
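
For more than a handful of nodes, the momctl request above can be wrapped in
a small loop. This is only a sketch: the function name enable_mom_restart,
the MOMCTL variable, and the host names node01 and node02 are made up for
illustration. By default it is a dry run that prints each command instead of
contacting any pbs_mom.

```shell
#!/bin/sh
# Dry run by default: print each command rather than contacting any pbs_mom.
# MOMCTL is an illustrative variable; set MOMCTL=momctl to send real requests.
MOMCTL="${MOMCTL:-echo momctl}"

# enable_mom_restart is a hypothetical helper name, not part of TORQUE.
enable_mom_restart() {
    for momhost in "$@"; do
        # Same request as in the reply above, issued once per host.
        $MOMCTL -q enablemomrestart=1 -h "$momhost"
    done
}

# node01 and node02 are placeholder host names.
enable_mom_restart node01 node02
```

Once the printed commands look right, running the same script with
MOMCTL=momctl in the environment issues them for real.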
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.