[torqueusers] Restarting pbs_mom question

Garrick Staples garrick at usc.edu
Tue Jun 24 09:44:42 MDT 2008


On Tue, Jun 24, 2008 at 10:38:58AM -0400, Rob Lines alleged:
> We need to restart the pbs_mom to implement the fix found here
> http://www.clusterresources.com/pipermail/torqueusers/2007-March/005360.html.
> We have never restarted the pbs_mom process while there were jobs running on
> a node (at least ones that we cared about keeping), so I am wondering what the
> results would be of restarting them on machines with active jobs.  We have
> restarted the maui process before with no problem, but its part in the
> process is different.
> 
> We had the backup plan of draining all the nodes, restarting pbs_mom on
> any that have no running jobs, and putting those nodes back in service.
> Once the other nodes' current jobs finish, we would restart their pbs_mom
> and put them back in service too.  I had hoped to avoid that because it
> would mean I have to pay attention to them, and some of the jobs that are
> running currently are multi-day runs.

Are you running 2.1.0p0 or later?  If so, just do

    momctl -q enablemomrestart=1 -h $momhost

and pbs_mom will re-exec itself when the last job exits.
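
For more than a handful of nodes, the momctl call above could be wrapped in a small loop. The following is only a sketch: the function name enable_mom_restart, the DRY_RUN variable, and the node names are made up for illustration, and only the momctl invocation itself comes from this message. It prints the commands by default so they can be reviewed before anything is sent to a node.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE): ask pbs_mom on each listed node
# to re-exec itself once its last job exits.
# DRY_RUN is an illustrative variable; it defaults to 1 (print only).
enable_mom_restart() {
    for momhost in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command that would be executed.
            echo "momctl -q enablemomrestart=1 -h $momhost"
        else
            # Real run: same command as quoted in the message above.
            momctl -q enablemomrestart=1 -h "$momhost"
        fi
    done
}
```

Calling `enable_mom_restart node01 node02` only prints the two commands; setting `DRY_RUN=0` in the environment would actually run them, assuming momctl is on the PATH and the hosts are reachable.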

If the node has multiple jobs and will never be idle on its own, then set a
5 minute reservation on it in maui so that it goes idle long enough for
pbs_mom to restart.

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html