[torqueusers] Restarting pbs_mom question

Steve Young chemadm at hamilton.edu
Tue Jun 24 09:01:15 MDT 2008


Hi,
	I've wondered about this too, but haven't had to do it much. Looking
at the man page for pbs_mom I see:

-p      Specifies the impact on jobs which were in execution when the
        mini-server shut down.  On any restart of MOM, the new mini-server
        will not be the parent of any running jobs, MOM has lost control
        of her offspring (not a new situation for a mother).  With the -p
        option, Mom will allow the jobs to continue to run and monitor
        them indirectly via polling.  The -p option is mutually exclusive
        with the -r option.

Would this do it? And I assume this means the restarted pbs_mom would be
the parent of any new jobs coming to the node?
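
For anyone scripting the restart, here is a minimal Python sketch of how the
start command might be assembled. Only the -p and -r flags and their mutual
exclusion come from the man page text above; the function name
build_mom_command, its parameter names, and the default binary name are my own
placeholders. The sketch only builds the argument list and does not stop or
start anything.

```python
def build_mom_command(keep_running_jobs=False, use_r_option=False, binary="pbs_mom"):
    """Return the argument list for starting pbs_mom (hypothetical helper).

    keep_running_jobs adds -p (let running jobs continue, monitored by polling).
    use_r_option adds -r; the man page excerpt only says it excludes -p.
    """
    if keep_running_jobs and use_r_option:
        # The man page states that -p and -r are mutually exclusive.
        raise ValueError("-p and -r are mutually exclusive")
    argv = [binary]
    if keep_running_jobs:
        argv.append("-p")
    if use_r_option:
        argv.append("-r")
    return argv


if __name__ == "__main__":
    # Show the command line that would be used to start pbs_mom with -p.
    print(" ".join(build_mom_command(keep_running_jobs=True)))
```

Rejecting the -p/-r combination up front keeps a wrapper script from passing
both flags to the daemon by accident.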

-Steve

On Jun 24, 2008, at 10:38 AM, Rob Lines wrote:

> We need to restart pbs_mom to implement the fix found here:
> http://www.clusterresources.com/pipermail/torqueusers/2007-March/005360.html
> We have never restarted the pbs_mom process while there were jobs
> running on a node (at least ones that we cared about keeping), so I
> am wondering what the results would be of restarting it on machines
> with active jobs.  We have restarted the maui process before with no
> problem, but its part in the process is different.
>
> We had the backup plan of draining all the nodes, restarting pbs_mom
> on any that have no jobs currently running, and putting those nodes
> back in service.  Once the jobs on the remaining nodes finish, we
> would restart their pbs_mom and put them back in service as well.  I
> had hoped to avoid that because it would mean I have to keep an eye
> on them, and some of the jobs that are currently running are
> multi-day runs.
>
> Thanks for the help,
> Rob
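
On the drain-first fallback quoted above, the bookkeeping could be sketched
roughly like this in Python. The function name plan_restarts, the node names,
and the job counts are all made up for illustration; how the per-node job
counts are actually gathered is left out, and nothing here talks to the
scheduler.

```python
def plan_restarts(jobs_per_node):
    """Split drained nodes into two groups (hypothetical helper).

    jobs_per_node maps a node name to the number of jobs still running on it.
    Returns (restart_now, wait_for_jobs), each a sorted list of node names.
    """
    # Idle nodes can have pbs_mom restarted and go back in service right away.
    restart_now = sorted(n for n, count in jobs_per_node.items() if count == 0)
    # Busy nodes stay drained until their current jobs finish.
    wait_for_jobs = sorted(n for n, count in jobs_per_node.items() if count > 0)
    return restart_now, wait_for_jobs


if __name__ == "__main__":
    # Made-up example data: two idle nodes and one node with a running job.
    now, later = plan_restarts({"node01": 0, "node02": 3, "node03": 0})
    print("restart now:", now)
    print("wait for jobs:", later)
```

Re-running the split as jobs finish moves nodes from the second list to the
first, which is the part that needs someone paying attention during the
multi-day runs.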
