[torqueusers] Multiple moms

Charles Johnson charles.johnson at accre.vanderbilt.edu
Thu May 22 15:24:56 MDT 2008

On May 22, 2008, at 4:09 PM, David Singleton wrote:

> Glen Beane wrote:
>> On Thu, May 22, 2008 at 10:17 AM, Charles Johnson <
>> charles.johnson at accre.vanderbilt.edu> wrote:
>>> We use Nagios to monitor a range of conditions on our cluster, and an
>>> oddity has shown up. We monitor the number of pbs_mom processes running
>>> on each node, and Nagios was set to report more than one mom on a given
>>> node. We have occasionally seen as many as three. Moreover, a few of the
>>> moms run under user UIDs rather than root, even though only root can
>>> start a mom. We have altered Nagios to ignore mom counts below 5.
>>> Does anyone have an explanation or, better yet, can anyone point me to
>>> appropriate documentation?
>> I can't point you to any documentation, but this is normal behavior. In
>> several cases the mom will fork a child process to do some task that may
>> take a while to complete, so that the parent mom can remain responsive.
>> The moms that fork to the user's UID are usually copying output files
>> back to the user's home directory.
> And I think you will see a couple of extra moms for each qsub -I
> job, but in that case they are owned by root.

Thanks to all who responded. It was most helpful.
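
For anyone adjusting a similar check, here is a minimal sketch in Python of
one way to count only the parent mom instead of raising the threshold. It
assumes the forked children described above remain direct children of the
main pbs_mom and keep the pbs_mom process name; I have not confirmed either
against the TORQUE source. The function names (parse_ps, count_parent_moms,
check_node) are hypothetical, not part of TORQUE or Nagios.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,ppid,user,comm` output into (pid, ppid, user, comm) tuples."""
    procs = []
    for line in text.strip().splitlines():
        fields = line.split(None, 3)
        # Skip the header row and anything that does not look like a process line.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, ppid, user, comm = fields
        procs.append((int(pid), int(ppid), user, comm.strip()))
    return procs


def count_parent_moms(procs, name="pbs_mom"):
    """Count processes named `name` whose parent is not itself named `name`.

    Assumption: forked helpers (file copies, qsub -I sessions) are children
    of the main mom, so they are excluded from the count.
    """
    mom_pids = {pid for pid, _, _, comm in procs if comm == name}
    return sum(
        1
        for _, ppid, _, comm in procs
        if comm == name and ppid not in mom_pids
    )


def check_node():
    """Run ps and return a Nagios-style exit code (0 = OK, 2 = CRITICAL)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,user,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    n = count_parent_moms(parse_ps(out))
    if n == 1:
        print("OK - 1 parent pbs_mom")
        return 0
    print(f"CRITICAL - {n} parent pbs_mom processes")
    return 2
```

A plugin wrapper would call sys.exit(check_node()). A helper whose parent
mom has exited would be reparented and counted as a parent, so treat this
as a starting point rather than a definitive check.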


Charles Johnson
Advanced Computing Center for Research and Education
Vanderbilt University
charles.johnson at accre.vanderbilt.edu
Office: 615-343-2776
Cell: 615-478-8799
