[torqueusers] Multiple moms

Charles Johnson charles.johnson at accre.vanderbilt.edu
Thu May 22 15:24:56 MDT 2008


On May 22, 2008, at 4:09 PM, David Singleton wrote:

> Glen Beane wrote:
>> On Thu, May 22, 2008 at 10:17 AM, Charles Johnson <
>> charles.johnson at accre.vanderbilt.edu> wrote:
>>> We use Nagios to monitor an array of situations on our cluster, and we
>>> have had an oddity show up. We monitor the number of pbs_mom processes
>>> running on a given node, and Nagios was set to report more than one mom
>>> running on a node. We have occasionally seen as many as three. Moreover,
>>> a few of the moms run under user UIDs rather than root, even though only
>>> root can start a mom. We have altered Nagios to ignore multiple moms as
>>> long as there are fewer than five.
>>>
>>> Does anyone have an explanation or, better yet, can you point me to the
>>> appropriate documentation?
>> I can't point you to any documentation, but this is normal behavior. In
>> several cases the mom will fork a child process to do some task that may
>> take a while to complete, so that the parent mom can remain responsive.
>> The child moms that switch to the user's UID are usually copying output
>> files back to the user's home directory.
>
> And I think you will see a couple of extra moms for each qsub -I
> job but, in this case, they are owned by root.
>
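The following is a minimal sketch, not TORQUE or Nagios code, of a Nagios-style check that reflects the explanation above. It counts processes whose command name is pbs_mom, reports how many are root-owned versus user-owned, goes CRITICAL only when no root-owned (parent) mom is present, and warns at five or more moms, mirroring the "fewer than five" change described in the original question. It assumes a Linux /proc filesystem and that forked children keep the pbs_mom command name until they exec something else; the function names `list_processes` and `check_moms` and the `max_total` threshold are hypothetical.

```python
import os
import sys
from typing import Iterable, List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# (pid, effective uid, command name) -- hypothetical record layout for this sketch.
ProcInfo = Tuple[int, int, str]


def list_processes(proc_root: str = "/proc") -> List[ProcInfo]:
    """Collect (pid, euid, comm) for every readable process (Linux /proc assumed)."""
    procs: List[ProcInfo] = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
            with open(os.path.join(proc_root, entry, "status")) as f:
                uid_line = next(line for line in f if line.startswith("Uid:"))
        except (OSError, StopIteration):
            continue  # process exited mid-scan or is unreadable
        # "Uid:" line holds real, effective, saved, fs UIDs; take the effective one,
        # which is what ps shows in its USER column.
        euid = int(uid_line.split()[2])
        procs.append((int(entry), euid, comm))
    return procs


def check_moms(procs: Iterable[ProcInfo], max_total: int = 4) -> Tuple[int, str]:
    """Classify pbs_mom processes; warn only when there are more than max_total."""
    moms = [p for p in procs if p[2] == "pbs_mom"]
    root_moms = [p for p in moms if p[1] == 0]
    user_moms = [p for p in moms if p[1] != 0]
    if not root_moms:
        # Forked children alone are not enough: the root-owned parent must be up.
        return CRITICAL, "CRITICAL: no root-owned pbs_mom running"
    msg = f"{len(root_moms)} root-owned, {len(user_moms)} user-owned pbs_mom processes"
    if len(moms) > max_total:
        return WARNING, "WARNING: " + msg
    return OK, "OK: " + msg


if __name__ == "__main__":
    try:
        status, message = check_moms(list_processes())
    except OSError as exc:
        status, message = UNKNOWN, f"UNKNOWN: cannot read /proc: {exc}"
    print(message)
    sys.exit(status)
```

With this split, a node showing one root-owned parent plus a couple of forked children (root-owned for a `qsub -I` job, or user-owned while copying output files) reports OK, while a node with only user-owned moms still raises an alert.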

Thanks to all who responded. It was most helpful.

Cheers--

Charles
---
Charles Johnson
Advanced Computing Center for Research and Education
Vanderbilt University
charles.johnson at accre.vanderbilt.edu
Office: 615-343-2776
Cell: 615-478-8799