[torqueusers] Multiple moms
Charles Johnson
charles.johnson at accre.vanderbilt.edu
Thu May 22 15:24:56 MDT 2008
On May 22, 2008, at 4:09 PM, David Singleton wrote:
> Glen Beane wrote:
>> On Thu, May 22, 2008 at 10:17 AM, Charles Johnson
>> <charles.johnson at accre.vanderbilt.edu> wrote:
>>> We use nagios to monitor an array of situations on our cluster. We have
>>> had an oddity show up. We monitor the number of pbs_moms running on a
>>> given node. Nagios was set to report more than one mom running on a
>>> given node. We have occasionally seen as many as three. Moreover, a few
>>> of the moms run under user uids rather than root, even though only root
>>> can start a mom. We have altered nagios to ignore fewer than 5 moms on
>>> a node.
>>>
>>> Does anyone have an explanation or, better yet, can anyone point me to
>>> appropriate documentation?
>> I can't point you to any documentation, but this is normal behavior. In
>> several cases the mom will fork a child process to do some task that may
>> take a while to complete, so the parent mom can remain responsive. The
>> moms that fork to the user's uid are usually copying output files back
>> to the user's home directory.
>
> And I think you will see a couple of extra moms for each qsub -I
> job but, in this case, they are owned by root.
>
Thanks to all who responded. It was most helpful.
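
Given the explanation above, one way to adapt a monitoring check is to
count only the top-level moms, that is, pbs_mom processes whose parent is
not itself a pbs_mom, so that forked children (root-owned or user-owned)
are ignored. The following is only a rough sketch under assumptions: the
function names are made up for illustration, the input is assumed to look
like the output of "ps -eo pid,ppid,user,comm", and the sample process
table is invented, not taken from a real node.

```python
# Hypothetical sketch: count only parent pbs_mom processes.
# Assumes input formatted like `ps -eo pid,ppid,user,comm`.

def count_parent_moms(ps_output, name="pbs_mom"):
    """Return the number of `name` processes whose parent is not also `name`."""
    moms = {}  # pid -> ppid for every process with the given command name
    for line in ps_output.strip().splitlines():
        fields = line.split()
        # Skip the header line and anything that does not look like a row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, ppid, comm = int(fields[0]), int(fields[1]), fields[3]
        if comm == name:
            moms[pid] = ppid
    # A forked child has another mom as its parent; drop those.
    return sum(1 for ppid in moms.values() if ppid not in moms)


def nagios_status(parent_moms):
    """Map the count to a Nagios-style (exit code, message) pair."""
    if parent_moms == 1:
        return 0, "OK - 1 parent pbs_mom"
    return 2, "CRITICAL - %d parent pbs_moms" % parent_moms


# Invented sample: one real mom (pid 1200) with two forked children,
# one of which runs under a user uid.
SAMPLE_PS = """\
  PID  PPID USER     COMMAND
 1200     1 root     pbs_mom
 4311  1200 root     pbs_mom
 4312  1200 alice    pbs_mom
 5000     1 root     sshd
"""

print(count_parent_moms(SAMPLE_PS))  # prints 1
```

In a real check the sample text would be replaced by the live ps output.
A child whose parent mom has already exited would be counted as a
top-level mom by this sketch, so it may still need a small tolerance.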
Cheers--
Charles
---
Charles Johnson
Advanced Computing Center for Research and Education
Vanderbilt University
charles.johnson at accre.vanderbilt.edu
Office: 615-343-2776
Cell: 615-478-8799