[torqueusers] what causes many defunct pbs_mom processes

Moye,Roger V RVMoye at mdanderson.org
Thu Dec 19 10:24:48 MST 2013


This week we have suddenly been hit by a storm of defunct pbs_mom processes, as shown here:

root      6811  4589  0 11:19 ?        00:00:00 [pbs_mom] <defunct>

The node this was taken from has been up for only 45 minutes, so the problem appeared almost immediately after new jobs started running on it.  At present there are 70 of these defunct processes on that node, and I am seeing the same thing on multiple nodes.
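
For anyone who wants to compare counts on their own nodes, here is a minimal Python sketch that tallies defunct pbs_mom entries per parent PID. It assumes the `ps -ef` column layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function name `count_defunct` is made up for illustration and is not part of TORQUE.

```python
import subprocess
from collections import Counter


def count_defunct(ps_output, name="pbs_mom"):
    """Count defunct processes whose command contains `name`, grouped by parent PID.

    Assumes `ps -ef` style lines: UID PID PPID C STIME TTY TIME CMD.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command column stays in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        cmd = fields[7]
        if name in cmd and "<defunct>" in cmd:
            counts[int(fields[2])] += 1  # fields[2] is the PPID column
    return counts


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    for ppid, n in count_defunct(out).most_common():
        print(f"parent PID {ppid}: {n} defunct pbs_mom")
```

Grouping by parent PID shows whether all of the defunct entries on a node hang off the same pbs_mom daemon.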

We are running TORQUE 4.2.3.1 with Maui 3.3.1 on RHEL 6.4.

Does anyone know what causes these to occur?

Many thanks!
-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------
