[torqueusers] what causes many defunct pbs_mom processes

Moye,Roger V RVMoye at mdanderson.org
Thu Dec 19 12:39:55 MST 2013


Does anyone know what triggers this problem? I would prefer not to do an emergency upgrade right before the holidays, but right now I am having to babysit the cluster as pbs_mom slowly degrades on many of the nodes. If there is a way to avoid the problem in the short term, I would like to pursue that strategy.

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------

From: Jeffrey Lang [mailto:jrlang at uwyo.edu]
Sent: Thursday, December 19, 2013 11:30 AM
To: Torque Users Mailing List
Cc: Moye,Roger V
Subject: Re: [torqueusers] what causes many defunct pbs_mom processes

Roger

  This was a known bug in 4.2.3.1, and maybe in other older versions of Torque. We had this problem and upgraded to 4.2.6 (the latest Torque), and the problem seems to have been fixed.

On 12/19/2013 10:24 AM, Moye,Roger V wrote:

Suddenly this week we have had a storm of problems with defunct pbs_mom processes as shown here:

root      6811  4589  0 11:19 ?        00:00:00 [pbs_mom] <defunct>

The node this was taken from has only been up 45 minutes, so the problem occurred almost immediately once new jobs started running on it. At present there are 70 of these defunct processes. I am seeing this on multiple nodes.
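
For anyone who wants to track how fast these accumulate on a node, here is a minimal sketch in Python. It assumes the `ps -ef` column layout shown in the line above (UID, PID, PPID, C, STIME, TTY, TIME, CMD). The function names `defunct_by_parent` and `run_ps` are made up for illustration and are not part of Torque. A defunct entry is a child that has exited but has not yet been reaped by its parent, so grouping by parent PID shows which pbs_mom is leaving them behind.

```python
import subprocess
from collections import Counter


def defunct_by_parent(ps_output, name="pbs_mom"):
    """Map parent PID -> number of defunct children named `name`.

    Expects text in `ps -ef` layout: UID PID PPID C STIME TTY TIME CMD.
    """
    marker = "[%s] <defunct>" % name
    counts = Counter()
    for line in ps_output.splitlines():
        if marker in line:
            fields = line.split()
            # Third column of `ps -ef` is the parent PID.
            counts[int(fields[2])] += 1
    return counts


def run_ps():
    """Return the output of `ps -ef` on the local node as text."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `defunct_by_parent(run_ps())` on a compute node returns a count of defunct pbs_mom children per parent PID. Running it periodically would show whether the count keeps growing as jobs start.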

We are using Torque 4.2.3.1 with Maui 3.3.1 on RHEL 6.4.

Does anyone know what causes these to occur?

Many thanks!
-Roger
