[torquedev] [Bug 124] pbs_mom healthcheck scripts should run from a forked process

bugzilla-daemon at supercluster.org
Thu Apr 21 17:27:46 MDT 2011


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=124

Michael Jennings <mej at lbl.gov> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mej at lbl.gov

--- Comment #1 from Michael Jennings <mej at lbl.gov> 2011-04-21 17:27:45 MDT ---
I'm currently working on a brand new framework and implementation for node
health checks here at LBL, so related topics are of particular interest to me
right now.

Seems to me like the simplest solution to this problem is to make sure your
node health check script doesn't hang.  There are two facets to this
approach:  (1) have the script fork only when absolutely necessary, and (2) set
an alarm in the script that will kill it after a relatively brief timeout.
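
As a rough illustration of facet (2), here is a minimal sketch in Python of a
check that arms an alarm before doing its work.  The names run_with_alarm,
CheckTimeout, and TIMEOUT_SECONDS are made up for this example and are not part
of TORQUE or pbs_mom.  It also differs slightly from a plain kill: instead of
letting SIGALRM terminate the script, it catches the signal so the script can
report the timeout itself.  It assumes a Unix host and that the check runs in
the main thread.

```python
import signal

TIMEOUT_SECONDS = 10  # hypothetical default; pick something "relatively brief"


class CheckTimeout(Exception):
    """Raised from the SIGALRM handler when the check runs too long."""


def _on_alarm(signum, frame):
    raise CheckTimeout()


def run_with_alarm(check, timeout=TIMEOUT_SECONDS):
    """Run check() in-process (no fork) under an alarm.

    Returns (True, "OK") if check() finishes in time, otherwise
    (False, <message>) once the alarm fires.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)  # deliver SIGALRM after `timeout` whole seconds
    try:
        check()
        return True, "OK"
    except CheckTimeout:
        return False, "health check timed out after %d s" % timeout
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore previous handler
```

A script built this way would call something like run_with_alarm(my_checks)
once and print the returned message.  Note that an alarm cannot rescue a
process blocked in uninterruptible I/O, so this is a mitigation rather than a
guarantee.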

You've already pointed out just a few of the concurrency challenges associated
with trying to background the node health check.  Brings to mind the old adage
about an ounce of prevention....  :-)
