[torqueusers] Re: [Mauiusers] node health check

'Garrick Staples' garrick at clusterresources.com
Tue Nov 14 22:41:03 MST 2006


On Tue, Nov 14, 2006 at 03:53:11PM -0800, Alexander Saydakov alleged:
> > -----Original Message-----
> > From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
> > bounces at supercluster.org] On Behalf Of 'Garrick Staples'
> > Sent: Tuesday, November 14, 2006 11:52 AM
> > To: torqueusers at supercluster.org
> > Subject: [torqueusers] Re: [Mauiusers] node health check
> > 
> > In MOM's config, $down_on_error can be used to have the MOM set itself
> > as "down" if there is an ERROR message from the health check script.
> 
> I would suggest taking advantage of the exit code instead of relying on the
> message beginning with ERROR. What if things are so out of hand that the health
> check script cannot even execute? I understand that the server can only read the
> message from MOM, but MOM is in a better position because it has the exit
> code of the script. Why not take a non-zero exit code as an indication of a
> problem?

That is probably a good idea.  How about we keep the current behaviour and,
in addition, have a non-zero exit code generate an "ERROR health check
failed" message?
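A rough sketch of that combined logic, written in Python purely for illustration. This is not MOM's actual code: the function names are hypothetical, and it assumes the first line of the script's stdout is the message that gets reported. A script that prints its own ERROR line keeps that text; a silent non-zero exit, or a script that cannot be executed at all, is reported as "ERROR health check failed".

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical generic message for a failed or unrunnable health check.
FAILED_MSG = "ERROR health check failed"


def health_check_message(cmd: Sequence[str]) -> str:
    """Run a health check script and return the message to report (sketch)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError:
        # The script could not be executed at all (missing, not executable, ...).
        return FAILED_MSG

    # Assumption: the first line of stdout is the health check message.
    lines = result.stdout.splitlines()
    message = lines[0].strip() if lines else ""

    # Current behaviour: an ERROR message from the script is passed through as-is.
    if message.startswith("ERROR"):
        return message
    # Proposed addition: a non-zero exit is turned into an ERROR message,
    # even when the script printed nothing useful.
    if result.returncode != 0:
        return FAILED_MSG
    return message


def should_mark_down(message: str, down_on_error: bool) -> bool:
    """Mimic $down_on_error: mark the node down only on an ERROR message."""
    return down_on_error and message.startswith("ERROR")


if __name__ == "__main__":
    checks = [
        [sys.executable, "-c", "print('node ok')"],        # healthy
        [sys.executable, "-c", "import sys; sys.exit(2)"],  # silent failure
        ["/nonexistent/health_check.sh"],                   # cannot execute
    ]
    for cmd in checks:
        msg = health_check_message(cmd)
        print(repr(msg), "down" if should_mark_down(msg, True) else "up")
```

Because the script's own ERROR text takes precedence, existing health check scripts would behave exactly as they do now; only the exit-code and exec-failure cases are new.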

