[torqueusers] Re: [Mauiusers] node health check

Alexander Saydakov saydakov at yahoo-inc.com
Wed Nov 15 10:35:47 MST 2006


> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
> bounces at supercluster.org] On Behalf Of 'Garrick Staples'
> Sent: Tuesday, November 14, 2006 9:41 PM
> To: torqueusers at supercluster.org
> Subject: Re: [torqueusers] Re: [Mauiusers] node health check
> 
> On Tue, Nov 14, 2006 at 03:53:11PM -0800, Alexander Saydakov alleged:
> > > -----Original Message-----
> > > From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
> > > bounces at supercluster.org] On Behalf Of 'Garrick Staples'
> > > Sent: Tuesday, November 14, 2006 11:52 AM
> > > To: torqueusers at supercluster.org
> > > Subject: [torqueusers] Re: [Mauiusers] node health check
> > >
> > > In MOM's config, $down_on_error can be used to have the MOM set itself
> > > as "down" if there is an ERROR message from the health check script.
> >
> > I would suggest taking advantage of the exit code instead of relying on
> > the message beginning with ERROR. What if things are so out of hand that
> > the health script cannot even execute? I understand that the server can
> > only read the message from MOM, but MOM is in a better position because
> > it has the exit code of the script. Why not take a non-zero exit code as
> > an indication of a problem?
> 
> That is probably a good idea.  We keep the current behaviour, but in
> addition a non-zero exit will create an "ERROR health check failed"
> message?

We can still use the output of the script (STDERR or STDOUT?) as the message,
prepending ERROR when the exit code is non-zero and the output does not
already start with it.
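
The handling discussed above could be sketched roughly as follows. This is
Python for illustration only, not TORQUE code: the function name
health_message is hypothetical, and taking the first non-empty line of the
output as the message is an assumption, as is leaving the message untouched
on a zero exit.

```python
def health_message(exit_code: int, output: str) -> str:
    """Build the message MOM would report for one health check run.

    Hypothetical helper, not part of TORQUE.
    """
    # Assumption: the first non-empty line of the script output is the message.
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    message = lines[0] if lines else ""

    if exit_code == 0:
        # Current behaviour: pass the script's message through unchanged,
        # so a script that prints "ERROR ..." still marks the node.
        return message

    # Proposed addition: a non-zero exit always yields an ERROR message.
    if not message:
        # The script produced no output (or could not run at all).
        return "ERROR health check failed"
    if message.startswith("ERROR"):
        return message
    return "ERROR " + message
```

With this, a script that exits 127 with no output reports
"ERROR health check failed", and one that exits 1 after printing "disk full"
reports "ERROR disk full".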



