[torqueusers] Two configs for Warewulf NHC 1.2.2 ?

Grigory Shamov gas5x at yahoo.com
Wed Mar 27 09:48:28 MDT 2013


Dear Michael,

Thanks for the answer! I looked into the sources and took a different route instead of adding new logic. 

Since all the settings are, or can be made to be, derived from $NAME, I created a symlink to nhc under a different name, nhc-cron, and populated /etc/sysconfig/nhc-cron with the values for the config path, logs, etc. It worked. Bash is a really powerful thing!
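
A minimal sketch of why the symlink works, using a stand-in script rather than NHC itself. The names toy-nhc, toy-nhc-cron, heavy-checks.conf and the SYSCONFIG_DIR variable are hypothetical, chosen only for illustration; the sole assumption is that the script derives $NAME from the name it was invoked under and then sources the settings file of that name:

```shell
#!/bin/bash
# Stand-in for the mechanism only; this is NOT the NHC source.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/sbin" "$demo/sysconfig"

# Toy script: take NAME from the invoked name, set a NAME-based default,
# then source a settings file of the same name if one exists.
cat > "$demo/sbin/toy-nhc" <<'EOF'
#!/bin/bash
NAME="${0##*/}"                        # basename of the path we were run as
CONFFILE="/etc/nhc/$NAME.conf"         # default depends on $NAME
SYSCONFIG_DIR="${SYSCONFIG_DIR:-/etc/sysconfig}"   # hypothetical override for the demo
[ -f "$SYSCONFIG_DIR/$NAME" ] && . "$SYSCONFIG_DIR/$NAME"
echo "$NAME uses $CONFFILE"
EOF

# Second name for the same script, as with nhc -> nhc-cron.
ln -s toy-nhc "$demo/sbin/toy-nhc-cron"

# Settings file that only the symlinked name picks up.
echo 'CONFFILE=/etc/nhc/heavy-checks.conf' > "$demo/sysconfig/toy-nhc-cron"

export SYSCONFIG_DIR="$demo/sysconfig"
# Run through bash so the sketch also works if the temp dir is mounted noexec;
# $0 is still the path given on the command line.
fast_out=$(bash "$demo/sbin/toy-nhc")
cron_out=$(bash "$demo/sbin/toy-nhc-cron")
echo "$fast_out"
echo "$cron_out"
rm -rf "$demo"
```

The first call finds no settings file and keeps the default; the second call, through the symlink, sources its own file and reports heavy-checks.conf. The same script therefore runs with two independent configurations and no added conditional logic.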

--
Grigory Shamov
University of Manitoba

--- On Tue, 3/26/13, Michael Jennings <mej at lbl.gov> wrote:

> From: Michael Jennings <mej at lbl.gov>
> Subject: Re: Two configs for  Warewulf NHC 1.2.2 ?
> To: "Grigory Shamov" <gas5x at yahoo.com>
> Cc: "Torque Users Mailing List" <torqueusers at supercluster.org>
> Date: Tuesday, March 26, 2013, 10:59 AM
> On Tuesday, 26 March 2013, at 07:17:13 (-0700),
> Grigory Shamov wrote:
> 
> > We actually have a similar problem of check_ps checks being slow.  I
> > thought of having two instances of NHC: one fast, for Torque, that
> > would report "ERROR", offline nodes, etc. but not involve the
> > slower tests, and another running as a cron job, thus not blocking
> > PBS_MOMS and allowing for the slow checks, perhaps with a larger
> > timeout, just to check for runaways and unauthorized users.
> > 
> > However, at first glance this does not seem possible, as CONFFILE is
> > read only in /etc/sysconfig/nhc, which is system-wide.  Is there
> > a way to specify different configs for NHC to run with somehow
> > (command line)?
> 
> /etc/sysconfig/nhc is parsed by bash as a shell script, so it can have
> logic in it too.  I'd recommend either setting an environment variable
> in your crontab or using one that's already there (and which is not
> set by TORQUE).
> 
> For example, if in /etc/crontab you have:
> 
>     NHC_MODE=cron
>     */5 * * * * root /usr/sbin/nhc
> 
> Then, in /etc/sysconfig/nhc, put:
> 
>     if [ -n "$NHC_MODE" -a "$NHC_MODE" = "cron" ]; then
>         CONFFILE=/etc/nhc/nhc-cron.conf
>     fi
> 
> Then put your heavier-weight tests in /etc/nhc/nhc-cron.conf and the
> TORQUE tests in the standard /etc/nhc/nhc.conf.
> 
> As an alternative, you may want to consider using "detached mode" if
> you haven't already.  More details are available here:
> http://warewulf.lbl.gov/trac/wiki/Node%20Health%20Check#DetachedMode
> 
> Hope that helps!
> Michael
> 
> -- 
> Michael Jennings <mej at lbl.gov>
> Senior HPC Systems Engineer
> High-Performance Computing Services
> Lawrence Berkeley National Laboratory
> Bldg 50B-3209E        W: 510-495-2687
> MS 050B-3209          F: 510-486-8615
> 

