[torqueusers] Two configs for Warewulf NHC 1.2.2 ?

Michael Jennings mej at lbl.gov
Tue Mar 26 11:59:56 MDT 2013


On Tuesday, 26 March 2013, at 07:17:13 (-0700),
Grigory Shamov wrote:

> We actually have a similar problem with the check_ps checks being
> slow.  I thought of having two instances of NHC: a fast one for
> Torque that would report "ERROR", offline nodes, etc., without
> involving the slower tests, and another one running as a cron job,
> which would not block pbs_mom and would allow for the slow checks,
> perhaps with a larger timeout, just to check for runaways and
> unauthorized users.
> 
> However, at first glance this does not seem possible, since
> CONFFILE is read only from /etc/sysconfig/nhc, which is system-wide.
> Is there a way to specify a different config for a given NHC run
> (e.g., on the command line)?

/etc/sysconfig/nhc is parsed by bash as a shell script, so it can
contain logic too.  I'd recommend either setting an environment
variable in your crontab or testing one that is already present in
cron's environment (and which TORQUE does not set).

For example, if in /etc/crontab you have:

    NHC_MODE=cron
    */5 * * * * root /usr/sbin/nhc

Then, in /etc/sysconfig/nhc, put:

    if [ "${NHC_MODE:-}" = "cron" ]; then
        CONFFILE=/etc/nhc/nhc-cron.conf
    fi

Then put your heavier-weight tests in /etc/nhc/nhc-cron.conf and the
TORQUE tests in the standard /etc/nhc/nhc.conf.
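
A slightly fuller sketch of the same idea, wrapped in a function so the
selection logic can be tried outside of NHC.  The function name
nhc_pick_conf and the TIMEOUT values are assumptions made up for
illustration; they are not something NHC ships.  In a real
/etc/sysconfig/nhc you would keep only the case statement:

```shell
# Hypothetical helper: picks CONFFILE and TIMEOUT based on NHC_MODE.
# The timeout values (in seconds) are illustrative assumptions.
nhc_pick_conf() {
    case "${NHC_MODE:-}" in
        cron)
            # Slow checks run from cron get their own config and more time.
            CONFFILE=/etc/nhc/nhc-cron.conf
            TIMEOUT=300
            ;;
        *)
            # Anything else (e.g., pbs_mom) uses the fast, standard config.
            CONFFILE=/etc/nhc/nhc.conf
            TIMEOUT=10
            ;;
    esac
    echo "$CONFFILE $TIMEOUT"
}
```

Using a case statement here leaves room for further modes later
without stacking up if/elif tests.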

As an alternative, you may want to consider using "detached mode" if
you haven't already.  More details are available here:
http://warewulf.lbl.gov/trac/wiki/Node%20Health%20Check#DetachedMode
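
For reference, detached mode is switched on from the same file.  A
minimal sketch, assuming the DETACHED_MODE variable as described on
the wiki page above (please check the documentation for your NHC
version before relying on it):

```shell
# /etc/sysconfig/nhc
# Assumption: with DETACHED_MODE=1, nhc runs its checks in a background
# process and the foreground call reports the result of the previous run,
# so slow checks should not hold up the caller.
DETACHED_MODE=1
```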

Hope that helps!
Michael

-- 
Michael Jennings <mej at lbl.gov>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615

