[torqueusers] Warewulf NHC 1.2.1 and SC12

David Beer dbeer at adaptivecomputing.com
Thu Nov 8 09:04:58 MST 2012


If you're interested in proactively managing the health of your compute
nodes, I recommend that you check out this project. Our support team
regularly recommends it to people getting started with node health check
scripts, and the results have been very positive. It makes the most common
node health checks easy to set up, and it makes it easy to handle some of
the common pitfalls (such as a node health checker that runs for too
long).
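
To make the "runs for too long" pitfall concrete, here is a minimal
Python sketch of running a single check command under a time limit. It is
only an illustration and not how NHC itself implements its timeout; the
function name run_check and the timeout values are hypothetical.

```python
import subprocess
import sys


def run_check(cmd, timeout_s):
    """Run one health check command and return (ok, message).

    Hypothetical helper for illustration; not part of NHC or TORQUE.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if the command has not finished within timeout_s seconds.
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, f"check timed out after {timeout_s}s"
    if result.returncode != 0:
        return False, f"check failed with exit status {result.returncode}"
    return True, "ok"


if __name__ == "__main__":
    # A stand-in "check" that sleeps longer than the allowed time.
    slow_check = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_check(slow_check, timeout_s=1))
```

The point of the sketch is that a hung check gets reported as a failure
instead of blocking whatever invoked it.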


On Wed, Nov 7, 2012 at 5:07 PM, Michael Jennings <mej at lbl.gov> wrote:

> In preparation for SC12, I've released version 1.2.1 of Warewulf Node
> Health Check (NHC).
> Many thanks to Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> for his
> help with this release!
> The notes on this release are fairly short, so I'll include them here:
>  - Added logrotate script (from Ole)
>  - Added cron wrapper script (from Ole)
>  - Added NHC_AUTH_USERS variable for users allowed to run on node at
>    any time (used by check_ps_userproc_lineage and
>    check_ps_unauth_users).
>  - Fixed some bugs that prevented check_ps_unauth_users from finding
>    TORQUE job files properly and resolving long userids.
>  - Fixed bug where NHC mishandled nodes which were offlined with no
>    note by an operator.
>  - Updated online documentation regarding mismatch between hostname
>    and TORQUE nodename.
> I also wanted to mention that Jackie Scoggins and I will be doing a
> presentation at SuperComputing '12 in the Adaptive Computing booth on
> Tuesday the 13th from 10:30-11:00.  Discussion of new features in the
> 1.2 series will be included along with a brief overview of the
> features and syntax.  We hope to see many of you there!  :-)
> Michael
> --
> Michael Jennings <mej at lbl.gov>
> Senior HPC Systems Engineer
> High-Performance Computing Services
> Lawrence Berkeley National Laboratory
> Bldg 50B-3209E        W: 510-495-2687
> MS 050B-3209          F: 510-486-8615
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

David Beer | Senior Software Engineer
Adaptive Computing