[torqueusers] strategies for bad nodes

Rick McKay rmckay at adaptivecomputing.com
Tue Apr 17 08:14:52 MDT 2012


Hello William,

Michael Jennings at Lawrence Berkeley just gave a great presentation on
the Node Health Check subproject of Warewulf, which you might want to look
into as well. It's an excellent expansion of what's in the Adaptive
TORQUE docs: it's well documented, implemented almost entirely in bash, and
easy to extend for your specific needs.

http://warewulf.lbl.gov/trac/wiki/Node%20Health%20Check
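To make the idea concrete, here is a minimal sketch of a site-specific health check covering the two examples from your message. It assumes pbs_mom is pointed at the script with `$node_check_script` (and `$node_check_interval`) in `mom_priv/config`, and that a line printed to stdout starting with `ERROR` causes the node to be marked down with that text as its message. Please verify both assumptions against the admin guide for your TORQUE version. NHC itself is bash; Python is swapped in here purely for illustration, and the helper names, the `/scr` 10 GB threshold and the use of `ypwhich` as the "ypbind bound" test are all hypothetical choices.

```python
#!/usr/bin/env python3
"""Hypothetical node health check for a TORQUE pbs_mom.

Assumption: pbs_mom runs this via $node_check_script and treats any
stdout line beginning with "ERROR" as a reason to mark the node down.
"""
import shutil
import subprocess
from typing import Callable, List, Optional

# Each check returns None when healthy, or a short problem description.
Check = Callable[[], Optional[str]]


def check_min_free(path: str, min_free_gb: float) -> Optional[str]:
    """Fail if the filesystem holding `path` has less than min_free_gb free."""
    try:
        free_gb = shutil.disk_usage(path).free / 1024**3
    except OSError as exc:
        return f"cannot stat {path}: {exc.strerror}"
    if free_gb < min_free_gb:
        return f"{path} has only {free_gb:.1f} GB free (need {min_free_gb} GB)"
    return None


def check_command_ok(argv: List[str], problem: str) -> Optional[str]:
    """Fail with `problem` if the command is missing, hangs, or exits non-zero."""
    try:
        result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return problem
    return None if result.returncode == 0 else problem


def run_checks(checks: List[Check]) -> Optional[str]:
    """Return the first failure message, or None if every check passes."""
    for check in checks:
        problem = check()
        if problem is not None:
            return problem
    return None


# Site-specific list of what makes a node "bad"; edit to taste.
# Assumption: `ypwhich` exits non-zero when ypbind is not bound.
CHECKS: List[Check] = [
    lambda: check_command_ok(["ypwhich"], "ypbind not bound"),
    lambda: check_min_free("/scr", 10.0),
]

if __name__ == "__main__":
    failure = run_checks(CHECKS)
    if failure is not None:
        # The ERROR prefix is what (we assume) pbs_mom looks for.
        print(f"ERROR {failure}")
```

Keeping each check as a small function returning either None or a message makes the list easy to customize per site, which is the same design idea NHC follows with its configurable checks.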

Regards,

Rick


On Tue, Apr 17, 2012 at 7:02 AM, Edsall, William (WJ) <WJEdsall at dow.com> wrote:

> Hello list,
>
> I’m looking for ideas on how to prevent jobs from going to ‘bad’ nodes.
> There are a small handful of items which define a bad node for us, such as
> ypbind not being bound, /scr being full, etc. We need to be able to
> customize this list.
>
> What might be built into TORQUE to achieve this? It would be ideal if the
> node were not only passed over for a job but also offlined with a comment.
>
> Thanks,
>
> William
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>

