[torqueusers] strategies for bad nodes

Rick McKay rmckay at adaptivecomputing.com
Tue Apr 17 08:14:52 MDT 2012

Hello William,

Michael Jennings at Lawrence Berkeley just gave a great presentation about
the Node Health Check subproject of Warewulf, which you might want to look
into as well. It's an excellent expansion of what's in the Adaptive
TORQUE docs. It's well-documented, implemented almost entirely in bash, and
easy to extend for your specific needs.
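
For reference, here is a rough sketch of the kind of health check script the
TORQUE docs describe, covering the two conditions from the original question.
It is not taken from NHC or from the docs. The function names, the SCR_PATH
and SCR_MAX_PCT variables, and the 95% threshold are all made up for
illustration. It assumes pbs_mom is pointed at the script with
$node_check_script (and $node_check_interval) in mom_priv/config, and that a
line on stdout beginning with ERROR is what flags the node.

```shell
#!/bin/sh
# Hypothetical health check sketch: print one "ERROR <reason>" line on stdout
# when the node should be considered bad, and print nothing otherwise.

# Assumed settings; adjust for the site.
SCR_PATH="${SCR_PATH:-/scr}"
SCR_MAX_PCT="${SCR_MAX_PCT:-95}"

# Succeeds only if ypbind is bound to a NIS server
# (ypwhich exits non-zero when it is not bound or not installed).
check_ypbind() {
    ypwhich >/dev/null 2>&1
}

# Prints the percent-used figure for the filesystem holding path $1,
# or nothing if df cannot stat the path.
scr_pct_used() {
    df -P "$1" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds only if the scratch filesystem exists and is below the threshold.
check_scr() {
    pct=$(scr_pct_used "$SCR_PATH")
    [ -n "$pct" ] && [ "$pct" -lt "$SCR_MAX_PCT" ]
}

# Runs each check in turn and reports the first failure.
run_checks() {
    check_ypbind || { echo "ERROR ypbind is not bound"; return 1; }
    check_scr || { echo "ERROR $SCR_PATH is full or unavailable"; return 1; }
    return 0
}

# The ERROR line on stdout is the signal in this sketch;
# the exit status is deliberately ignored.
run_checks || true
```

More checks can be added as further functions called from run_checks. Whether
a flagged node is merely skipped or actually offlined depends on the scheduler
setup; offlining by hand with a comment should be possible with something like
pbsnodes -o <node> -N "<reason>".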




On Tue, Apr 17, 2012 at 7:02 AM, Edsall, William (WJ) <WJEdsall at dow.com> wrote:

> Hello list,
> I’m looking for ideas on how to prevent jobs from going to ‘bad’ nodes.
> There are a small handful of items which define a bad node for us such as
> ypbind not bound, maybe /scr is full, etc. We need to be able to customize
> this list.
>
> What might be built into torque to achieve this? It would be ideal if the
> node was not only passed by for a job but even offlined with a comment.
>
> Thanks,
> William
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers