[torqueusers] strategies for bad nodes
Rick McKay
rmckay at adaptivecomputing.com
Tue Apr 17 08:14:52 MDT 2012
Hello William,
Michael Jennings at Lawrence Berkeley just gave a great presentation about
the Node Health Check subproject of Warewulf, which you might want to look
into as well. It's an excellent expansion of what's in the Adaptive
TORQUE docs. It's well documented, implemented almost entirely in bash, and
easy to extend for your specific needs.
http://warewulf.lbl.gov/trac/wiki/Node%20Health%20Check
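
To give a rough idea of what such a check looks like, here is a minimal
sketch in Python (not the Warewulf code, which is bash). It assumes the
pbs_mom health check convention as I understand it from the TORQUE docs:
the script prints a single line beginning with ERROR when the node is
unhealthy and prints nothing otherwise. The function names, the 95%
threshold, and the use of ypwhich to test the NIS binding are my own
assumptions for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical node health check: prints one ERROR line if the node looks bad."""
import shutil
import subprocess

SCRATCH_PATH = "/scr"      # scratch filesystem named in the original question
MAX_USED_PERCENT = 95.0    # assumed threshold, tune per site


def scratch_problem(path=SCRATCH_PATH, limit=MAX_USED_PERCENT):
    """Return a message if `path` is inaccessible or too full, else None."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return f"{path} not accessible"
    if usage.total == 0:
        return None  # pseudo filesystem, nothing meaningful to measure
    used = 100.0 * usage.used / usage.total
    if used >= limit:
        return f"{path} is {used:.0f}% full"
    return None


def ypbind_problem():
    """Return a message if NIS is not bound (assumes ypwhich is installed)."""
    try:
        result = subprocess.run(
            ["ypwhich"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "ypwhich failed to run"
    return None if result.returncode == 0 else "ypbind not bound"


def format_report(problems):
    """Join the non-empty problem messages into one ERROR line, or return ''."""
    found = [p for p in problems if p]
    return "ERROR " + "; ".join(found) if found else ""


def main():
    report = format_report([ypbind_problem(), scratch_problem()])
    if report:
        print(report)


if __name__ == "__main__":
    main()
```

If I remember the docs correctly, you point pbs_mom at the script with
$node_check_script (and $node_check_interval) in mom_priv/config, and the
ERROR text is then reported with the node status. Offlining with a comment
would be a separate step, something like pbsnodes -o <node> -N "<note>",
so please check both against the docs for your version.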
Regards,
Rick
On Tue, Apr 17, 2012 at 7:02 AM, Edsall, William (WJ) <WJEdsall at dow.com> wrote:
> Hello list,
>
> I’m looking for ideas on how to prevent jobs from going to ‘bad’ nodes.
> There are a small handful of items which define a bad node for us such as
> ypbind not bound, maybe /scr is full, etc. We need to be able to customize
> this list.
>
> What might be built into torque to achieve this? It would be ideal if the
> node was not only passed by for a job but even offlined with a comment.
>
> Thanks,
>
> William
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>