[torqueusers] strategies for bad nodes

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Tue Apr 17 07:25:16 MDT 2012

hello william,

On Tue, Apr 17, 2012 at 9:02 AM, Edsall, William (WJ) <WJEdsall at dow.com> wrote:
> Hello list,
> I’m looking for ideas on how to prevent jobs from going to ‘bad’ nodes.
> There are a small handful of items which define a bad node for us such as
> ypbind not bound, maybe /scr is full, etc. We need to be able to customize
> this list.

> What might be built into torque to achieve this? It would be ideal if the
> node was not only passed by for a job but even offlined with a comment.

yes. you can do this via a node check script, which pbs_mom
runs on each node.

we use it to detect known problem conditions or pre-failure
warnings and take the node offline.
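
as a rough illustration, here is a minimal sketch of such a check script in python. it is not the script we run; the check functions, the 5% free-space threshold, and the use of `ypwhich` to test the NIS binding are all assumptions made for the example. it relies on the torque convention that a health check script reports a problem by printing a line starting with ERROR on stdout; please verify this against the documentation for your torque version.

```python
#!/usr/bin/env python3
"""Hypothetical node health check sketch (not the author's actual script)."""
import shutil
import subprocess

SCRATCH_DIR = "/scr"       # scratch path mentioned in the question
MIN_FREE_FRACTION = 0.05   # assumed threshold: complain below 5% free


def check_scratch(path=SCRATCH_DIR, min_free=MIN_FREE_FRACTION,
                  usage=shutil.disk_usage):
    """Return an error string if `path` is nearly full or unreadable, else None."""
    try:
        total, _used, free = usage(path)
    except OSError as exc:
        return f"cannot stat {path}: {exc}"
    if total == 0 or free / total < min_free:
        return f"{path} is full (less than {min_free:.0%} free)"
    return None


def check_ypbind(run=subprocess.run):
    """Return an error string if `ypwhich` cannot name a bound NIS server."""
    try:
        result = run(["ypwhich"], stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"ypwhich failed: {exc}"
    if result.returncode != 0:
        return "ypbind not bound"
    return None


# Site-specific list of checks; extend it to customize what "bad" means.
CHECKS = [check_ypbind, check_scratch]


def run_checks(checks=CHECKS):
    """Run every check and collect the non-empty error messages."""
    return [msg for msg in (check() for check in checks) if msg]


def report(problems):
    """Format the problems as one ERROR line, or an empty string if healthy."""
    if problems:
        return "ERROR " + "; ".join(problems)
    return ""


def main():
    line = report(run_checks())
    if line:
        print(line)


if __name__ == "__main__":
    main()
```

to use something like this, point pbs_mom at it with `$node_check_script` (and optionally `$node_check_interval`) in mom_priv/config; the path and interval are site choices. whether a reported ERROR alone keeps jobs off the node depends on your scheduler, so a script may also need to offline the node itself (for example with `pbsnodes -o` plus a note); treat that part as an assumption to check against the pbsnodes man page.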


> Thanks,
> William

Dr. Axel Kohlmeyer    akohlmey at gmail.com

Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.
