[torqueusers] strategies for bad nodes
akohlmey at cmm.chem.upenn.edu
Tue Apr 17 07:25:16 MDT 2012
On Tue, Apr 17, 2012 at 9:02 AM, Edsall, William (WJ) <WJEdsall at dow.com> wrote:
> Hello list,
> I’m looking for ideas on how to prevent jobs from going to ‘bad’ nodes.
> There are a small handful of items which define a bad node for us such as
> ypbind not bound, maybe /scr is full, etc. We need to be able to customize
> this list.
> What might be built into torque to achieve this? It would be ideal if the
> node was not only passed by for a job but even offlined with a comment.
Yes, you can do this with a node check script.
We use one to detect known problematic conditions
or pre-failure warnings and have the node go offline.
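
As a rough illustration of such a node check script, here is a minimal
sketch in Python covering the two conditions from the question (ypbind
not bound, /scr full). It assumes the usual Torque convention that the
script reports a problem by printing a line starting with "ERROR". The
95% threshold, the use of ypwhich as the binding test, and all function
names are assumptions made for this example, not part of Torque.

```python
#!/usr/bin/env python3
"""Hypothetical node check sketch: prints 'ERROR ...' if the node looks bad."""
import shutil
import subprocess

SCRATCH_PATH = "/scr"        # scratch filesystem named in the question
MAX_USED_FRACTION = 0.95     # assumed threshold; tune for your site


def check_ypbind(run=subprocess.run):
    """Return a problem string if ypbind is not bound, else None.

    Assumes `ypwhich` exits non-zero when the domain is not bound.
    """
    try:
        result = run(["ypwhich"], stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "ypwhich could not run: {}".format(exc)
    if result.returncode != 0:
        return "ypbind not bound"
    return None


def check_scratch(path=SCRATCH_PATH, limit=MAX_USED_FRACTION,
                  usage=shutil.disk_usage):
    """Return a problem string if `path` is missing or too full, else None."""
    try:
        total, used, _free = usage(path)
    except OSError as exc:
        return "{} not accessible: {}".format(path, exc)
    if total == 0:
        return "{} reports zero size".format(path)
    fraction = used / total
    if fraction > limit:
        return "{} is {:.0%} full".format(path, fraction)
    return None


def health_message(checks=(check_ypbind, check_scratch)):
    """Run every check; return 'ERROR a; b' or '' when the node is healthy."""
    problems = [msg for msg in (check() for check in checks) if msg]
    return "ERROR " + "; ".join(problems) if problems else ""


if __name__ == "__main__":
    message = health_message()
    if message:
        print(message)
```

To add a site-specific condition, write another function that returns
a string or None and append it to the checks tuple. On the Torque side,
pbs_mom is pointed at the script with $node_check_script (and, if
desired, $node_check_interval) in its mom_priv/config file. Whether an
ERROR message actually takes the node out of service depends on your
scheduler and settings, so verify that against the documentation for
your version. For offlining a node by hand with a comment,
pbsnodes -o -N "reason" nodename should do it.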
Dr. Axel Kohlmeyer akohlmey at gmail.com
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.