[torqueusers] Modifying Torque to allow nodes to be turned off
Joshua Bernstein
jbernstein at penguincomputing.com
Thu Apr 3 17:06:25 MDT 2008
Hey Jeff,
Good to catch you on the list instead of on the phone ;-)
> Good afternoon,
>
> Myself, as well as many others, have been thinking about how to
> modify job schedulers to allow nodes to be turned off when they
> haven't been used for a while but still have them as available
> resources. Since I'm more familiar with PBS than anything else,
> I thought I would run this by the list to get some reaction and
> perhaps some help.
>...
> So with that said, does this look to be a fairly easy mod that can be
> made to torque? Do you think it's something that should be done?
This is actually something I've been playing with already. In fact, at
SuperComputing this past November I was demoing something that shut down
the nodes when they weren't in use.
The trick of course is to still allow jobs to be accepted by the
scheduler. If a node is otherwise marked down, then a job wouldn't be
accepted. What I did as a quick hack was to use a qmgr variable to
tell TORQUE that I had more processors than pbs_mom reported,
thereby allowing jobs to be accepted even when the nodes were down.
This is done via:
set server resources_available.nodect = ???
set queue batch resources_available.nodect = ???
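
For anyone who wants to script those two settings, here is a minimal
sketch. It assumes qmgr is on the PATH and that the queue is named batch
as above. The helper names are hypothetical, and the node count is left
as a parameter you would have to supply yourself.

```python
import subprocess


def nodect_commands(node_count, queue="batch"):
    """Build the two qmgr directives that advertise node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return [
        f"set server resources_available.nodect = {node_count}",
        f"set queue {queue} resources_available.nodect = {node_count}",
    ]


def apply_nodect(node_count, queue="batch", dry_run=True):
    """Pass each directive to `qmgr -c`.

    With dry_run=True (the default) nothing is executed; the argv lists
    that would be run are simply returned for inspection.
    """
    argvs = [["qmgr", "-c", cmd] for cmd in nodect_commands(node_count, queue)]
    if not dry_run:
        for argv in argvs:
            subprocess.run(argv, check=True)
    return argvs
```

Calling apply_nodect(n) with the default dry_run=True only returns the
commands, which is a safe way to check them before touching a live
server.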
I've written a secondary daemon that monitors the TORQUE queue and
powers nodes up and down based on thresholds set by the user. The demo
also included the nodes being connected to a monitored PDU so we could
show a power savings versus workload. If you'd like more information or
more detail about any of this, I'd be happy to share.
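
To give a rough idea of the kind of threshold logic such a daemon might
use, here is a small sketch. It is not the code from the demo: the
threshold names, their defaults, and the inputs (a count of queued jobs,
a per-node count of consecutive idle polls, and a list of powered-off
nodes) are all assumptions made for illustration. Gathering those inputs
from TORQUE and actually switching power are left out.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical knobs, not taken from the original daemon.
    queued_jobs_to_power_up: int = 1   # wake nodes once this many jobs wait
    idle_polls_to_power_down: int = 3  # polls a node must sit idle first


def plan_power_actions(queued_jobs, idle_polls, powered_off, thresholds):
    """Return (nodes_to_power_up, nodes_to_power_down).

    queued_jobs: number of jobs waiting in the queue
    idle_polls:  dict of powered-on node -> consecutive idle polls (0 = busy)
    powered_off: names of nodes that are currently off
    """
    if queued_jobs >= thresholds.queued_jobs_to_power_up:
        # Jobs are waiting: count nodes that are on and idle, wake only
        # as many extra nodes as needed, and shut nothing down.
        idle_now = sum(1 for polls in idle_polls.values() if polls > 0)
        needed = max(0, queued_jobs - idle_now)
        return sorted(powered_off)[:needed], []
    # Queue is quiet: power down nodes that have been idle long enough.
    down = sorted(
        node for node, polls in idle_polls.items()
        if polls >= thresholds.idle_polls_to_power_down
    )
    return [], down
```

A daemon built this way would call plan_power_actions once per polling
interval and hand the two lists to whatever mechanism controls power.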
-Joshua Bernstein
Software Engineer
Penguin Computing