[torqueusers] reducing energy usage of torque

Lawrence Lowe L.S.Lowe at bham.ac.uk
Tue Feb 28 08:53:47 MST 2012


Hi, we do something similar here on our local cluster, where demand can be 
peaky. (On our Grid cluster we don't bother, as there is constant demand. 
Our Uni cluster has network-controlled PDUs and runs Moab, which already 
comes with some green-computing support.)

The worker nodes have Wake-on-LAN enabled by default. If a monitoring 
script on a master node detects that there aren't enough free worker nodes 
to do the work, it wakes up a node that is down but not marked offline, 
using "wol"; a bit of code ensures that it won't try to wake the same node 
again for a while. If a worker node detects, via a cron script that checks 
directory time-stamps, that it hasn't run any jobs for a while, it shuts 
itself down. Nothing complicated, blatantly asymmetric, and it seems to 
work in our admittedly simple environment. A rough sketch of both halves 
is below.
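For anyone wanting to try the same thing, something along these lines is 
enough. This is only a sketch, not our actual scripts; the paths, the 
thresholds and the "nodename MAC" lookup file are invented for the example:

  #!/bin/bash
  # wake-one-node.sh -- cron job on the master node (illustrative sketch).
  # If no worker is free, wake one node that pbs_server reports as plain
  # "down" (not offline), and remember the attempt so we don't retry it
  # immediately.  /etc/cluster/macs holds "nodename MAC" pairs, one per line.

  MACS=/etc/cluster/macs
  STAMPDIR=/var/run/wol-stamps
  HOLDOFF=900                     # seconds before re-trying the same node
  mkdir -p "$STAMPDIR"

  # If any worker is still free, do nothing (a real test would be smarter).
  pbsnodes -a | grep -q 'state = free' && exit 0

  # pbsnodes -l lists problem nodes; state "down" means down but not offline.
  for node in $(pbsnodes -l | awk '$2 == "down" {print $1}'); do
      stamp=$STAMPDIR/$node
      if [ -f "$stamp" ] && [ $(( $(date +%s) - $(stat -c %Y "$stamp") )) -lt "$HOLDOFF" ]; then
          continue                # woken recently, give it time to boot
      fi
      mac=$(awk -v n="$node" '$1 == n {print $2}' "$MACS")
      [ -n "$mac" ] || continue
      wol "$mac" && touch "$stamp"
      break                       # wake one node per run
  done

  #!/bin/bash
  # idle-shutdown.sh -- cron job on each worker node (illustrative sketch).
  # Power off if the mom job directory is empty and hasn't changed for an hour.

  JOBDIR=/var/spool/torque/mom_priv/jobs
  IDLE=3600

  [ -d "$JOBDIR" ] || exit 0                          # no mom here, leave alone
  [ -n "$(ls -A "$JOBDIR" 2>/dev/null)" ] && exit 0   # jobs present, stay up
  age=$(( $(date +%s) - $(stat -c %Y "$JOBDIR") ))
  [ "$age" -ge "$IDLE" ] && /sbin/poweroff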

When I did this, I thought it would be a nice feature (torque developers!) 
if, instead of a plain poweroff, I could use something like "momctl -S": 
conditionally shut down the mom and cleanly inform pbs_server to mark the 
node down if it is empty of jobs, but return an error code if the mom 
[now] has jobs. I guess the same effect could be programmed as a new 
"terminate if empty" signal handler in pbs_mom. There are probably other 
ways of avoiding race conditions, such as using pbsnodes -o or signalling 
the scheduler not to schedule jobs while decisions are made, but that's 
more complicated.
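Until something like that exists, the closest approximation I can see with 
current commands is to offline the node first, re-check for jobs, and only 
then power off. A sketch, run on the pbs_server host (the "jobs =" grep and 
the ssh poweroff are assumptions about your torque version and setup):

  #!/bin/bash
  # drain-and-poweroff.sh <node> -- approximate the "shut down only if empty"
  # idea with existing tools: offline the node so the scheduler stops placing
  # jobs on it, then re-check and only power off if nothing is assigned.

  node=$1
  [ -n "$node" ] || { echo "usage: $0 <node>" >&2; exit 2; }

  pbsnodes -o "$node"            # stop new jobs landing on it
  sleep 10                       # let any in-flight scheduling settle

  if pbsnodes "$node" | grep -q 'jobs = '; then
      echo "$node still has jobs, leaving it online" >&2
      pbsnodes -c "$node"        # clear the offline flag again
      exit 1
  fi

  ssh "$node" /sbin/poweroff     # wake it later with wol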

LL

On Tue, 28 Feb 2012, Dr. Stephan Raub wrote:

> 
> Hi,
> 
>  
> 
> all of our nodes (compute nodes and service nodes) are equipped with IPMI-capable BMCs
> (Baseboard Management Controllers), so we can control all aspects of power (including
> measuring the current power consumption, turning nodes on or off, power-cycling them, etc.)
> from the batch server just by using the ipmitool package.
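For reference, the kind of ipmitool calls meant here look roughly like the 
following; the BMC hostname and credentials are placeholders, and "dcmi 
power reading" only works where the BMC implements DCMI:

  ipmitool -I lanplus -H bmc-node01 -U admin -P secret chassis power status
  ipmitool -I lanplus -H bmc-node01 -U admin -P secret chassis power on
  ipmitool -I lanplus -H bmc-node01 -U admin -P secret chassis power soft    # graceful OS shutdown
  ipmitool -I lanplus -H bmc-node01 -U admin -P secret chassis power cycle
  ipmitool -I lanplus -H bmc-node01 -U admin -P secret dcmi power reading    # current consumption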
> 
>  
> 
> We have used this for controlling nodes within a torque/maui setup in the context of a
> bachelor project. But our cluster is so busy all the time that we did not see a dramatic
> reduction in the overall power consumption of the cluster (including the water cooling of
> the racks).
> 
>  
> 
> Best regards
> 
> --
> ---------------------------------------------------------
> Dr. rer. nat. Stephan Raub
> Dipl. Chem.
> High-Performance-Computing
> Zentrum für Informations- und Medientechnologie
> Heinrich-Heine-Universität Düsseldorf
> Universitätsstr. 1 / Raum 25.41.O2.25-2
> 40225 Düsseldorf / Germany
>
> Tel: +49-211-811-3911
> Fax: +49-211-811-2539
> ---------------------------------------------------------
> 
> 
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On
> Behalf Of Ryan Golhar
> Sent: Tuesday, 28 February 2012 15:08
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] reducing energy usage of torque
> 
>  
> 
> What about cycling the power using a PDU?
> 
> On Tue, Feb 28, 2012 at 2:43 AM, Daniel Fernando Coimbra <danielfcoimbra at gmail.com> wrote:
> 
> I assume that by "turning off" you mean actually powering down the node. I
> am just curious how you intend to power it up again later. I suppose you
> could use something like Wake-on-LAN, but I never actually got to test this
> kind of thing and don't know how it would behave on a high-traffic network
> (I suppose the network card doesn't keep its IP once it's in such a state).
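For what it's worth, Wake-on-LAN doesn't rely on the powered-off machine 
having an IP at all: the magic packet is addressed to the NIC's MAC and is 
normally sent as a subnet broadcast, so losing the IP while the node is off 
doesn't matter. Typical invocations (MAC and broadcast address are 
placeholders) look like:

  wol 00:25:90:ab:cd:ef
  wakeonlan -i 192.168.1.255 00:25:90:ab:cd:ef    # aim at a specific broadcast domain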
> 
> 
> On 02/26/2012 08:24 PM, Arka Aloke Bhattacharya wrote:
> > Hi everyone,
> >
> > I am a PhD student at UC Berkeley, and I wanted to add a "turning off
> > idle/underutilized servers" feature to our 100-server torque+maui
> > deployment. However, I want to implement this feature using only
> > existing torque+maui interfaces and extensions (i.e. _without
> > modifying_ the torque or maui source code in any way).
> >
> > My proposed way is to
> > 1. monitor the maui queue length, and estimate the number of servers
> > I can switch off.
> > 2. I would then use the "pbsnodes -o <nodename>" command to render a
> > certain number of servers offline for scheduling.
> > 3. A bash script would turn the servers off.
> >
> > The servers would be turned back on (and added to the torque nodes
> > list) when the queue length increases beyond a certain threshold.
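As a very rough illustration of steps 1 and 2 (the power-off and wake-up 
actions themselves are site-specific; see the sketches earlier in this 
thread), the decision logic can be as simple as comparing the queued-job 
count with the number of free nodes. The thresholds and the qstat/pbsnodes 
parsing below are example assumptions only:

  #!/bin/bash
  # scale-check.sh -- cron job on the torque/maui server (illustrative only).
  # Decide whether to wake a node or drain one, based on queue backlog.
  # The thresholds (2 queued jobs, 1 spare node) are arbitrary examples.

  queued=$(qstat | awk '$(NF-1) == "Q"' | wc -l)       # jobs waiting to run
  free=$(pbsnodes -a | grep -c 'state = free')         # nodes with free slots

  if [ "$queued" -gt 2 ] && [ "$free" -eq 0 ]; then
      echo "backlog of $queued jobs and no free nodes: wake one up"
      # e.g. run the wake-one-node.sh sketch from earlier in the thread
  elif [ "$queued" -eq 0 ] && [ "$free" -gt 1 ]; then
      victim=$(pbsnodes -a | awk '/^[^ ]/{n=$1} /state = free/{print n}' | tail -1)
      echo "no queued jobs and $free free nodes: drain $victim"
      # e.g. run the drain-and-poweroff.sh sketch on $victim
  fi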
> >
> > I had two questions:
> >
> > 1. Is there any existing open-source code which already implements the
> > "turning off idle servers" functionality in torque?
> > 2. Are there complications that would arise if I implemented the
> > "turning off idle servers" feature in my proposed way? [e.g. is it
> > possible that after turning off servers, they would lose some state
> > and hence would not get added to the torque <nodes_list> when turned
> > back on? Are there long-lived TCP connections which need to be
> > restarted separately?, etc.]
> >
> > It would be great if anyone could help.
> >
> > Thanks a lot,
> > Arka.
> >

