A.24 Case Study: Cluster Environment Event Handling
Overview
An organization requires intelligent management of cluster behavior in the event of various periodic failures, including the following:
Solution
Moab's event management features (generic events, generic metrics, and triggers) allow an organization to address each of these events in an intelligent manner that maximizes cluster availability and protects the most important workload. For the events above
The configuration below will enable Moab to schedule a weekly accounting package and enable an analysis service during business hours.
SCHEDCFG[master] SERVER=main.ifl.com MODE=NORMAL

# interface to monitor/manage services
RMCFG[direct] TYPE=Loadleveler

# load information regarding UPS, Chiller and Storage Manager
RMCFG[local] TYPE=native

# enable connection to utility computing resources
# only enable if local failures occur
RMCFG[uci] TYPE=moab://utilitycomputinginc.com:22000 STATE=disabled

# cooling has failed, power down cluster immediately
GEVENTCFG[coolfail] action=notify,record,preempt,execute:/tools/powerdown

# external power failure detected. powerdown nodes associated with
# low priority jobs
GEVENTCFG[powerfail] action=notify,record,execute:/tools/powerdown-lpo

# UPS is almost empty, shutdown cluster
GEVENTCFG[powerfail2] action=notify,record,execute:/tools/powerdown

# minimize use of 'hot' nodes
GEVENTCFG[hitemp] action=notify,record,avoid

# temporarily block jobs which require failing storage resources
# while warnings are reported
GEVENTCFG[storagefail] action=notify,record,reserve

# purge full filesystems
GEVENTCFG[fsfull] action=record,execute:/tools/purgefs.pl

# investigate/recover nodes with low throughput
GEVENTCFG[slownode] action=record,execute:/tools/recovernode.pl

# local cluster is unavailable, activate remote resources
GEVENTCFG[netfailure] action=notify,record,enable:rm:uci
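Each GEVENTCFG line above pairs an event name with a comma-separated action list. As a rough illustration of that mapping (not part of Moab itself), the following Python sketch extracts the event-to-action table from configuration text so it can be reviewed or audited. The function name parse_gevent_actions and the regular expression are assumptions made for this example; the sketch handles only the simple one-line form shown above.

```python
import re

# Matches lines such as: GEVENTCFG[hitemp] action=notify,record,avoid
# (assumption: one event per line, actions contain no whitespace)
GEVENT_LINE = re.compile(
    r"^GEVENTCFG\[(?P<name>[^\]]+)\]\s+action=(?P<actions>\S+)",
    re.IGNORECASE,
)


def parse_gevent_actions(config_text):
    """Return {event_name: [action, ...]} for each GEVENTCFG line found."""
    events = {}
    for raw in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        match = GEVENT_LINE.match(line)
        if match:
            events[match.group("name")] = match.group("actions").split(",")
    return events


# A few lines copied from the configuration above.
SAMPLE = """\
# cooling has failed, power down cluster immediately
GEVENTCFG[coolfail] action=notify,record,preempt,execute:/tools/powerdown
# minimize use of 'hot' nodes
GEVENTCFG[hitemp] action=notify,record,avoid
# local cluster is unavailable, activate remote resources
GEVENTCFG[netfailure] action=notify,record,enable:rm:uci
"""

if __name__ == "__main__":
    for name, actions in parse_gevent_actions(SAMPLE).items():
        print(name, "->", actions)
```

Listing the actions per event this way makes it easy to check, for example, that only the intended events carry an execute: action.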
To submit a batch application request which requires operating system provisioning, use standard batch submission commands.
> msub -l nodes=1,walltime=300,arch=x86,os=suse91 applaunch:data3.txt
moab.1043 submitted
Moab will schedule applications across the cluster, grid, or utility computing resource and will package the application with the requested operating system. As needed, Moab will reprovision resources to provide the bundled OS/application on the best available compute node.
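To make the resource request in the msub example easier to read, the sketch below splits the value passed to -l into its individual fields, showing that the operating system (os=suse91) is requested alongside node count, walltime, and architecture. The helper parse_resource_list is hypothetical and handles only the simple comma-separated key=value form used in this example, not the full msub resource syntax.

```python
def parse_resource_list(spec):
    """Split a simple '-l' value of the form key=value,key=value into a dict."""
    resources = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        resources[key.strip()] = value.strip()
    return resources


if __name__ == "__main__":
    # The resource string from the msub example above.
    request = parse_resource_list("nodes=1,walltime=300,arch=x86,os=suse91")
    for key, value in request.items():
        print(f"{key}: {value}")
```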
With this model, both batch and rigidly scheduled applications can be intermixed.
© 2001-2010 Adaptive Computing Enterprises, Inc.