Case Study 16: Realtime Broadcasting
A.16 Case Study: Realtime Broadcasting
Overview
A broadcasting organization must be able to create and broadcast both current status and short-term predictions every 15 minutes. Each broadcast requires computations for many regions, distributed across hundreds of compute nodes and based on regularly updated information. All failures must be handled gracefully, and computations must complete even in the event of major resource failures.
Solutions
The best solution combines multiple technologies to guarantee workflow completion and intelligently manage available resources in the event of failures. First, Moab's high availability mode will be enabled to eliminate any single point of failure. Next, Moab's multi-resource manager feature will be used to provide additional resource monitoring over the compute nodes, network, storage system, and key applications. Moab will automatically steer workload around the failures it detects. Further, Moab triggers and events will enable Moab to notify administrators and automatically recover from many node failures.
In the event of reduced capacity, Moab will automatically identify and eliminate invalidated workload and will reprioritize existing jobs based on the time since the most recent update, to provide the most complete possible broadcasts. Moab's application performance learning capabilities will be applied to identify, avoid, and send notifications about per-node or cluster-wide brown-out conditions in which some nodes perform poorly. Moab's peer-to-peer facilities will also be activated to use co-location resources in the event of catastrophic failures.
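The reduced-capacity behavior described above can be illustrated with a small sketch. This is not Moab's actual algorithm or API; it is a minimal Python model, under the assumptions that a job counts as invalidated once newer input data has superseded it and that "time since most recent update" is tracked per region. The names `RegionJob`, `plan_reduced_capacity`, `data_version`, and `last_update_age` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegionJob:
    region: str
    data_version: int       # version of the input data the job was built from (assumed)
    last_update_age: float  # minutes since this region was last refreshed (assumed)
    nodes_needed: int


def plan_reduced_capacity(jobs, current_version, available_nodes):
    """Drop invalidated jobs, then place the stalest regions first."""
    # Assumption: a job is invalidated when newer input data has superseded it.
    valid = [j for j in jobs if j.data_version == current_version]
    # Regions that have gone longest without an update get the highest priority.
    ranked = sorted(valid, key=lambda j: j.last_update_age, reverse=True)
    scheduled, free = [], available_nodes
    for job in ranked:
        # Greedily place each job that still fits on the surviving nodes.
        if job.nodes_needed <= free:
            scheduled.append(job.region)
            free -= job.nodes_needed
    return scheduled


jobs = [
    RegionJob("north", 7, 15.0, 40),
    RegionJob("south", 7, 45.0, 60),
    RegionJob("east", 6, 90.0, 30),  # built from superseded data, so dropped
    RegionJob("west", 7, 30.0, 50),
]
plan = plan_reduced_capacity(jobs, current_version=7, available_nodes=120)
print(plan)  # ['south', 'west']
```

With only 120 nodes available, the job for "east" is discarded as invalidated, the two stalest valid regions are placed, and "north" waits because it was refreshed most recently and no longer fits. A real deployment would express this through the scheduler's own priority and policy settings rather than external code.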
To automate the process, Moab's standing trigger facility is used to periodically create and submit new workload based on updated external data. This facility allows the internal cluster environment to be used in deciding when events occur, and it provides context information to the actions themselves, allowing more intelligent and complete actions.