MOAB•CON 2008:
Advancing Computing Intelligence
May 27-30, 2008
Provo Marriott Hotel & Conference Center
Provo, UT
Moab•Con was a great success! Thanks to everyone who attended.
Moab•Con 2008 Sessions
Wednesday, May 28, General Sessions
Wednesday, May 28, Breakout Sessions
Thursday, May 29, General Sessions
Thursday, May 29, Breakout Sessions
Friday, May 30, General Sessions
Friday, May 30, Breakout Sessions
Wednesday, May 28
General Sessions
Keynote: Powering Data Pipelines at Yahoo! with Moab and TORQUE
Presented by Kazi A. Zaman, Yahoo!
In this talk, we introduce the concept of data pipelines and outline why they are important from a business perspective at Yahoo!. We cover the technical challenges that data pipelines must meet: they need to be highly available, process terabytes of data per day, express complex data processing workflows, and use available hardware efficiently. We describe an architecture for data pipelines built on top of Moab and TORQUE and show how it meets these requirements.
Holistic Scheduling—Interfacing to Storage, Monitors, Legacy Systems, etc.
Presented by Trev Harmon, Cluster Resources, and Brent Welch, Panasas
Moab can gather information from a number of different sources, including storage, performance, and hardware monitors. This information is used for scheduling, alerts, and reporting. The Native Resource Manager Interface is the most common way for Moab to interact with different systems. This session will review this interface.
Panasas will discuss their storage solution and its use of this interface. The session will cover some of the monitoring features in the Panasas parallel file system and explain some of the ways it can be integrated into the Moab management infrastructure. The talk will also describe features envisioned for future product enhancements in the areas of performance monitoring and profile-based job scheduling support.
Tuning for Massive Systems
Presented by Jerry D. Smith II, Sandia National Laboratories
Sandia National Laboratories maintains some of the world's largest supercomputers, including RedStorm (#6) and Thunderbird (#18). We will discuss the steps we take and the parameter tunings we use at SNL to provide high levels of functionality, flexibility, and reliability with Moab and TORQUE across our massive systems, now and into the future.
Improving Availability with Triggers and Autonomics
Presented by David Litster, Cluster Resources
In many environments failures can be detected early and responses can be automated. This session will cover how to use Moab's extensive resource monitoring tools together with Moab's triggers to automate actions based on environmental factors. We will include ways to automate external actions as well as ways to have Moab automatically adjust its own internal policies.
Case Studies (Multiple presenters)
Clustering Made Easy with Scyld ClusterWare and Scyld TaskMaster (Moab):
Presenter: Josh Bernstein, Penguin Computing
Scyld ClusterWare, developed by Don Becker, the originator of Beowulf Linux clustering, provides an easy-to-use cluster management solution. It provides a single point for cluster installation, administration, security, and monitoring. Scyld ClusterWare, coupled with Scyld TaskMaster (Moab), provides a comprehensive solution for cluster management and scheduling. We will illustrate this by going through a specific Penguin Computing customer example.
Installation Experiences with TORQUE, Maui and Moab
Presenters: J.W. “Pat” O’Bryant and Shane Flaherty, ExxonMobil, Global Services Company
This session presents the experiences of ExxonMobil, Global Services Company with testing and using Cluster Resources products. The installation and configuration of TORQUE, Maui, and Moab will be covered.
The Intersection of HPC and the Data Center—SOA, Dynamic Services and Transaction Management
Presented by Trev Harmon, Cluster Resources
The line between traditional HPC and Data Centers is blurring. Many clusters are now running as a mix between these two approaches. Moab provides a number of tools and techniques for addressing this unique space, including System Jobs, Service Jobs, Job Templates, and workflows. Many of these will be discussed in this session.
Getting Smarter—Utilizing Moab’s Learning Features
Presented by Dave Jackson, CTO, Cluster Resources
Moab currently contains many 'automated learning' facilities and is rapidly adding more in the areas of system failure handling, performance, and optimal scheduling practices. This session will cover existing production and beta capabilities, and discuss areas of cluster, grid and cloud management which can most benefit from automated learning.
Scheduling in the Unified Fabric Era
Presented by Aviv Cohen, Product Management Group Leader, Voltaire
Unified fabrics, in which a single network interface serves all of a server's I/O and virtual interface requirements, have opened new opportunities for application scheduling. The integration between Moab and the Voltaire Unified Fabric resource manager enables intelligent application scheduling based on network topology, optimized routing, and QoS, and increases cluster utilization.
Managing an SLA-Centric Workload
Presented by Josh Butikofer, Cluster Resources
Many HPC sites have found that a straightforward policy of a few queues, priority scheduling, and backfill is not enough to meet the demands of fairness, policies, and politics. These sites have found that scheduling based on service-level agreements (SLA) can help address the more complicated needs of their users and workload. This presentation will explain how SLA-centric workload differs from traditional HPC batch workload, introduce Moab's QOS configuration, and give examples of how Moab can empower administrators to enable SLA-based scheduling.
5:00pm – 5:40pm
PANEL Discussion: What Is My Cluster Doing? Best Practices in Managing the Flood of System Data
Breakout Sessions
Making the Most of Moab Diagnostics
Presented by Douglas Wightman, Cluster Resources
We will cover the various Moab diagnostic commands, their usage, expected outputs, and how they can help administrators quickly track down various issues on their clusters. This discussion will also cover the Moab logging facilities, including the events files.
Moab Cluster Manager Workshop
Presented by Brady Kimball and Nate Seeley, Cluster Resources
We will discuss how to connect the Moab Cluster Manager graphical user interface to Moab Workload Manager, how to view the interaction between Cluster Manager and Workload Manager, and where changes made in Cluster Manager that would normally affect moab.cfg are recorded. We will also explain how to configure Moab to allow creation of charts and graphs. We will then demonstrate how to use Moab Cluster Manager to perform common tasks, including job submission, reservation creation, and priority.
Policies for Optimization: Q&A With Moab Workload Manager Developers
Presented by Douglas Wightman, Cluster Resources
Question and answer session with Moab developers focusing on how to optimize Moab for response time, utilization, and throughput based on workload requests and resources available. We will cover tuning Moab for large systems and special workloads.
Policies for Fairness: Q&A With Moab Workload Manager Developers
Presented by Scott Jackson and Douglas Wightman, Cluster Resources
The concept of fairness in Moab has to do with controlling access to and utilization of resources according to a deliberate policy plan. This session will give participants an opportunity to ask Moab developers about the policies that exist within Moab to manage the distribution of resources to the entities that need to use them.
High Availability—TORQUE, Moab and General Workload
Presented by Josh Butikofer, Cluster Resources
Constant availability of resources and the need to run workload 24x7 have become requirements for most clusters and grids. This means that Moab and TORQUE must always be available, even in the case of a hardware failure or a software crash. This session will discuss how Moab and TORQUE can be configured to run in a high availability mode to ensure that Moab or TORQUE is always managing a system. Details about how high availability is implemented, exact configuration examples, and planned future enhancements will also be discussed. Participants will be free to ask questions of the developers who have overseen the implementation of the high availability features.
Q&A with GOLD Developers
Presented by Scott Jackson, Cluster Resources
The Gold Allocation Manager rations compute resources to projects and users. It behaves much like a bank, in which accounts are charged for resource usage. This allows sites to use and enforce an allocation plan for the expenditure of resources. This session will give participants an opportunity to ask the Gold developer about use cases, capabilities, problems, future plans, etc.
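The bank analogy can be illustrated with a minimal sketch. This is not Gold's actual interface: the class name, the method names, and the choice of processor-seconds as the charging unit are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


class InsufficientAllocation(Exception):
    """Raised when a charge would exceed the remaining allocation."""


@dataclass
class AllocationAccount:
    """Hypothetical project account; NOT the Gold API."""
    project: str
    balance: float  # remaining allocation, assumed unit: processor-seconds
    history: list = field(default_factory=list)

    def deposit(self, amount: float) -> None:
        """Add allocation to the account, like a bank deposit."""
        self.balance += amount
        self.history.append(("deposit", amount))

    def charge(self, user: str, processors: int, wallclock_seconds: float) -> float:
        """Debit the account for a finished job and return the cost."""
        cost = processors * wallclock_seconds
        if cost > self.balance:
            raise InsufficientAllocation(
                f"{self.project}: need {cost}, have {self.balance}"
            )
        self.balance -= cost
        self.history.append(("charge", user, cost))
        return cost
```

Under these assumptions, a job that used 4 processors for 100 seconds debits 400 processor-seconds from its project, and a job whose cost exceeds the remaining balance is refused rather than charged.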
Applying Green Computing to Clusters and the Data Center
Presented by Steve Duchene and Andre Kerstens, SGI
Rising electricity costs and environmental concerns are starting to make both the corporate IT and scientific HPC worlds focus more on green computing. Because of this, people are not only thinking about ways to decrease the initial acquisition costs of their equipment, but they are also putting constraints on the operational budgets of that same equipment.
To address this challenge, we will show how to get Moab to use incoming workload, relative operational costs for power and cooling, and other factors when making decisions about putting a system to sleep or powering it off. In addition, we will have Moab look at system temperatures in an effort to assign incoming workloads to cooler systems. This will serve to balance out temperature hot spots across a grid of clusters. Overall, we feel this will help reduce power and cooling loads for those systems, which will have a positive effect on the long-term operational budget for a production HPC environment.
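The idea of steering incoming work toward cooler systems can be sketched as a simple selection rule. The node fields (`name`, `temp_c`, `free_cores`, `powered_on`) and the function name are hypothetical; this is not how Moab represents nodes or makes placement decisions.

```python
def pick_coolest_node(nodes, required_cores):
    """Return the name of the coolest powered-on node with enough free cores.

    `nodes` is a list of dicts with the assumed keys
    'name', 'temp_c', 'free_cores', and 'powered_on'.
    Returns None when no node can take the job.
    """
    candidates = [
        node for node in nodes
        if node["powered_on"] and node["free_cores"] >= required_cores
    ]
    if not candidates:
        return None
    # Prefer the lowest temperature to even out hot spots.
    return min(candidates, key=lambda node: node["temp_c"])["name"]
```

A real policy would weigh temperature against the other factors named above, such as power and cooling costs, rather than using temperature alone.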
Automated Cluster Deployment with Moab (multiple presenters)
Cluster Resources, Novell, and Clustercorp will highlight automated cluster deployment with Moab. In the first half of the session, Novell will give an overview of SuSE Linux in HPC; Cluster Resources will then present Moab Cluster Builder for SuSE Linux and perform a live install of the solution.
Moab Cluster Builder for SuSE Linux is a single DVD that first installs SuSE Linux Enterprise Server, then deploys TORQUE, Moab, and other required HPC tools, automatically configures them, and runs a validation suite upon conclusion.
Next, Clustercorp will present on Rocks + Moab.
Adaptive High Performance Computing with SUSE Linux and Windows
Presenter: Nathan Conger, Novell
Adaptive High Performance Computing means dynamically allocating mixed Linux and Windows cluster environments to meet changing compute and business requirements. This session will discuss how to maximize mixed SUSE Linux Enterprise and Windows Compute Cluster Server environments by leveraging the Moab Cluster Suite from Cluster Resources.
Automated Cluster Deployment with Rocks+ by Clustercorp and Moab Cluster Builder
Presenters: Tim McIntire, President of Clustercorp, and Michael Jackson, President of Cluster Resources
Tim McIntire from Clustercorp will speak on the what, why, and how of building clusters with Rocks+Moab. Rocks is a complete cluster distribution built on Red Hat Enterprise Linux (or CentOS) that includes each part of the HPC software stack as modular components (Rolls). This modular infrastructure allows users to deploy certified, standards-based high performance computing clusters with Moab pre-configured (the Moab Roll). Other Rolls, which are added to the system by simply clicking a check-box, include the Intel Developer Roll, PGI Roll, Absoft Roll, Viz Roll, Bio Roll, and CFD Roll. We will diagram the complete Rocks software stack, walk attendees through the complete end-to-end install process (with slides), and give a brief overview of the Rocks framework, which is the underlying mechanism that enables a simple, yet robust, end-user experience.
Thursday, May 29
General Session
Keynote: The Evolution of Scale-out Computing
Presented by Egan Ford, IBM
'Scaled out' infrastructure–consisting of distributed Linux boxes–has received widespread adoption, but this paradigm has resulted in management complexity associated with initial provisioning and undocumented changes. Intelligent policy-driven dynamic provisioning and stateless servers not only address these issues but also open up a wealth of new possibilities in delivering new solutions and a more flexible and adaptive infrastructure across the spectrum of HPC and data center users.
Adaptive Data Center
Presented by Susanne M. Balle, Hewlett-Packard
HP and Cluster Resources have created a joint solution to pursue commercial enterprise Grid opportunities where automatic adaptation of the resources to the workload is required in a changing cross-enterprise IT environment. This solution demonstrates the value of a completely automated environment composed of capacity management, auto-provisioning, resource flexing, grid-wide monitoring, virtualization, workload scheduling and load balancing of batch and service jobs. This solution allows for maximization of server utilization.
Simulation and Emulation for Performance Prediction
Presented by Baochuan Lu and Wesley Emeneker, University of Arkansas, and Dave Jackson, CTO, Cluster Resources
Cluster use has grown exponentially in recent years. The Integrated Capacity Planning Toolkit (ICPT) has been developed to predict future cluster needs by analyzing and modeling historical system workloads. We look at how the ICPT can be used to predict behavior, and at how “what-if” scenarios can show the ways new technologies like virtualization affect system response.
Utility Computing and Hosted Resources
Presented by Trev Harmon, Cluster Resources
Resource scarcity is an issue many cluster administrators face daily. Instead of requiring new hardware purchases, Moab offers several alternative solutions that allow administrators to temporarily access additional resources to handle spikes and the other daily variations and fluctuations seen in workload. In this session, we will discuss some of the technologies that provide this functionality.
Best Practices in Capacity Planning
Presented by Brady Kimball, Cluster Resources
Because acquiring and setting up new hardware can be a painful process, it is important to understand what can be done to optimize the use of existing resources. We will describe some techniques to use with Moab to report on resource inefficiencies and how to address them. When hardware upgrades are necessary, Moab's scheduling tools can minimize the effect of maintenance on other workload. These tools and practices in Moab can increase a system administrator's ability to isolate and manage capacity planning issues.
Green Computing—Power and Thermal Optimized Scheduling
Presented by Dan Stanzione, ASU, and Michael Jackson, President of Cluster Resources
This session covers using Moab's advanced scheduling capabilities to schedule jobs based on power consumption, thermal output, and total cluster power capacity. Roughly 40-50% of corporate energy consumption goes to IT, and computing-center power costs have more than doubled over the last five years. Moab will enable your organization to effectively reduce energy consumption costs as it optimizes IT performance.
Cluster Consolidation and Sovereign Grids
Presented by Jonathan Ryskamp, Cluster Resources
An introduction to how Moab can be used to consolidate clusters and create sovereign grids in the real world. The session will include a discussion of problems that sites will likely face, how these problems can be overcome, best practices in enabling grids, the benefits of cluster consolidation and grid creation, and case studies.
Windows+Linux Dynamic Hybrid Clusters
Presented by Matt Blythe, Microsoft
Because HPC clusters represent a significant investment in capital and operational resources, maximizing the capabilities of your existing infrastructure is critical for increased utilization and overall savings. By having multiple operating systems available on your existing clusters, you gain the flexibility of an additional cluster, or sub-cluster, without having to invest in further hardware. There are a number of scenarios in which the ability to have both the Linux operating system and Windows HPC Server available on your cluster is an advantage, including new application development, performance testing, proofs of concept, application migration, and platform test scenarios. This talk will describe the advantages of Linux and Windows HPC Server multi-OS environments, while covering some of the capabilities and benefits of Windows HPC Server and its associated ecosystem of development and management tools.
5:00pm – 5:40pm
PANEL Discussion: Cloud Computing—Is it Time or Is it Hype? (Multiple presenters)
Breakout Sessions
Moab Internals
Presented by Douglas Wightman, Cluster Resources
Question and answer session with Moab developers concerning the internals of Moab scheduling. Topics may include managed objects, their life-cycles and interactions, as well as algorithms and resource manager interfaces.
Q&A with TORQUE Developers
Presented by Nick Ihli and Al Taufer, Cluster Resources
Question and answer session covering TORQUE resource manager, with discussion on some of TORQUE’s newest features.
Managing a Real-World Grid—Politics, Resource Heterogeneity, User Issues and Competing Technologies
Presented by Peter Enstrom, NCSA
Computing grids are growing and spanning independent organizations. Complexities arise when grids cross institutional boundaries. This talk will examine some of the issues that need to be addressed when setting up, administering, and using a real-world grid.
Q&A with Moab Access Portal Developers
Presented by Noah Carroll, Cluster Resources
Question and answer session covering Moab Access Portal (MAP) with a presentation on customization, installation and basic usage.
Workload Management on Leadership Class Architectures—IBM BlueGene/Cell, Cray XT (Multiple presenters)
The Moab Workload Manager has been adapted to optimize the batch workload for the top leadership class architectures. Architectures such as the Cray XT, the IBM BlueGene, and the IBM Cell architecture can benefit from innovative scheduling optimizations implemented by Moab. This session will be divided into three 20-minute sections in which Cluster Resources developers, customers, and partners will discuss their experiences in customizing the batch system on these architectures.
Presenters: Peter Savinelli, IBM, and Don Lipari, Lawrence Livermore National Laboratory
Presenters: Peter Savinelli, IBM, and Scott Jackson, Cluster Resources
Workload Management on Cray XT platforms at ORNL
Presenter: Don Maxwell, Oak Ridge National Laboratory
The primary mission of the National Center for Computational Sciences at Oak Ridge National Laboratory is open scientific research at large scale. Providing the resources to complete that mission while also maintaining high utilization can be challenging. Problems resolved by using the Moab scheduler, along with a review of the policies used to accomplish the mission of NCCS, will be presented.
Workload Management On Leadership Class Architectures
Presenter: Michael Karo, Cray
Applying large-scale heterogeneous computational resources to address the complex and diverse needs of real-world applications is the goal of Cray's Adaptive Supercomputing vision. Management and scheduling of these resources requires a robust and sophisticated software infrastructure. TORQUE and Moab are integral components of this infrastructure, essential to address resource and workload management requirements. In this talk, we will explore the Cray product roadmap and its emphasis on high performance, programmability, portability, and robustness. We will also discuss the role of TORQUE and Moab in current and future-generation Cray systems.
Q&A with Moab Grid Developers
Presented by Josh Butikofer, Cluster Resources
This session will allow current and prospective grid users to ask specific questions about the current Moab grid offerings. Ideal topics include best practices in creating grids, scalability concerns, help with data staging or other advanced configuration, high availability in grids, special considerations for network and file systems, etc.
Managing Workflows
Presented by Trev Harmon, Cluster Resources
Workflows allow the creation of job flows based on simple or complex DAGs. This session will discuss the creation of these workflows, as well as some of the key underlying technologies.
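A job flow expressed as a DAG can be reduced to a valid run order with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The job names and the `execution_order` helper are hypothetical, and this shows the general technique, not Moab's workflow syntax or implementation.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order for a workflow.

    `dependencies` maps each job name to the set of jobs that must
    finish before it may start. Raises graphlib.CycleError if the
    dependencies contain a cycle, i.e. the workflow is not a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical five-step workflow: two jobs depend on 'simulate'.
workflow = {
    "stage_in": set(),
    "simulate": {"stage_in"},
    "analyze": {"simulate"},
    "archive": {"simulate"},
    "plot": {"analyze"},
}
order = execution_order(workflow)
```

Jobs with no dependency between them, such as `analyze` and `archive` here, may appear in either order, which is also what makes them candidates for running in parallel.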
Friday, May 30
General Sessions
Keynote Address
Presented by Dave Jackson, CTO, Cluster Resources
Advanced TORQUE Administration
Presented by Nick Ihli, Cluster Resources
This session covers various advanced features and capabilities in TORQUE. We will discuss areas such as the recently developed checkpoint/restart integration system with BLCR, job arrays, high throughput, failure handling, advanced diagnostics, and best practices for optimizing your TORQUE system.
Case Studies (Multiple presenters)
Presenter: Oliver Baltzer, Flagstone Re
At Flagstone Re we use Moab as a scheduling component in a number of our core business applications. It is tightly integrated into the existing heterogeneous software architecture consisting of components running on Microsoft Windows servers as well as Linux clusters. Moab's extensibility and flexibility allowed us to develop a custom workflow execution component capable of scheduling complex fine-grained workflows composed of parallel and sequential activities effectively on available resources. Our component integrates directly with the Windows Workflow Foundation technology and enables a seamless integration between the Linux and MS Windows environments. At the same time, Moab provides our applications with advanced resource allocation, QoS and reservation features allowing us to adapt our operations to timely demands.
Presenter: Nicholas P. Cardo, NERSC
Presenter: Jess Arrington, Cluster Resources
Presenting a case study on the University of Cambridge's application of Moab's Hybrid Technology
Moab and Virtualization in HPC
Presenter: Dan Stanzione, Arizona State University
The explosion of cluster computing for business and scientific applications has made it commonplace for multiple independent clusters to exist on a single academic or corporate campus. Typically, each cluster is an autonomous and independent unit that has no interaction with other clusters. Each cluster also represents a significant investment. Virtualization is a promising avenue for combining the resources of these clusters. The Dynamic Virtual Clustering (DVC) system combines Moab and TORQUE with Xen virtualization to raise cluster utilization through co-allocation and job forwarding, and to allow a single cluster to support multiple "virtual cluster" software environments. This talk will provide an overview of the design of DVC within the Moab/TORQUE framework.
12:15pm – 1:00pm
PANEL Discussion: The Next Generation in Workload Management—What is Most Needed Now and Over the Next 18 Months
Breakout Sessions
Q&A with Moab Cluster Manager Developers: Capacity Planning, Automated Reports, and Visualizing Generic Metrics
Presented by Brady Kimball and Nate Seeley, Cluster Resources
We will show how the Moab Cluster Manager graphical user interface can create charts allowing visualization of generic metrics. Cluster Manager can also show linear regressions of system utilization line graphs to help one spot trends in workload, which aid in capacity planning. We will discuss which of Cluster Manager's charts can be created from the command line and how this might facilitate automated reporting. We will also have a question and answer session covering any topic related to Cluster Manager.
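The trend-spotting idea can be shown with an ordinary least-squares line fitted to evenly spaced utilization samples. The function names and the sample figures are made up for illustration; how Cluster Manager computes its regressions is not described here.

```python
def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Needs at least two samples.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, threshold):
    """Sample index at which the fitted line reaches `threshold`.

    Returns None when utilization is flat or falling.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical weekly utilization percentages.
weekly_utilization = [60.0, 62.0, 64.0, 66.0]
slope, intercept = fit_trend(weekly_utilization)
```

With these made-up samples the fitted line rises two percentage points per week, so a site could estimate when utilization would cross a chosen planning threshold.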
Managing Clusters with Moab+SLURM
Presented by Don Lipari, Group Leader of the Integrated Computational Resource Management Group, Lawrence Livermore National Laboratory (LLNL)
SLURM provides the underlying resource management functions for Moab at LLNL and on about 1/3 of the Top 500 supercomputers. An overview of SLURM's design and capabilities will be presented including a description of how Moab and SLURM interoperate. LLNL's workload management architecture will also be described.
Managing Clusters with Moab+SGE
Presented by Scott Jackson, Cluster Resources
Moab can be used to schedule workload using the Sun Grid Engine (SGE) resource manager. Moab uses its native resource manager interface to make intelligent scheduling decisions on an SGE batch system, giving managers, admins, and users access to the full power of Moab Workload Manager. This adds to the long list of resource managers that can be managed by Moab in a cluster or a grid.
Q&A with Moab Workload Manager Developers—Preemption, High Throughput, and Massive Workloads
Presented by Trev Harmon, David Litster, Scott Jackson, Cluster Resources
This Q&A is geared towards the operation of Moab in systems with extremely large workloads in terms of number of jobs, throughput requirements, and special job policy requirements.


