

MOAB•CON 2008:

Advancing Computing Intelligence

May 27-30, 2008

Provo Marriott Hotel & Conference Center

Provo, UT


Moab•Con was a great success! Thanks to everyone who attended.

Moab•Con 2008 Sessions


Wednesday, May 28, General Sessions

Wednesday, May 28, Breakout Sessions

Thursday, May 29, General Sessions

Thursday, May 29, Breakout Sessions

Friday, May 30, General Sessions

Friday, May 30, Breakout Sessions

                                                  

Wednesday, May 28


General Sessions


8:00am – 8:55am

Keynote: Powering Data Pipelines at Yahoo! with Moab and TORQUE

Presented by Kazi A. Zaman, Yahoo!

In this talk, we introduce the concept of data pipelines and outline why they are important from a business perspective at Yahoo!. We cover the technical challenges that data pipelines must meet: they must be highly available, capable of processing terabytes of data per day, able to express complex data processing workflows, and efficient in their use of available hardware. We describe an architecture for data pipelines built on top of Moab and TORQUE and show how it meets these requirements.


9:00am – 9:55am

Holistic Scheduling—Interfacing to Storage, Monitors, Legacy Systems, etc.

Presented by Trev Harmon, Cluster Resources, and Brent Welch, Panasas

Moab can gather information from a number of different sources, including storage, performance, and hardware monitors. This information is used for scheduling, alerts, and reporting. The Native Resource Manager Interface is the most common way for Moab to interact with different systems. This session will review this interface.

Panasas will discuss its storage solution and its use of this interface. The session will cover some of the monitoring features in the Panasas parallel file system and explain some of the ways they can be integrated into the Moab management infrastructure. The talk will also describe features envisioned for future product enhancements in the areas of performance monitoring and profile-based job scheduling support.


10:00am – 11:00am

Tuning for Massive Systems

Presented by Jerry D. Smith II, Sandia National Laboratories

Sandia National Laboratories maintains some of the world's largest supercomputers, including Red Storm (#6) and Thunderbird (#18). We will discuss the steps we take and the parameter tunings we use at SNL to provide high levels of functionality, flexibility, and reliability with Moab and TORQUE across our massive systems, now and into the future.


11:15am – 12:15pm

Improving Availability with Triggers and Autonomics

Presented by David Litster, Cluster Resources

In many environments failures can be detected early and responses can be automated. This session will cover how to use Moab's extensive resource monitoring tools together with Moab's triggers to automate actions based on environmental factors. We will include ways to automate external actions as well as ways to have Moab automatically adjust its own internal policies.


1:15pm – 2:10pm

Case Studies (Multiple presenters)

Clustering Made Easy with Scyld ClusterWare and Scyld TaskMaster (Moab):

Presenter:  Josh Bernstein, Penguin Computing

Scyld ClusterWare, developed by Don Becker, the originator of Beowulf Linux clustering, provides an easy-to-use cluster management solution. It provides a single point for cluster installation, administration, security, and monitoring. Scyld ClusterWare, coupled with Scyld TaskMaster (Moab), provides a comprehensive solution for cluster management and scheduling. We will illustrate this by going through a specific Penguin Computing customer example.

Installation Experiences with TORQUE, Maui and Moab

Presenters: J.W. “Pat” O’Bryant and Shane Flaherty, ExxonMobil Global Services Company

This session covers ExxonMobil Global Services Company's experiences with testing and using Cluster Resources products, including the installation and configuration of TORQUE, Maui, and Moab.


2:15pm – 2:45pm

The Intersection of HPC and the Data Center—SOA, Dynamic Services and Transaction Management

Presented by Trev Harmon, Cluster Resources

The line between traditional HPC and Data Centers is blurring. Many clusters are now running as a mix between these two approaches. Moab provides a number of tools and techniques for addressing this unique space, including System Jobs, Service Jobs, Job Templates, and workflows. Many of these will be discussed in this session.


2:45pm – 3:15pm

Getting Smarter—Utilizing Moab’s Learning Features

Presented by Dave Jackson, CTO, Cluster Resources

Moab currently contains many 'automated learning' facilities and is rapidly adding more in the areas of system failure handling, performance, and optimal scheduling practices. This session will cover existing production and beta capabilities, and discuss areas of cluster, grid and cloud management which can most benefit from automated learning.


3:30pm – 4:10pm

Scheduling in the Unified Fabric Era

Presented by Aviv Cohen, Product Management Group Leader, Voltaire

Unified fabrics, in which a single network interface serves all of a server's I/O and virtual interface requirements, have opened new opportunities for application scheduling. The integration between Moab and the Voltaire Unified Fabric resource manager enables intelligent application scheduling based on network topology, optimized routing, and QoS, and increases cluster utilization.


4:15pm – 4:55pm

Managing an SLA-Centric Workload

Presented by Josh Butikofer, Cluster Resources

Many HPC sites have found that a straightforward policy of a few queues, priority scheduling, and backfill is not enough to meet the demands of fairness, policies, and politics. These sites have found that scheduling based on service-level agreements (SLA) can help address the more complicated needs of their users and workload. This presentation will explain how SLA-centric workload differs from traditional HPC batch workload, introduce Moab's QOS configuration, and give examples of how Moab can empower administrators to enable SLA-based scheduling.


5:00pm – 5:40pm

PANEL Discussion: What Is My Cluster Doing? Best Practices in Managing the Flood of System Data



Breakout Sessions


9:00am – 9:30am

Making the Most of Moab Diagnostics

Presented by Douglas Wightman, Cluster Resources

We will cover the various Moab diagnostic commands, their usage, expected outputs, and how they can help administrators quickly track down various issues on their clusters. This discussion will also cover the Moab logging facilities, including the events files.


10:00am – 11:00am

Moab Cluster Manager Workshop

Presented by Brady Kimball and Nate Seeley, Cluster Resources

We will discuss how to connect the Moab Cluster Manager graphical user interface to Moab Workload Manager, how to view the interaction between Cluster Manager and Workload Manager, and where Cluster Manager records changes that would normally affect moab.cfg. We will also explain how to configure Moab to allow creation of charts and graphs. We will then demonstrate how to use Moab Cluster Manager to perform common tasks, including job submission, reservation creation, and priority.


11:15am – 12:15pm

Policies for Optimization: Q&A With Moab Workload Manager Developers

Presented by Douglas Wightman, Cluster Resources

Question and answer session with Moab developers focusing on how to optimize Moab for response time, utilization, and throughput based on workload requests and resources available. We will cover tuning Moab for large systems and special workloads.


1:15pm – 2:10pm

Policies for Fairness: Q&A With Moab Workload Manager Developers

Presented by Scott Jackson and Douglas Wightman, Cluster Resources

The concept of fairness in Moab has to do with controlling access to and utilization of resources according to a deliberate policy plan. This session will give participants the opportunity to ask Moab developers questions about the policies that exist within Moab to manage the distribution of resources to the entities that need to use them.


2:15pm – 2:45pm

High Availability—TORQUE, Moab and General Workload

Presented by Josh Butikofer, Cluster Resources

Constant availability of resources and the need to run workload 24x7 have become requirements for most clusters and grids. This means that Moab and TORQUE must always be available, even in the case of hardware failure or a software crash. This session will discuss how Moab and TORQUE can be configured to run in a high availability mode to ensure that Moab or TORQUE is always managing a system. Details about how high availability is implemented, exact configuration examples, and planned future enhancements will also be discussed. Participants will be free to ask questions of the developers who have overseen the implementation of the high availability features.


2:50pm – 3:15pm

Q&A with GOLD Developers

Presented by Scott Jackson, Cluster Resources

The Gold Allocation Manager rations compute resources to projects and users. It behaves much like a bank, in which accounts are charged for resource usage. This allows sites to use and enforce an allocation plan for the expenditure of resources. This session will give participants an opportunity to ask the Gold developer questions about use cases, capabilities, problems, and future plans.
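The bank analogy can be illustrated with a toy sketch. This is not Gold's actual interface or data model: the `Account` class, the `charge` function, and the processor-seconds unit are all hypothetical and chosen only to show an account being debited for resource usage.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical project account (not a Gold data structure)."""
    project: str
    balance: float  # remaining allocation, in processor-seconds (assumed unit)


def charge(account: Account, processors: int, wallclock_seconds: float) -> float:
    """Debit the account for a finished job and return the amount charged.

    Raises ValueError if the remaining allocation cannot cover the job.
    """
    cost = processors * wallclock_seconds
    if cost > account.balance:
        raise ValueError(f"{account.project}: insufficient allocation")
    account.balance -= cost
    return cost


# Made-up example: a 4-processor job that ran for 600 seconds.
chemistry = Account(project="chemistry", balance=10_000.0)
charged = charge(chemistry, processors=4, wallclock_seconds=600.0)
print(charged, chemistry.balance)
```

Refusing the charge when the balance is too low is one simple way to model enforcement of an allocation plan; a real allocation manager may handle overdrafts differently.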


3:30pm – 4:10pm

Applying Green Computing to Clusters and the Data Center

Presented by Steve Duchene and Andre Kerstens, SGI

Rising electricity costs and environmental concerns are starting to make both the corporate IT and scientific HPC worlds focus more on green computing. Because of this, people are not only thinking about ways to decrease the initial acquisition costs of their equipment, but they are also putting constraints on the operational budgets of that same equipment.

To address this challenge, we will show how to get Moab to use incoming workload, relative operational costs for power and cooling, and other factors when making decisions about putting a system to sleep or powering it off. In addition, we will have Moab look at system temperatures in an effort to assign incoming workloads to cooler systems. This will serve to balance out temperature hot spots across a grid of clusters. Overall, we feel this will help reduce power and cooling loads for those systems, which will have a positive effect on the long-term operational budget for a production HPC environment.
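The idea of steering work toward cooler systems can be sketched in a few lines. This is only an illustration, not Moab's placement logic: the `pick_coolest` function, the node names, the temperature readings, and the 75 °C cutoff are all assumptions made for the example.

```python
def pick_coolest(node_temps_c, max_temp_c=75.0):
    """Return the name of the coolest node at or below max_temp_c.

    node_temps_c maps node name -> temperature in degrees Celsius.
    Returns None when every node is above the cutoff.
    """
    eligible = {name: temp for name, temp in node_temps_c.items()
                if temp <= max_temp_c}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Made-up readings for three hypothetical nodes.
readings = {"node01": 70.0, "node02": 55.0, "node03": 80.0}
print(pick_coolest(readings))
```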


4:15pm – 4:55pm

Automated Cluster Deployment with Moab (multiple presenters)

Cluster Resources, Novell, and Clustercorp will highlight automated cluster deployment with Moab. In the first half of the session, Novell will give an overview of SuSE Linux in HPC; Cluster Resources will then present Moab Cluster Builder for SuSE Linux and perform a live install of the solution during the session.

Moab Cluster Builder for SuSE Linux is a single DVD that first installs SuSE Linux Enterprise Server, then deploys TORQUE, Moab, and other required HPC tools, auto-configures them, and runs a validation suite upon conclusion.

Next, Clustercorp will present on Rocks + Moab.

Adaptive High Performance Computing with SUSE Linux and Windows

Presenter: Nathan Conger, Novell

Adaptive High Performance Computing means dynamically allocating mixed Linux and Windows cluster environments to meet changing compute and business requirements. This session will discuss how to maximize mixed SUSE Linux Enterprise and Windows Compute Cluster Server environments by leveraging the Moab Cluster Suite from Cluster Resources.

Automated Cluster Deployment with Rocks+ by Clustercorp and Moab Cluster Builder

Presenters: Tim McIntire, President of Clustercorp, and Michael Jackson, President of Cluster Resources

Tim McIntire from Clustercorp will speak on the what, why, and how of building clusters with Rocks+Moab. Rocks is a complete cluster distribution built on Red Hat Enterprise Linux (or CentOS) that includes each part of the HPC software stack as modular components (Rolls). This modular infrastructure allows users to deploy certified, standards-based high performance computing clusters with Moab pre-configured (the Moab Roll). Other Rolls, which are added to the system by simply clicking on a check-box, include the Intel Developer Roll, PGI Roll, Absoft Roll, Viz Roll, Bio Roll, and CFD Roll. We will diagram the complete Rocks software stack, walk attendees through the complete end-to-end install process (with slides), and give a brief overview of the Rocks framework, which is the underlying mechanism that enables a simple, yet robust, end-user experience.



Thursday, May 29


General Sessions


8:00am – 8:55am

Keynote: The Evolution of Scale-out Computing

Presented by Egan Ford, IBM

'Scaled-out' infrastructure, consisting of distributed Linux boxes, has received widespread adoption, but this paradigm has resulted in management complexity associated with initial provisioning and undocumented changes. Intelligent policy-driven dynamic provisioning and stateless servers not only address these issues but also open up a wealth of new possibilities in delivering new solutions and a more flexible and adaptive infrastructure across the spectrum of HPC and data center users.


9:00am – 9:55am

Adaptive Data Center

Presented by Susanne M. Balle, Hewlett-Packard

HP and Cluster Resources have created a joint solution to pursue commercial enterprise Grid opportunities where automatic adaptation of the resources to the workload is required in a changing cross-enterprise IT environment. This solution demonstrates the value of a completely automated environment composed of capacity management, auto-provisioning, resource flexing, grid-wide monitoring, virtualization, workload scheduling and load balancing of batch and service jobs. This solution allows for maximization of server utilization.

10:00am – 11:00am

Simulation and Emulation for Performance Prediction

Presented by Baochuan Lu and Wesley Emeneker, University of Arkansas, and Dave Jackson, CTO, Cluster Resources

Cluster use has grown exponentially in recent years. The Integrated Capacity Planning Toolkit (ICPT) has been developed to predict future cluster needs by analyzing and modeling historical system workloads. We look at how the ICPT can be used to predict behavior and to explore “what-if” scenarios showing how new technologies such as virtualization can affect system response.


11:15am – 12:15pm

Utility Computing and Hosted Resources

Presented by Trev Harmon, Cluster Resources

Scarce resources are an issue faced daily by many cluster administrators. Instead of requiring new hardware purchases, Moab offers several alternative solutions that allow administrators to temporarily access additional resources to handle spikes and the other daily variations and fluctuations seen in workload. In this session, we will discuss some of the technologies that provide this functionality.


1:15pm – 2:10pm

Best Practices in Capacity Planning

Presented by Brady Kimball, Cluster Resources

Because acquiring and setting up new hardware can be a painful process, it is important to understand what can be done to optimize the use of existing resources. We will describe some techniques to use with Moab to report on resource inefficiencies and how to address them. When hardware upgrades are necessary, Moab's scheduling tools can minimize the effect of maintenance on other workload. These tools and practices in Moab can increase a system administrator's ability to isolate and manage capacity planning issues.


2:15pm – 3:15pm

Green Computing—Power and Thermal Optimized Scheduling

Presented by Dan Stanzione, ASU, and Michael Jackson, President of Cluster Resources

This session covers using Moab's advanced scheduling capabilities to schedule jobs based on power consumption, thermal output, and total cluster power capacity. Roughly 40-50% of corporate energy consumption goes to IT, and computing-center power costs have more than doubled over the last five years. Moab will enable your organization to effectively reduce energy consumption costs as it optimizes IT performance.


3:30pm – 4:10pm

Cluster Consolidation and Sovereign Grids

Presented by Jonathan Ryskamp, Cluster Resources

An introduction to how Moab can be used to consolidate clusters and create sovereign grids in the real world. The session will include a discussion of problems that sites will likely face, how these problems can be overcome, best practices in enabling grids, the benefits of cluster consolidation and grid creation, and case studies.


4:15pm – 4:55pm

Windows+Linux Dynamic Hybrid Clusters

Presented by Matt Blythe, Microsoft

Because HPC clusters represent a significant investment in capital and operational resources, maximizing the capabilities of your existing infrastructure is critical for increased utilization and overall savings. By having multiple operating systems available on your existing clusters, you gain the flexibility of an additional cluster, or sub-cluster, without having to invest in further hardware. There are a number of scenarios in which the ability to have both the Linux operating system and Windows HPC Server available on your cluster is an advantage, including new application development, performance testing, proofs of concept, application migration, and platform test scenarios. This talk will describe the advantages of Linux and Windows HPC Server multi-OS environments, while covering some of the capabilities and benefits of Windows HPC Server and its associated ecosystem of development and management tools.

5:00pm – 5:40pm

PANEL Discussion: Cloud Computing—Is it Time or Is it Hype? (Multiple presenters)



Breakout Sessions


9:00am – 9:55am

Moab Internals

Presented by Douglas Wightman, Cluster Resources

Question and answer session with Moab developers concerning the internals of Moab scheduling. Topics may include managed objects, their life-cycles and interactions, as well as algorithms and resource manager interfaces.


10:00am – 11:00am

Q&A with TORQUE Developers

Presented by Nick Ihli and Al Taufer, Cluster Resources

Question and answer session covering TORQUE resource manager, with discussion on some of TORQUE’s newest features.


11:15am – 12:15pm

Managing a Real-World Grid—Politics, Resource Heterogeneity, User Issues and Competing Technologies

Presented by Peter Enstrom, NCSA

Computing grids are growing and spanning independent organizations. Complexities arise when grids cross institutional boundaries. This talk will examine some of the issues that need to be addressed when setting up, administering, and using a real-world grid.


1:15pm – 2:10pm

Q&A with Moab Access Portal Developers

Presented by Noah Carroll, Cluster Resources

Question and answer session covering Moab Access Portal (MAP) with a presentation on customization, installation and basic usage.


2:15pm – 3:15pm

Workload Management on Leadership Class Architectures—IBM BlueGene/Cell, Cray XT (Multiple presenters)

The Moab Workload Manager has been adapted to optimize the batch workload for the top leadership class architectures. Architectures such as the Cray XT, the IBM BlueGene, and the IBM Cell architecture can benefit from innovative scheduling optimizations implemented by Moab. This session will be divided into three 20-minute sections in which Cluster Resources developers, customers, and partners will discuss their experiences in customizing the batch system on these architectures.

Presenters: Peter Savinelli, IBM, and Don Lipari, Lawrence Livermore National Laboratory

Presenters: Peter Savinelli, IBM, and Scott Jackson, Cluster Resources

Workload Management on Cray XT platforms at ORNL

Presenter: Don Maxwell, Oak Ridge National Laboratory

The primary mission of the National Center for Computational Sciences at Oak Ridge National Laboratory is open scientific research at large scale. Providing the resources to complete that mission while also maintaining high utilization can be challenging. We will present problems resolved by using the Moab scheduler, along with a review of the policies used to accomplish the mission of NCCS.

Workload Management On Leadership Class Architectures

Presenter: Michael Karo, Cray

Applying large-scale heterogeneous computational resources to address the complex and diverse needs of real-world applications is the goal of Cray's Adaptive Supercomputing vision. Management and scheduling of these resources requires a robust and sophisticated software infrastructure. TORQUE and Moab are integral components of this infrastructure, essential to address resource and workload management requirements. In this talk, we will explore the Cray product roadmap and its emphasis on high performance, programmability, portability, and robustness. We will also discuss the role of TORQUE and Moab in current and future-generation Cray systems.


3:30pm – 4:10pm

Q&A with Moab Grid Developers

Presented by Josh Butikofer, Cluster Resources

This session will allow current and prospective grid users to ask specific questions about the current Moab grid offerings. Ideal topics include best practices in creating grids, scalability concerns, help with data staging or other advanced configuration, high availability in grids, special considerations for network and file systems, etc.


4:15pm – 4:55pm

Managing Workflows

Presented by Trev Harmon, Cluster Resources

Workflows allow the creation of job flows based on simple or complex DAGs. This session will discuss the creation of these workflows, as well as some of the key underlying technologies.
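As a rough illustration of a job flow expressed as a DAG, the sketch below derives a valid run order from job dependencies using Python's standard `graphlib` module. It says nothing about how Moab represents workflows internally; the job names and the `submission_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "analyze": {"simulate"},
    "render": {"simulate"},
    "report": {"analyze", "render"},
}


def submission_order(dag):
    """Return one order in which every job follows all of its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


order = submission_order(workflow)
print(order)
```

Jobs with no dependency between them, such as `analyze` and `render` here, may appear in either order and could in principle run in parallel.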



Friday, May 30


General Sessions


8:00am – 8:55am

Keynote Address

Presented by Dave Jackson, CTO, Cluster Resources


9:00am – 9:55am

Advanced TORQUE Administration

Presented by Nick Ihli, Cluster Resources

This session covers various advanced features and capabilities in TORQUE. We will discuss areas such as the recently developed checkpoint/restart integration system with BLCR, job arrays, high throughput, failure handling, advanced diagnostics, and best practices for optimizing your TORQUE system.


10:00am – 11:00am

Case Studies (Multiple presenters)

Presenter: Oliver Baltzer, Flagstone Re

At Flagstone Re we use Moab as a scheduling component in a number of our core business applications. It is tightly integrated into the existing heterogeneous software architecture consisting of components running on Microsoft Windows servers as well as Linux clusters. Moab's extensibility and flexibility allowed us to develop a custom workflow execution component capable of scheduling complex fine-grained workflows composed of parallel and sequential activities effectively on available resources. Our component integrates directly with the Windows Workflow Foundation technology and enables a seamless integration between the Linux and MS Windows environments. At the same time, Moab provides our applications with advanced resource allocation, QoS and reservation features allowing us to adapt our operations to timely demands.

Presenter: Nicholas P. Cardo, NERSC

Presenter: Jess Arrington, Cluster Resources

A case study on the University of Cambridge's application of Moab's Hybrid Technology


11:15am – 12:10pm

Moab and Virtualization in HPC

Presenter: Dan Stanzione, Arizona State University

The explosion of cluster computing for business and scientific applications has made it commonplace for multiple independent clusters to exist on a single academic or corporate campus. Typically, each cluster is an autonomous and independent unit that has no interaction with other clusters. Each cluster also represents a significant investment. Virtualization is a promising avenue for combining the resources of these clusters. The Dynamic Virtual Clustering (DVC) system combines Moab and TORQUE with Xen virtualization to raise cluster utilization through co-allocation and job forwarding, and to allow a single cluster to support multiple "virtual cluster" software environments. This talk will provide an overview of the design of DVC within the Moab/TORQUE framework.


12:15pm – 1:00pm

PANEL Discussion: The Next Generation in Workload Management—What is Most Needed Now and Over the Next 18 Months



Breakout Sessions


9:00am – 9:55am

Q&A with Moab Cluster Manager Developers: Capacity Planning, Automated Reports, and Visualizing Generic Metrics

Presented by Brady Kimball and Nate Seeley, Cluster Resources

We will show how the Moab Cluster Manager graphical user interface can create charts allowing visualization of generic metrics. Cluster Manager can also show linear regressions of system utilization line graphs to help one spot trends in workload, which aid in capacity planning. We will discuss which of Cluster Manager's charts can be created from the command line and how this might facilitate automated reporting. We will also have a question and answer session covering any topic related to Cluster Manager.
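To show what a linear regression over utilization data looks like, here is a minimal least-squares sketch. It is not how Cluster Manager computes its trend lines; the `fit_line` function and the weekly utilization figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up weekly average utilization, in percent.
weeks = [0, 1, 2, 3, 4]
utilization = [60.0, 62.0, 64.0, 66.0, 68.0]

slope, intercept = fit_line(weeks, utilization)

# Extrapolate the trend to estimate when utilization would reach 100%.
weeks_until_full = (100.0 - intercept) / slope
print(slope, intercept, weeks_until_full)
```

A positive slope indicates growing demand, and extrapolating the fitted line gives a rough estimate of when existing capacity would be exhausted if the trend continued.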


10:00am – 10:30am

Managing Clusters with Moab+SLURM

Presented by Don Lipari, Group Leader of the Integrated Computational Resource Management Group, Lawrence Livermore National Laboratory (LLNL)

SLURM provides the underlying resource management functions for Moab at LLNL and on about 1/3 of the Top 500 supercomputers. An overview of SLURM's design and capabilities will be presented including a description of how Moab and SLURM interoperate. LLNL's workload management architecture will also be described.


10:35am – 11:00am

Managing Clusters with Moab+SGE

Presented by Scott Jackson, Cluster Resources

Moab can be used to schedule workload using the Sun Grid Engine (SGE) resource manager. Moab uses its native resource manager interface to make intelligent scheduling decisions on an SGE batch system, giving managers, admins, and users access to the full power of Moab Workload Manager. This adds to the long list of resource managers that can be managed by Moab in a cluster or a grid.


11:15am – 12:10pm

Q&A with Moab Workload Manager Developers—Preemption, High Throughput, and Massive Workloads

Presented by Trev Harmon, David Litster, and Scott Jackson, Cluster Resources

This Q&A is geared towards the operation of Moab in systems with extremely large workloads in terms of number of jobs, throughput requirements, and special job policy requirements.