Case Study 19: Mixed Legacy and Massive Scalability Grid
Overview
A major government research organization has been involved in high-performance computing for over 20 years and now has a significant investment in customized user job scripts, legacy clusters, home-grown tools for accounting, resource management, and metascheduling, and other aspects of high-performance infrastructure. Together with some partner organizations, they would like to establish a multi-organizational grid that spans most of their respective clusters. They currently have nearly 100 clusters of varying architectures, ranging in size from 50 to over 10,000 processors.
The solution must be able to support all existing commercial and locally developed resource managers, provide a superset of all existing scheduling policies, provide complete accounting, and be locally extensible. Further, it must scale to over 100,000 processors, provide tight security integration, and offer 'masquerading' to allow existing job scripts to be used on other platforms with little or no modification.
Solution
Moab provides highly generalized interfaces that allow it to integrate with legacy accounting and resource management tools using scripts, web services, databases, or other mechanisms. It also offers an industry-leading array of cluster management and optimization policies, allowing it to address most scheduling needs out of the box. Where it does not, Moab provides a policy plug-in interface that supports locally developed or site-specific policies controlling job prioritization, node allocation, workload analysis, and most other aspects of cluster scheduling.
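To illustrate what a pluggable prioritization policy means in practice, here is a minimal sketch in Python. It is not Moab's plug-in interface: the `Job` fields, the `register_policy` registry, and the `fifo` and `favor_large` policies are hypothetical names chosen for illustration. The idea shown is that the scheduler core only calls a policy function by name, so a site can add its own ordering rule without touching the core.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    job_id: str
    user: str
    procs: int          # processors requested
    queue_time_s: int   # seconds the job has waited in the queue


# A priority policy maps a job to a score; higher scores run first.
PriorityPolicy = Callable[[Job], float]

# Hypothetical registry of locally developed policies, keyed by name.
POLICIES: Dict[str, PriorityPolicy] = {}


def register_policy(name: str) -> Callable[[PriorityPolicy], PriorityPolicy]:
    """Decorator that adds a site-specific policy to the registry."""
    def wrap(fn: PriorityPolicy) -> PriorityPolicy:
        POLICIES[name] = fn
        return fn
    return wrap


@register_policy("fifo")
def fifo(job: Job) -> float:
    # Longest-waiting job first.
    return float(job.queue_time_s)


@register_policy("favor_large")
def favor_large(job: Job) -> float:
    # Hypothetical site rule: weight processor count heavily,
    # then break ties by time spent waiting.
    return job.procs * 1000.0 + job.queue_time_s


def order_queue(jobs: List[Job], policy_name: str) -> List[str]:
    """Return job IDs in dispatch order under the named policy."""
    policy = POLICIES[policy_name]
    return [j.job_id for j in sorted(jobs, key=policy, reverse=True)]
```

With jobs `a` (16 processors, waited 600 s), `b` (512, 60 s), and `c` (64, 3600 s), `fifo` orders them `c, a, b`, while `favor_large` orders them `b, c, a`. Keeping policies as plain functions behind a name lookup is one simple way to make prioritization locally extensible.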
Moab's grid services can operate in the following models:
- full management - Moab is responsible for all grid scheduling decisions and actions
- partial management - Moab enhances the capabilities of an existing legacy system
- monitor only - Moab acts as an information service to allow other aspects of the grid to be more intelligent
With these models, Moab can operate completely standalone or support existing legacy systems.
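The difference between the three models is mainly who makes a decision and who acts on it. The following Python sketch is a hypothetical illustration, not Moab code: `GridMode`, `plan_actions`, and the action strings are invented names. It also assumes, for illustration, that status is published in every mode, while the source only states this explicitly for monitor-only operation.

```python
from enum import Enum
from typing import List


class GridMode(Enum):
    FULL = "full management"
    PARTIAL = "partial management"
    MONITOR = "monitor only"


def plan_actions(mode: GridMode, job_id: str, best_cluster: str) -> List[str]:
    """Return the (hypothetical) actions a grid scheduler takes for one job."""
    # Assumption: status is published in every mode so that other
    # grid components can make more informed decisions.
    actions = [f"publish status of {job_id}"]
    if mode is GridMode.FULL:
        # Full management: the scheduler both decides and acts.
        actions.append(f"dispatch {job_id} to {best_cluster}")
    elif mode is GridMode.PARTIAL:
        # Partial management: the legacy system still acts;
        # the scheduler only supplies its recommendation.
        actions.append(f"advise legacy scheduler: run {job_id} on {best_cluster}")
    # Monitor only: no scheduling action beyond publishing information.
    return actions
```

For example, `plan_actions(GridMode.MONITOR, "j1", "c2")` returns only the publish action, whereas `GridMode.FULL` adds a dispatch of `j1` to `c2`.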
Moab's peer-to-peer grid facilities can also support massive-scale clusters by partitioning them into manageable components and virtually re-integrating them using Moab. This effectively bypasses scalability limitations in storage systems, networks, security systems, and other aspects of cluster hardware and software. Using virtual nodes, peer services, and information compression techniques, Moab can intelligently schedule both within and across these partitions.
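The partition-and-summarize idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Moab's algorithm: the helpers `partition_nodes`, `summarize`, and `place` are hypothetical. Each partition is reduced to a single virtual node carrying only its free processor count, which stands in for information compression. The scheduler then places a job by looking only at those summaries, first within one partition (best fit) and otherwise across several.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VirtualNode:
    """Compressed view of one partition: one aggregate instead of per-node detail."""
    partition: str
    free_procs: int


def partition_nodes(free_by_node: Dict[str, int], size: int) -> Dict[str, Dict[str, int]]:
    """Split a flat node -> free-processor map into partitions of at most `size` nodes."""
    names = sorted(free_by_node)
    parts: Dict[str, Dict[str, int]] = {}
    for i in range(0, len(names), size):
        parts[f"p{i // size}"] = {n: free_by_node[n] for n in names[i:i + size]}
    return parts


def summarize(parts: Dict[str, Dict[str, int]]) -> List[VirtualNode]:
    """Collapse each partition into one virtual node."""
    return [VirtualNode(name, sum(nodes.values())) for name, nodes in parts.items()]


def place(job_procs: int, vnodes: List[VirtualNode]) -> List[str]:
    """Choose partitions for a job using only the virtual-node summaries."""
    # Within a partition: best fit, i.e. the partition with the fewest
    # free processors that can still hold the whole job.
    fits = [v for v in vnodes if v.free_procs >= job_procs]
    if fits:
        return [min(fits, key=lambda v: v.free_procs).partition]
    # Across partitions: take the largest ones until the request is covered.
    chosen: List[str] = []
    remaining = job_procs
    for v in sorted(vnodes, key=lambda v: v.free_procs, reverse=True):
        if remaining <= 0:
            break
        chosen.append(v.partition)
        remaining -= v.free_procs
    return chosen if remaining <= 0 else []  # empty list: job cannot start now
```

With six nodes grouped in pairs, a 10-processor job lands in a single partition, while a request larger than any one partition is spread across several. The scheduler never needs per-node detail to make that choice.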
Moab's translation capability allows existing batch scripts to be used on new platforms and lets users continue to work in the model with which they are most familiar. This preserves the organization's significant investment in customized job scripts.
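To make script translation concrete, the following Python sketch rewrites a few common PBS directives as their LSF `#BSUB` counterparts. It is only an illustration of the general technique and is not how Moab implements masquerading. The function name `pbs_to_lsf` and the handful of mappings chosen are assumptions. The directive flags themselves (`-N`, `-q`, `-l walltime=`, `-l nodes=X:ppn=Y` for PBS; `-J`, `-q`, `-W`, `-n` for LSF) are standard, but the mapping is deliberately simplified.

```python
import re
from typing import List


def pbs_to_lsf(script: str) -> str:
    """Translate a few common PBS directives into LSF #BSUB equivalents.

    Unrecognized directives and the script body pass through unchanged.
    """
    out: List[str] = []
    for line in script.splitlines():
        m = re.match(r"#PBS\s+-(\w)\s+(.*\S)", line)
        if not m:
            out.append(line)
            continue
        flag, value = m.group(1), m.group(2)
        walltime = re.fullmatch(r"walltime=(\d+):(\d{2}):\d{2}", value)
        nodes = re.fullmatch(r"nodes=(\d+):ppn=(\d+)", value)
        if flag == "N":
            out.append(f"#BSUB -J {value}")  # job name
        elif flag == "q":
            out.append(f"#BSUB -q {value}")  # queue
        elif flag == "l" and walltime:
            # LSF -W takes [hours:]minutes, so seconds are dropped here.
            out.append(f"#BSUB -W {int(walltime.group(1))}:{walltime.group(2)}")
        elif flag == "l" and nodes:
            # Simplification: request nodes*ppn slots; node layout is not preserved.
            out.append(f"#BSUB -n {int(nodes.group(1)) * int(nodes.group(2))}")
        else:
            out.append(line)  # leave anything else untranslated
    return "\n".join(out)
```

For instance, `#PBS -l walltime=01:30:00` becomes `#BSUB -W 1:30` and `#PBS -l nodes=2:ppn=8` becomes `#BSUB -n 16`, while the shell commands in the script body are left untouched. A production translator would also need to handle combined resource lists, output and error options, and environment variables.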