Case Study 6
A.21 Case Study: Collaboration & Economic Development Grids
Overview
A consortium of government, commercial, and academic
organizations partners to form a shared collaboration and economic development
grid. One resulting organization, the Center for Development of Advanced
Computing of India, states the following vision: "To emerge as the premier R&D
Institution for the design, development and deployment of world class IT
solutions for economic and human advancement." The organization
states that this will result in India's largest grid in terms of computational
power and availability. (See also the Cluster Ohio Project, with 23 participating
organizations across Ohio.)
Resources
The organization has eight initial clusters that will be made
available in the grid; this number will grow almost tenfold in the next few
years. Resources will initially be accessed from 17 cities and approximately
40 to 60 organizations. Operating systems vary from AIX and Solaris to various
Linux distributions. Resource managers range from commercial products
such as LoadLeveler to open source tools such as TORQUE and OpenPBS. Similarly,
hardware characteristics are highly heterogeneous from cluster to cluster.
Workload
Because the collaboration and economic development grid has constantly
evolving relationships with new consuming and hosting organizations, the
workload is very unpredictable in terms of size, duration, topic, purpose, and
priority. Workload dependencies and optimizations will require a fine degree
of intelligence, self-learning, and tunability.
Solution
Moab's Grid Suite gives the organization a unified global view of the
separate resources for planning and management purposes. Further, Moab's broad
support for heterogeneous environments provides an important foundation that
allows participating partners to innovate in their own systems to meet their
own needs, without having to agree upon and unify resource managers, networks,
architectures, or other such aspects. The Web-based Moab Access Portal for
Grids can be used as a unified submission method, while experienced local
users can continue to use the resource manager commands that they have
invested time in learning on their own clusters. Some shared rules can be
established for the entire system, while organizations maintain additional
sovereign rules that guide the use of the resources they purchased.
Moab is able to dynamically adjust to the changing workload and apply
optimization intelligence effectively in this highly complex environment.
Connecting to the collaboration/economic grid using Moab does not
limit a participating organization's ability to form collaborative
relationships with other organizations that do not participate in the original
grid. Using Moab, an individual site can create an unlimited number of
associated grid relationships with other individual sites or with other grids.
Ultimately, Moab can apply a nearly boundless set of relationships and rule
sets, so organizations can make their own political and partnership decisions
and the technology can match the relationships they want. With Moab, grids are
not open doors to all resources; rather, organizations can put specific
limitations on what is used, by whom, at what time, and under which
conditions. An individual department in one organization can have a
relationship with a department of another organization even when the parent
organizations have no relationship at a higher level.
Different rules can apply to each grid relationship, allowing for a custom
association that ensures that security, resource availability, local
prioritization, network considerations, timing, and other concerns are fully
met as well as optimized. Moab allows for cluster-to-grid relationships,
grid-to-grid relationships, grid-within-grid relationships, and many other
relationship combinations.
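As an illustration of the per-relationship limits described above (what is used, by whom, and at what time), here is a minimal Python sketch. It is not Moab configuration syntax; the names RelationshipRule, is_permitted, and the example partner string are hypothetical and chosen only to model the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipRule:
    """Hypothetical rule set for one grid relationship (not Moab syntax)."""
    partner: str               # site, department, or grid on the other side
    allowed_groups: frozenset  # who may use the shared resources
    max_nodes: int             # how much may be used per request
    start_hour: int = 0        # access window opens (inclusive, 0-23)
    end_hour: int = 24         # access window closes (exclusive, 1-24)


def is_permitted(rule, group, nodes, hour):
    """Return True only if the request satisfies every limit in the rule."""
    in_window = rule.start_hour <= hour < rule.end_hour
    return group in rule.allowed_groups and nodes <= rule.max_nodes and in_window


# Assumed example: two departments share resources under a narrow rule,
# independent of any relationship between their parent organizations.
dept_rule = RelationshipRule(
    partner="physics-dept@other-org",
    allowed_groups=frozenset({"physics"}),
    max_nodes=16,
    start_hour=18,
    end_hour=24,
)
```

Because each relationship carries its own rule object, a site could hold many such rules side by side, one per partner, without any of them widening access for the others.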
- Establish a peer-to-peer grid across all internal clusters, allowing automatic load-balancing across active clusters.
- Enable per-lab submission points that are able to migrate workload to local, partner, or commercial resources.
- By default, allow only priority or urgent workload to flow to external resources.
- Enable automated workload rollover and resubmission in the event of internal network or cluster failures.
- Provide admin notification prior to rollover to allow manual override of rollover.
- Allow manual reconfiguration of external resource access rights to allow production use of external resources in the event of extended internal failures or excessive workload.
- Enable service level agreements within local, partner, and commercial resources to enable next-to-run and automated preemption based on workload priority.
- Allow workload to be redirected automatically as local workload levels drop or local systems are brought back online.

This solution will allow users to see and utilize all potential compute resources as if they were local, even using local portals and graphical interfaces, even in the event of major local and remote failures.
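The routing requirements above can be sketched in Python as a small decision function. This is an illustrative model under stated assumptions, not Moab's implementation or configuration: the pool names, the Job fields, the priority_floor threshold, and the external_open flag (standing in for the manual reconfiguration of external access rights) are all hypothetical.

```python
from dataclasses import dataclass

LOCAL, PARTNER, COMMERCIAL = "local", "partner", "commercial"


@dataclass
class Job:
    name: str
    priority: int         # larger means more important (assumed convention)
    urgent: bool = False


def choose_target(job, available, external_open=False, priority_floor=100):
    """Pick the pool a job should run in, or None to keep it queued.

    available: set of pools that are currently up with free capacity.
    external_open: hypothetical admin override admitting all workload
        to external resources during extended internal failures.
    """
    if LOCAL in available:
        return LOCAL  # prefer local resources whenever they are usable
    may_leave = external_open or job.urgent or job.priority >= priority_floor
    if not may_leave:
        return None   # wait for local systems to come back online
    for pool in (PARTNER, COMMERCIAL):
        if pool in available:
            return pool
    return None


def next_to_run(queue):
    """Order queued jobs: urgent first, then by descending priority."""
    return sorted(queue, key=lambda j: (not j.urgent, -j.priority))
```

Calling choose_target again whenever the set of available pools changes models the automatic redirection back to local systems once they return; admin notification before rollover is left out of this sketch.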