5.6 Virtualization and Resource Provisioning
5.6.1 Resource Provisioning Overview
When processing a resource request, Moab attempts to match the request to an existing available resource. However, if the scheduler determines that the resource is not available, or will not be available for an appreciable amount of time due to load or policy, it can select a resource to modify to meet the needs of the current requests. This process of modifying resources to meet existing needs is called provisioning. Currently, two types of provisioning are supported: (1) operating system (OS) and (2) application. As its name suggests, OS provisioning allows the scheduler to modify the operating system of an existing compute node, while application-level provisioning allows the scheduler to request that a software application be made available on a given compute node. In each case, Moab evaluates the cost of making the change, in terms of time and other resources consumed, before making the decision. The scheduler initiates the change required to support the current workload only if the benefits outweigh the costs.
5.6.2 Configuring Provisioning
Enabling provisioning consists of configuring an interface to a provisioning manager, specifying which nodes can take advantage of this service, and specifying the estimated cost and duration of each change. This interface can be used to contact a system such as SystemImager, xCAT, Xen, RedCarpet, or NIM, or to contact a locally developed system via a script or web service.
5.6.2.1 Enabling Moab Provisioning with SystemImager
SystemImager is a widely used open source tool that allows flexible, automated installation (provisioning) of compute hosts within a cluster. Moab can be interfaced with SystemImager in one of two ways: (1) using triggers or (2) using a native resource manager interface.
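To make the cost/benefit decision from 5.6.1 concrete, here is a minimal sketch of the choice between using an existing matching node and reprovisioning another one. It is not Moab's internal algorithm: the `Node` class, the `available_in` field, and `choose_node` are hypothetical, and "cost" is simplified to time only (seconds until a node is free plus an assumed fixed provisioning duration), ignoring the other resources Moab may weigh.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    os: str
    available_in: float  # hypothetical: seconds until the node is free (0 = idle now)


def choose_node(nodes: list[Node], required_os: str,
                provision_cost: float) -> tuple[Optional[Node], bool]:
    """Return (node, needs_provisioning) for a request that needs `required_os`."""
    # First preference: an existing resource that already matches the request.
    matching = [n for n in nodes if n.os == required_os]
    best_match = min(matching, key=lambda n: n.available_in, default=None)
    if best_match is not None and best_match.available_in == 0:
        return best_match, False

    # Otherwise consider modifying (reprovisioning) a non-matching node.
    others = [n for n in nodes if n.os != required_os]
    best_other = min(others, key=lambda n: n.available_in, default=None)
    if best_other is None:
        return best_match, False

    wait_time = best_match.available_in if best_match is not None else float("inf")
    reprovision_time = best_other.available_in + provision_cost
    # Provision only if the benefit (an earlier start) outweighs the cost.
    if reprovision_time < wait_time:
        return best_other, True
    return best_match, False
```

For example, with a rhel3 node busy for 600 seconds and an idle sles9 node, a 300-second reinstall wins, while a 900-second reinstall does not and the job waits for the rhel3 node.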
Trigger Based Provisioning Interface
When a job or reservation becomes active, Moab can tailor its environment through the use of triggers, including changing the operating system of the allocated nodes. In the case of a job trigger, you can use something like the following:
In the preceding example, any reservation that uses the rhel3 profile will reinstall all of its nodes with the rhel3 operating system for the duration of the reservation. The second and third triggers ensure that when the reservation ends, or if it is canceled, the nodes are restored to their default operating system.
Resource Manager Based Provisioning Interface
With a resource manager based provisioning interface, Moab uses the provisioning manager to create the nodes needed by various jobs. In this model, once the provisioning manager is set up, users can submit jobs requiring any available operating system, and Moab can dynamically reprovision compute nodes to meet current workload needs.
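The trigger-based behavior described above (reinstall the profile's OS when the reservation starts, restore the default OS when it ends or is canceled) can be modeled as a small event replay. This is only a sketch of the semantics, not Moab trigger syntax: `ReservationEvent`, `target_os`, and `replay_triggers` are hypothetical names, and a "reinstall" is represented simply as a recorded `(node, os)` action.

```python
from __future__ import annotations

from enum import Enum


class ReservationEvent(Enum):
    START = "start"
    END = "end"
    CANCEL = "cancel"


def target_os(event: ReservationEvent, profile_os: str, default_os: str) -> str:
    """OS the reservation's nodes should run after `event` fires."""
    if event is ReservationEvent.START:
        return profile_os   # first trigger: reinstall with the profile's OS
    return default_os       # second and third triggers: restore on end or cancel


def replay_triggers(node_os: dict[str, str], events: list[ReservationEvent],
                    profile_os: str, default_os: str) -> list[tuple[str, str]]:
    """Apply each event to the nodes and return the reinstall actions issued."""
    actions: list[tuple[str, str]] = []
    for event in events:
        wanted = target_os(event, profile_os, default_os)
        for name in sorted(node_os):
            if node_os[name] != wanted:  # skip nodes already on the target OS
                node_os[name] = wanted
                actions.append((name, wanted))
    return actions
```

Replaying a start followed by a cancel on two nodes yields four reinstall actions and leaves both nodes back on their default OS.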
With this configuration, Moab can automatically load-balance resources to meet the needs of submitted jobs. Correct operation of the interface's querying capabilities can be verified by issuing mdiag -R and mdiag -n to look at the resource manager and node configuration, respectively. To verify that Moab can correctly drive SystemImager to install a node, use the mnodectl -m command as demonstrated in the following example:
5.6.2.2 Setting Up Xen From Scratch
Xen is a virtual machine monitor for x86 that supports executing multiple guest operating systems with high levels of performance and resource isolation. The following explains how to set up and run a Linux system with one or more Xen domains. Prerequisites include an installed and networked Fedora Core 4 system and the following required packages:
Installing the Required Packages
The Xen package and its dependencies can be installed by using the following YUM command:
Additionally, the Xen domain0 and user domain kernels need to be installed.
This will also create an entry in the /boot/grub/menu.lst file so that the domain0 kernel can be booted. Domain0 is where the Xen daemon is started and from which further domains can be created.
Creating a Xen Domain File System
Domain is the term Xen developers use to describe a virtual machine. Any number of domains can be created on a host machine as long as sufficient resources are available. Although it is possible to create a domain file system of any Linux flavor (or NetBSD or FreeBSD), the following information outlines how to create a Debian Linux file system. The first step is to create the file system and swap images. The size of these images depends on the amount of software you want installed in the domain. For a minimal install, the file system image can be as small as 200 MB. The swap image should be the same size as the amount of RAM you want to allocate to the domain.
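As an illustration of the sizing rules above (a file system image of at least 200 MB for a minimal install, and a swap image equal to the domain's RAM), the following Python sketch creates both images as sparse files. It is an alternative illustration, not the commands used in this guide; the file names are hypothetical, MB is treated as MiB, and the images still have to be initialized (formatted) afterward.

```python
from __future__ import annotations

import os

MIB = 1024 * 1024
MIN_FS_MIB = 200  # guideline from the text for a minimal install


def create_sparse_image(path: str, size_mib: int) -> None:
    # Extending an empty file with truncate() allocates no data blocks on most
    # Linux file systems, so the image only consumes disk space as it is written.
    with open(path, "wb") as f:
        f.truncate(size_mib * MIB)


def create_domain_images(directory: str, fs_mib: int, ram_mib: int) -> tuple[str, str]:
    """Create a root file system image and a swap image sized to the domain's RAM."""
    if fs_mib < MIN_FS_MIB:
        raise ValueError(f"file system image should be at least {MIN_FS_MIB} MiB")
    fs_path = os.path.join(directory, "domain-fs.img")      # hypothetical names
    swap_path = os.path.join(directory, "domain-swap.img")
    create_sparse_image(fs_path, fs_mib)
    create_sparse_image(swap_path, ram_mib)  # swap image matches the domain's RAM
    return fs_path, swap_path
```

A domain planned with 128 MB of RAM would therefore get, for example, `create_domain_images(path, 200, 128)`.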
Then, initialize the file system and swap images.
Note: The next steps must be done on an Ubuntu or Debian system, because the debootstrap program required to create a minimal Debian install is only available on Ubuntu and Debian systems. Alternatively, you can use the alien program to convert the debootstrap Debian package to an RPM and install it on an RPM-based system.
Mount the newly created file system image.
Create the base Debian installation.
At this point your system is an unconfigured base system. Edit the following files to establish a valid configuration.
The following entries should be made to /etc/fstab:
The following should be listed in /etc/apt/sources.list:
Edit the /etc/network/interfaces file to set up networking. The following setup uses DHCP to obtain network settings:
Creating a Domain Configuration File
There are two ways to set domain parameters before starting a domain. One way is to specify all the domain parameters in a configuration file. Alternatively, you can set the necessary parameters at run time. The following is an example configuration file for the file system just created:
The new domain can now be started. First, start Xen if it is not already running, then create the domain:
You should now see the new domain booting up. Log in as root (no password), update apt, and install any additional packages you need.
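Xen domain configuration files of this era are written in Python syntax, so one way to produce them consistently for many domains is to generate them. The sketch below renders the common settings (kernel, memory, name, disk, root, vif, and the optional extra kernel parameters used by the Custom IP Address workaround). The function name, kernel path, image paths, and the sda1/sda2 device mapping are assumptions for illustration; the device names must agree with the entries in the domain's /etc/fstab.

```python
from __future__ import annotations


def render_domain_config(name: str, kernel: str, memory_mib: int,
                         fs_image: str, swap_image: str, extra: str = "") -> str:
    """Return domain configuration text (the file format uses Python syntax)."""
    settings: dict[str, object] = {
        "kernel": kernel,                  # user domain kernel installed earlier
        "memory": memory_mib,              # RAM allocated to the domain, in MB
        "name": name,
        # Map the images to guest block devices; fstab in the image must agree.
        "disk": [f"file:{fs_image},sda1,w", f"file:{swap_image},sda2,w"],
        "root": "/dev/sda1 ro",
        "vif": [""],                       # one virtual NIC with default settings
    }
    if extra:
        settings["extra"] = extra          # appended to the guest kernel command line
    # repr() produces valid Python literals for the str, int, and list values here.
    return "".join(f"{key} = {value!r}\n" for key, value in settings.items())
```

Because the output is plain Python assignments, it can be checked by executing it in an empty namespace before saving it as the domain's configuration file.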
Shared /usr Directory
To further reduce file system image size, it is possible to share a common NFS-mounted /usr directory between multiple domains. To do this, create an NFS export in domain0. All software installed there will then be common to each domain that is run.
Custom IP Address
Xen does not allow setting the domain IP address via the configuration file. To work around this, you can pass in kernel parameters with the extra="xxx=yyy aaa=bbb" domain configuration parameter. Custom startup scripts called from /etc/init.d/rcS can then read these parameters and set up the /etc/network/interfaces file as required.
5.6.3 Virtualization Overview
The types of workload in a cluster or grid environment are often complex. Not only may jobs perform intricate computations, but the environment each job requires may also be complex. In some cases, different jobs may require different operating systems, libraries, and applications, as well as different memory, disk, and processor resources. If even a handful of jobs vary this much in their requirements, an organization can quickly become overwhelmed by the monetary and temporal costs of maintaining compute resources to satisfy all possible requests.
A solution to this dilemma is to use modern virtualization technologies. Virtualization allows a compute node to run one or more virtual machines simultaneously, each with its own distinct operating system, software, and hardware configuration. These virtual machines (also commonly called virtual domains) can be created and booted on the fly from pre-created system images. This is usually much more efficient than re-provisioning a node with new software or rebooting the node into a new operating system. Recent performance improvements have made virtual machines only slightly slower than their physical counterparts.
With virtualization, entire clusters, or a subset of compute nodes, can therefore be switched to a different system configuration or image in seconds rather than minutes or hours. Moab can use this technology to automatically satisfy the complex environment requirements attached to different jobs. Before a job with special requirements starts, Moab can launch and configure virtual domains on the job's allocated resources. The job then runs without incident, unaware that it is executing on virtualized nodes. When the job finishes, the nodes are free to be used again by similar jobs or to be re-virtualized to another configuration.
5.6.4 Virtualization Prerequisites
Although several virtualization software packages are available, Moab Workload Manager currently supports only the Xen virtual machine monitor. Additionally, Moab currently supports virtualization of nodes running under TORQUE 2.1.0 or higher. There are several prerequisites to Moab driving virtualization of compute resources in a cluster:
5.6.5 Configuring Virtualization
After the prerequisites have been met, you can configure Moab to support virtualization. Moab comes with support scripts in its tools directory that are used to control the virtualization process. By associating these scripts with resource managers in the moab.cfg file, Moab can virtualize nodes owned by those resource managers and even migrate virtual nodes to another cluster using virtual migration.
First, examine the tools/config.xen.pl file. This file controls the configuration of the virtualization scripts; modify it to fit your particular needs. If you need assistance with this task, please contact your support representative at Cluster Resources.
Next, add an attribute to the moab.cfg file to activate the virtualization scripts. The RMCFG attribute used to activate them is NODEVIRTUALIZEURL. Add this attribute to every RMCFG line that defines a resource manager controlling virtualization-capable nodes. The value of this attribute should be the tools/node.virtualize.xen.pl script.
Example
In addition to NODEVIRTUALIZEURL, an additional resource manager should be defined that imports information about the available Xen images on each of the nodes. A simple script, tools/node.mon.xen.pl, is bundled with Moab as an example of how one might discover available images using a native resource manager interface. (The bundled script does not scale well as the number of nodes in a cluster grows; it is provided only as an example. We recommend customizing this script to your site's needs to improve its performance.) Using the node.mon.xen.pl script is straightforward. Simply change your moab.cfg as follows:
Example
When a virtual machine is created on an existing node, the virtual machine is given a unique hostname. The hostname is simply a variant of the physical node's name. How the virtual node is named is configured with two parameters: VIRTUALNODEPREFIX and VIRTUALNODESUFFIX.
The values of these parameters are prepended or appended, respectively, to the physical node name to create the new hostname for the virtual node. For example, if you want all virtual nodes to have the prefix "v_", add a VIRTUALNODEPREFIX line to the moab.cfg file:
NOTE: The virtual node hostnames MUST be resolvable to valid IP addresses. If you virtualize the physical compute node "node09," the new virtual machine (virtual child) running on "node09" would have the hostname "v_node09" and would use the IP address that "v_node09" resolves to. After making these changes to the moab.cfg file, restart/recycle Moab for the changes to take effect.
5.6.6 Virtualizing a Node
After meeting the prerequisites and configuring Moab, you are ready to virtualize a node. The mnodectl command is used to create a new virtual node (virtual child) on one of the existing physical nodes. In addition to specifying which node the virtual machine should be started on, users also need to specify which image should be used to start the virtual machine. The images available on a node can be listed with the checknode -v command. Once the proper image has been located, you can virtualize a node using the command syntax mnodectl -V image=<IMAGE_NAME> <NODE_EXP>. For example, if you want to create a new virtual child on the node "node09" using a Debian Sarge image, issue the following command:
You can see the progress of the node virtualization by running checknode -v on the physical node:
After the new virtual node has been created, its characteristics (such as memory and processor count) are reported through the physical node. So in our example, "node09" will take on the characteristics of its virtual child "v_node09." Moab recognizes that a reference to either "node09" or "v_node09" means the virtual node running on "node09." This allows users to submit jobs to either "node09" or "v_node09" and be confident the jobs will run on the new virtual node.
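The naming rule (prefix + physical node name + suffix) and the fact that either name refers to the same node can be summarized in a short sketch. The helper names below are hypothetical and only mirror the behavior described in this section; they do not check that the virtual hostname resolves in DNS, which remains a separate requirement.

```python
from __future__ import annotations


def virtual_node_name(physical: str, prefix: str = "", suffix: str = "") -> str:
    """Hostname of the virtual child, built as VIRTUALNODEPREFIX + name + VIRTUALNODESUFFIX."""
    return f"{prefix}{physical}{suffix}"


def physical_host(ref: str, physical_nodes: set[str],
                  prefix: str = "", suffix: str = "") -> str | None:
    """Map a job's node reference (physical or virtual name) to the physical node."""
    if ref in physical_nodes:
        return ref
    for node in physical_nodes:
        if virtual_node_name(node, prefix, suffix) == ref:
            return node
    return None  # not a known physical node or virtual child
```

With the prefix "v_", both "node09" and "v_node09" map to the physical node "node09", while an unrelated name such as "j_node09" maps to nothing.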
© 2001-2008 Cluster Resources, Incorporated