13.9 Managing Networks

13.9.1 Network Management Overview

Network resources can be tightly integrated with the rest of a compute cluster using the Moab multi-resource manager management interface. This interface has the following capabilities:

13.9.2 Dynamic VLAN Creation

Most sites using dynamic VLANs operate under the following assumptions:

In this environment, organizations may choose to have VLANs automatically configured that encapsulate individual jobs or VPC requests. These VLANs essentially disconnect the job from either incoming or outgoing communication with other compute nodes.

13.9.2.1 Configuring VLANs

Automated VLAN management can be enabled by setting up a network resource manager that supports dynamic VLAN configuration and a QoS to request this feature. The example configuration highlights this setup:

13.9.2.2 Requesting a VLAN

VLANs can be requested on a per-job basis directly using the associated resource manager extension or indirectly by requesting a QoS with a VLAN security requirement.

13.9.3 Network Health Monitoring

Network-level health monitoring is enabled by supporting the cluster query action in the network resource manager and specifying the appropriate CLUSTERQUERYURL attribute in the associated resource manager interface. Node (virtual node) query commands (mnodectl, checknode) can be used to view this health information, which is also correlated with associated workload and written to persistent accounting records. Network health-based event information can also be fed into generic events and used to drive appropriate event-based triggers.

At present, health attributes such as fan speed, temperature, port failures, and various core switch failures can be monitored and reported. Additional failure events are monitored and reported as support is added within the network management system.

13.9.4 Network Load Monitoring

Network-level load monitoring is enabled by supporting the cluster query action in the network resource manager and specifying the appropriate CLUSTERQUERYURL attribute in the associated resource manager interface. Node (virtual node) query commands (mnodectl, checknode) can be used to view this load information, which is also correlated with associated workload and written to persistent accounting records.
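
As an illustration of the cluster query action described above, the following is a minimal Python sketch of a script that a CLUSTERQUERYURL attribute could point to. It is not taken from the Moab tools directory. The node name netswitch01, the read_switch_status helper, and the attribute spellings (STATE, TEMP, FANSPEED) are assumptions made for illustration only; the attribute names and line format that Moab actually accepts are defined by the native resource manager interface.

```python
#!/usr/bin/env python3
"""Hypothetical cluster query script for a network resource manager.

Prints one line describing a single virtual node that stands for the
network. All names and attribute spellings here are illustrative.
"""


def format_node_report(name, state, attrs):
    """Build one report line: node name followed by KEY=value pairs."""
    fields = [f"STATE={state}"]
    fields += [f"{key}={value}" for key, value in attrs.items()]
    return " ".join([name] + fields)


def read_switch_status():
    """Placeholder for a real query against the network management system."""
    return {"state": "Idle", "temp_c": 41, "fan_rpm": 5200}


def main():
    status = read_switch_status()
    # The node name must not collide with any compute node name.
    line = format_node_report(
        "netswitch01",
        status["state"],
        {"TEMP": status["temp_c"], "FANSPEED": status["fan_rpm"]},
    )
    print(line)


if __name__ == "__main__":
    main()
```

Run directly, this sketch prints `netswitch01 STATE=Idle TEMP=41 FANSPEED=5200`. A real script would replace read_switch_status with a call to the network management system and emit whatever attributes the site wants to view through the node query commands.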
Load information can also be fed into generic metrics and used to drive appropriate load-based triggers.

13.9.5 Providing Per-QoS and Per-Job Bandwidth and Latency Guarantees

Intra-job bandwidth and latency guarantees can be requested on a per-job and/or per-QoS basis using the BANDWIDTH and LATENCY resource manager extensions (for jobs) and the MINBANDWIDTH and MAXLATENCY QoS attributes (for QoS limits). If specified, Moab does not allow a job to start unless these criteria can be satisfied via proper resource allocation or dynamic network partitions. As needed, Moab makes future resource reservations to be able to guarantee required allocations.

Example

NOTE: If dynamic network partitions are enabled, a NODEMODIFYURL attribute must be properly configured to drive the network resource manager. (See Native Resource Manager Overview for details.)

13.9.6 Enabling Workload-Aware Network Maintenance

Network-aware maintenance is enabled by supporting the modify action in the network resource manager and specifying the appropriate NODEMODIFYURL attribute in the associated resource manager interface. Administrator resource management commands (mnodectl and mrmctl) are then routed directly through the resource manager to the network management system. In addition, reservation and real-time generic event and generic metric triggers can be configured to intelligently drive these facilities for maintenance and auto-recovery purposes.

Maintenance actions can include powering the switch on and off as well as rebooting/recycling all or part of the network. Additional operations are enabled as supported by the underlying networks.

13.9.7 Enabling Network-Aware Scheduling Decisions

Moab can support network-aware resource allocation algorithms either via its resource allocation plug-in interface or by way of direct interaction with an intelligent network management system.

13.9.7.1 Plug-in Based Network-Aware Allocation Algorithms

If a plug-in interface is used, the algorithm is responsible for allocating resources in such a way as to do the following:

As input, each call to the allocation algorithm includes the following:

The algorithm returns SUCCESS if a satisfactory allocation is made; otherwise, FAILURE is returned. Upon successful completion, the algorithm returns a list of nodes and associated taskcounts that can be allocated to the specified job req (taskgroup). Upon failure, the algorithm returns a failure status code and a human-readable message indicating the reason for the failure.

NOTE: This algorithm is called once per job as jobs are started as well as once per job as future job reservations are made. Depending on workload and policies, this may result in the algorithm being called hundreds or thousands of times per scheduling iteration. On larger clusters, the algorithm should be designed to scale accordingly so that scheduling remains responsive.

NOTE: The job-level QoS credential indicates if the job is authorized to create dynamic network partitions with bandwidth and/or latency guarantees. If authorized by the QoS and supported by the network management system, this algorithm can contact the network resource manager directly and make appropriate calls. These calls should only be made for immediate allocations and not for future reservations as specified via the start time parameter.

13.9.7.2 Use of Intelligent Network Scheduling APIs

If the network management system (NMS) supports an allocation query API, Moab can be configured to use this to enhance its existing allocation policies. Depending on the underlying capabilities of the NMS, the following queries can be used:

In each case, Moab passes to the NMS service a list of nodes that can be considered for allocation together with the number of tasks required.

NOTE: Both networks and certain exotic architectures can impose various allocation constraints. In such cases, the feasible allocation query should return an allocation consistent with both the network and the underlying hardware architecture.

13.9.8 Creating a Resource Management Interface for a New Network

Many popular networks are supported using interfaces provided in the Moab tools directory. If a required network interface is not available, a new one can be created using the following guidelines:

General Requirements

In all cases, a network resource manager should respond to a cluster query request by reporting a single node with a node name that will not conflict with any existing compute nodes. This node should report, at a minimum, the state attribute.

Monitoring Load

Network load is reported to Moab using the generic resource bandwidth. For greatest value, both configured and used bandwidth (in megabytes per second) should be reported as in the following example:

Monitoring Failures

Network warning and failure events can be reported to Moab using the gevent metric. If automated responses are enabled, embedded epochtime information should be included.

Controlling Router State

Router power state can be controlled once a system modify interface is created that supports the commands on, off, and reset.

Creating VLANs

VLAN creation, management, and reporting are more advanced, requiring persistent VLAN ID tracking, global pool creation, and other features. Use of existing routing interface tools as templates is highly advised. VLAN management requires use of both the cluster query interface and the system modify interface.

13.9.9 Per-Job Network Monitoring

It is possible to gather network usage on a per-job basis using the native interface.

When the native interface has been configured to report netin and netout, Moab automatically gathers this data through the life of a job and reports total usage statistics upon job completion. This information is visible to users and administrators via command-line utilities, the web portal, and the desktop graphical interfaces.

See Also
© 2001-2008 Cluster Resources, Incorporated