8.4 Preemption Policies
Many sites possess workloads of varying importance. While it may be critical that some jobs obtain resources immediately, other jobs are less sensitive to turnaround time but have an insatiable hunger for compute cycles, consuming every available cycle for years on end. These latter jobs often have turnaround times on the order of weeks or months. The concept of cycle stealing, popularized by systems such as Condor, handles such situations well and enables systems to run low priority, preemptible jobs whenever something more pressing is not running. Such systems are often employed on compute farms of desktops, where the jobs must vacate any time interactive system use is detected.
8.4.1 Preemption Triggers
Preemption can be enabled in one of three ways: manual intervention, QOS based configuration, and use of the preemption based backfill algorithm.
8.4.1.1 Admin Preemption Commands
The mjobctl command can be used to preempt jobs. Specifically, the command can be used to modify a job's execution state in the following ways:
In general, users are allowed to suspend or terminate jobs they own. Administrators are allowed to suspend, terminate, resume, and execute any queued jobs.
8.4.1.2 QOS Based Preemption
Maui's QOS based preemption system allows a site to specify preemption rules and control access to preemption privileges. These abilities can be used to increase system throughput, improve job response time for specific classes of jobs, or enable various other political policies. All policies are enabled by specifying some QOSs with the flag PREEMPTOR, and others with the flag PREEMPTEE. For example, to enable a cycle stealing high throughput cluster, a QOS can be created for high priority jobs and marked with the flag PREEMPTOR; another QOS can be created for low priority jobs and marked with the flag PREEMPTEE. Finally, the RESERVATIONPOLICY parameter can be set to NEVER. With this configuration, low priority, preemptee jobs can be started whenever idle resources are available. These jobs will be allowed to run until a high priority job arrives, at which point the necessary low priority jobs will be preempted and the needed resources freed. This allows near immediate resource access for the high priority jobs. Using this approach, a cluster can maintain near 100% system utilization while still delivering excellent turnaround time to the jobs of greatest value.
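As a minimal sketch of the cycle stealing configuration described above, a maui.cfg fragment might look like the following. The QOS names hi and low are hypothetical, and the sketch assumes that QOS flags are set through the QFLAGS attribute of QOSCFG. How jobs are mapped to each QOS is site specific and is not shown.

```
# maui.cfg (sketch only; QOS names 'hi' and 'low' are hypothetical)

# high priority jobs: allowed to preempt
QOSCFG[hi]   QFLAGS=PREEMPTOR

# low priority jobs: may be preempted
QOSCFG[low]  QFLAGS=PREEMPTEE

# as described above, no reservations are created
RESERVATIONPOLICY NEVER
```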
It is important to note the rules of QoS based preemption. Preemption only occurs when the following 3 conditions are satisfied:
Use of the preemption system need not be limited to controlling low priority jobs. Other uses include optimistic scheduling and development job support.
8.4.1.3 Preemption Based Backfill
The PREEMPT backfill policy allows a site to take advantage of optimistic scheduling. By default, backfill only allows jobs to run if they are guaranteed to have adequate time to run to completion. However, statistically, most jobs do not utilize their full requested wallclock limit. The PREEMPT backfill policy allows the scheduler to start backfill jobs even if the required walltime is not available. If the job runs too long and interferes with another job which was guaranteed a particular timeslot, the backfill job is preempted and the priority job is allowed to run. When another potential timeslot becomes available, the preempted backfill job will again be optimistically executed. In environments with checkpointing or with poor wallclock accuracies, this algorithm has potential for significant savings. See the backfill section for more information.
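Assuming the backfill algorithm is selected through a BACKFILLPOLICY parameter (the parameter name is an assumption here and should be checked against the backfill section), enabling preemption based backfill might be sketched as:

```
# maui.cfg (sketch only; parameter name assumed, see the backfill section)
BACKFILLPOLICY PREEMPT
```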
8.4.2 Types of Preemption
How the scheduler preempts a job is controlled by the PREEMPTPOLICY parameter. This parameter allows preemption to be enforced in one of the following manners:
8.4.2.1 Job Requeue
Under this policy, active jobs are terminated and returned to the job queue in an idle state.
8.4.2.2 Job Suspend
Suspend causes active jobs to stop executing but to remain in memory on the allocated compute nodes. While a suspended job frees up processor resources, it may continue to consume swap and/or other resources. Suspended jobs must be 'resumed' to continue executing. NOTE: If 'suspend' based preemption is selected, the signal used to initiate the job suspend may be specified by setting the RM specific 'SUSPENDSIG' attribute, e.g. 'RMCFG[base] SUSPENDSIG=23'.
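Put together, a suspend based setup might be sketched as follows. The value SUSPEND is assumed to be the PREEMPTPOLICY spelling for this policy; the resource manager name base and the signal number 23 are taken from the note above and should be adjusted to the local resource manager and operating system.

```
# maui.cfg (sketch only; SUSPEND is an assumed value spelling)
PREEMPTPOLICY SUSPEND

# signal sent to initiate the job suspend (from the note above)
RMCFG[base]   SUSPENDSIG=23
```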
8.4.2.3 Job Checkpoint
Systems which support job checkpointing allow a job to save off its current state and either terminate or continue running. A checkpointed job may be restarted at any time and resume execution from its most recent checkpoint.
8.4.2.4 RM Preemption Constraints
Maui is only able to utilize preemption if the underlying resource manager/OS combination supports this capability. The following table displays current preemption limitations:
Table 8.4.2.4 Resource Manager Preemption Constraints
© 2001-2010 Adaptive Computing Enterprises, Inc.