Fairshare is a mechanism that allows historical resource utilization information to be incorporated into job feasibility and priority decisions. Moab's fairshare implementation allows organizations to set system utilization targets for users, groups, accounts, classes, and QoS levels. You can use both local and global (multi-cluster) fairshare information to make local scheduling decisions.
Administrators can also specify the time frame over which resource utilization is evaluated when determining whether a target is being met. Parameters let sites choose the utilization metric, how historical information is aggregated, and how fairshare state affects scheduling behavior. Fairshare targets can be specified for any credential (such as user, group, or class) that administrators want such information to affect.
Fairshare is configured at two levels. First, at a system level, configuration is required to determine how fairshare usage information is to be collected and processed. Second, some configuration is required at the credential level to determine how this fairshare information affects particular jobs. The following are system level parameters:
If global (multi-cluster) fairshare is used, Moab must be configured to synchronize this information with an identity manager.
As Moab runs, it records how available resources are used. Each iteration (RMPOLLINTERVAL seconds) it updates fairshare resource utilization statistics. Resource utilization is tracked in accordance with the FSPOLICY parameter allowing various aspects of resource consumption information to be measured. This parameter allows selection of both the types of resources to be tracked as well as the method of tracking. It provides the option of tracking usage by dedicated or consumed resources, where dedicated usage tracks what the scheduler assigns to the job and consumed usage tracks what the job actually uses.
An example may clarify the use of the FSPOLICY parameter. Assume a 4-processor job is running a parallel /bin/sleep for 15 minutes. It will have a dedicated fairshare usage of 1 processor-hour but a consumed fairshare usage of essentially nothing since it did not consume anything. Most often, dedicated fairshare usage is used on dedicated resource platforms while consumed tracking is used in shared SMP environments.
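The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the dedicated versus consumed distinction, not Moab's accounting code; the function names are hypothetical, and the consumed figure of zero CPU seconds is an assumption standing in for "essentially nothing".

```python
def dedicated_proc_seconds(procs_assigned, wall_seconds):
    """Processor-seconds the scheduler set aside for the job."""
    return procs_assigned * wall_seconds


def consumed_proc_seconds(cpu_seconds_used):
    """Processor-seconds the job actually used (its CPU time)."""
    return cpu_seconds_used


# The 4-processor job sleeping for 15 minutes described in the text.
dedicated = dedicated_proc_seconds(4, 15 * 60)
consumed = consumed_proc_seconds(0.0)  # assumption: sleep uses ~no CPU

print(dedicated / 3600)  # dedicated usage in processor-hours
print(consumed / 3600)   # consumed usage in processor-hours
```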
FSPOLICY   DEDICATEDPS%
FSINTERVAL 24:00:00
FSDEPTH    28
FSDECAY    0.75
By default, when comparing fairshare usage against fairshare targets, Moab calculates usage as a percentage of delivered cycles. To change the usage calculation to be based on available cycles, rather than delivered cycles, the percent (%) character can be specified at the end of the FSPOLICY value as in the preceding example.
When configuring fairshare, it is important to determine the proper timeframe that should be considered. Many sites choose to incorporate historical usage information from the last one to two weeks while others are only concerned about the events of the last few hours. The correct setting is very site dependent and usually incorporates both average job turnaround time and site mission policies.
With Moab's fairshare system, time is broken into a number of distinct fairshare windows. Sites configure the amount of time they want to consider by specifying two parameters, FSINTERVAL and FSDEPTH. The FSINTERVAL parameter specifies the duration of each window while the FSDEPTH parameter indicates the number of windows to consider. Thus, the total time evaluated by fairshare is simply FSINTERVAL * FSDEPTH.
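As a quick check of the FSINTERVAL * FSDEPTH product, the following Python sketch computes the total evaluated time. The helper names are hypothetical, and the parser only handles the HH:MM:SS form used in this document's examples.

```python
from datetime import timedelta


def parse_interval(value):
    """Parse an HH:MM:SS string such as '12:00:00' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fairshare_horizon(fsinterval, fsdepth):
    """Total time evaluated by fairshare: window length times window count."""
    return parse_interval(fsinterval) * fsdepth


print(fairshare_horizon("24:00:00", 28).days)  # days covered by 28 daily windows
print(fairshare_horizon("12:00:00", 14).days)  # days covered by 14 half-day windows
```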
Many sites want to limit the impact of fairshare data according to its age. The FSDECAY parameter allows this, causing the most recent fairshare data to contribute more to a credential's total fairshare usage than older data. This parameter is specified as a standard decay factor, which is applied to the fairshare data. Generally, decay factors are specified as a value between 0 and 1, where a value of 1 (the default) indicates that no decay should be applied. The smaller the number, the more rapid the decay, using the calculation WeightedValue = Value * <DECAY> ^ <N> where <N> is the window number (the current window is window 0). The following table shows the impact of a number of commonly used decay factors on the percentage contribution of each fairshare window.
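The decay calculation can be explored with a short Python sketch. It applies WeightedValue = Value * DECAY ^ N directly and reports each window's weight as a percentage of the current window's weight; the function names are hypothetical, and numbering the current window as 0 is an assumption taken from the worked usage example later in this section.

```python
def window_weights(decay, depth):
    """Weight applied to each window, most recent (window 0) first."""
    return [decay ** n for n in range(depth)]


def as_percent(weights):
    """Express weights as percentages of the current window's weight."""
    return [round(100 * w, 1) for w in weights]


for decay in (1.0, 0.8, 0.5):
    print(decay, as_percent(window_weights(decay, 4)))
```

A decay factor of 1.0 leaves every window at full weight, while 0.5 halves the contribution of each successively older window.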
While selecting how the total fairshare time frame is broken up between the number and length of windows is a matter of preference, it is important to note that more windows will cause the decay factor to degrade the contribution of aged data more quickly.
Using the selected fairshare usage metric, Moab continues to update the current fairshare window until it reaches a fairshare window boundary, at which point it rolls the fairshare window and begins updating the new window. The information for each window is stored in its own file located in the Moab statistics directory. Each file is named FS.<EPOCHTIME>[.<PNAME>] where <EPOCHTIME> is the time the new fairshare window became active (see sample data file) and <PNAME> is only used if per-partition share trees are configured. Each window contains utilization information for each entity as well as for total usage.
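A script that inspects the statistics directory might build and parse window file names as follows. This is a sketch of the FS.<EPOCHTIME>[.<PNAME>] naming pattern only; the helper names and the sample epoch value are hypothetical, and the file contents are not interpreted.

```python
def window_filename(epoch_time, partition=None):
    """Build a name following the FS.<EPOCHTIME>[.<PNAME>] pattern."""
    name = f"FS.{epoch_time}"
    if partition:
        name += f".{partition}"
    return name


def parse_window_filename(name):
    """Split a window file name into (epoch_time, partition or None)."""
    parts = name.split(".", 2)
    if len(parts) < 2 or parts[0] != "FS":
        raise ValueError(f"not a fairshare window file: {name!r}")
    partition = parts[2] if len(parts) == 3 else None
    return int(parts[1]), partition


print(window_filename(1262304000))            # hypothetical epoch value
print(parse_window_filename("FS.1262304000.par1"))
```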
When Moab needs to determine current fairshare usage for a particular credential, it calculates a decay-weighted average of the usage information for that credential using the most recent fairshare intervals where the number of windows evaluated is controlled by the FSDEPTH parameter. For example, assume the credential of interest is user john and the following parameters are set:
FSINTERVAL 12:00:00
FSDEPTH    4
FSDECAY    0.5
Further assume that the fairshare usage intervals have the following usage amounts:
Based on this information, the current fairshare usage for user john would be calculated as follows:
Usage = (60 + .5^1 * 0 + .5^2 * 10 + .5^3 * 50) / (110 + .5^1*125 + .5^2*100 + .5^3*150)
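The same decay-weighted average can be reproduced in Python. The per-window values are read off the formula above (john's usage in the numerator, total usage in the denominator, most recent window first); the function name is hypothetical.

```python
def decayed_usage(cred_usage, total_usage, decay):
    """Decay-weighted share of total usage, windows ordered newest first."""
    numerator = sum(u * decay ** n for n, u in enumerate(cred_usage))
    denominator = sum(t * decay ** n for n, t in enumerate(total_usage))
    return numerator / denominator


john = [60, 0, 10, 50]        # user john's usage per window
total = [110, 125, 100, 150]  # total usage per window

usage = decayed_usage(john, total, decay=0.5)
print(f"{usage:.2%}")  # 68.75 / 216.25, about 31.79%
```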
Once the global fairshare policies have been configured, the next step involves applying resulting fairshare usage information to affect scheduling behavior. As mentioned in the Fairshare Overview, site administrators can configure how fairshare information impacts scheduling behavior. This is done through specification of fairshare targets. The targets can be applied to user, group, account, QoS, or class credentials using the FSTARGET attribute of *CFG credential parameters. These targets allow fairshare information to affect job priority and each target can be independently selected to be one of the types documented in the following table:
The following example increases the priority of jobs belonging to user john until he reaches 16.5% of total cluster usage. All other users have priority adjusted both up and down to bring them to their target usage of 10%:
FSPOLICY          DEDICATEDPS
FSWEIGHT          1
FSUSERWEIGHT      100
USERCFG[john]     FSTARGET=16.5+
USERCFG[DEFAULT]  FSTARGET=10
...
Where fairshare targets affect a job's priority and position in the eligible queue, fairshare caps affect a job's eligibility. Caps can be applied to users, accounts, groups, classes, and QoS's using the FSCAP attribute of *CFG credential parameters and can be configured to modify scheduling behavior. Unlike fairshare targets, if a credential reaches its fairshare cap, its jobs can no longer run and are thus removed from the eligible queue and placed in the blocked queue. In this respect, fairshare targets behave like soft limits and fairshare caps behave like hard limits. Fairshare caps can be absolute or relative as described in the following table. If no modifier is specified, the cap is interpreted as relative.
The following example constrains the marketing account to use no more than 16,500 processor seconds during any given floating one week window. At the same time, all other accounts are constrained to use no more than 10% of the total delivered processor seconds during any given one week window.
FSPOLICY               DEDICATEDPS
FSINTERVAL             12:00:00
FSDEPTH                14
ACCOUNTCFG[marketing]  FSCAP=16500^
ACCOUNTCFG[DEFAULT]    FSCAP=10
...
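The difference between absolute and relative caps can be illustrated with a small Python sketch. It assumes, based on the example above, that a trailing ^ marks an absolute cap in processor-seconds and that a bare number is a percentage of total delivered usage; other modifiers are not handled, and the function names are hypothetical rather than part of Moab.

```python
def parse_fscap(value):
    """Return (limit, is_absolute) for an FSCAP value such as '16500^' or '10'."""
    if value.endswith("^"):
        return float(value[:-1]), True   # absolute cap (assumed from the example)
    return float(value), False           # no modifier: relative cap, in percent


def is_blocked(cap, cred_usage_ps, total_usage_ps):
    """True if the credential has reached its cap and its jobs would be blocked."""
    limit, absolute = parse_fscap(cap)
    if absolute:
        return cred_usage_ps >= limit
    if total_usage_ps == 0:
        return False
    return 100.0 * cred_usage_ps / total_usage_ps >= limit


print(is_blocked("16500^", 16500, 1_000_000))  # marketing at its absolute cap
print(is_blocked("10", 50_000, 1_000_000))     # another account at 5% of usage
```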
The most commonly used type of fairshare is priority based fairshare. In this mode, fairshare information does not affect whether a job can run, but rather only the job's priority relative to other jobs. In most cases, this is the desired behavior. Using the standard fairshare target, the priority of jobs of a particular user who has used too many resources over the specified fairshare window is lowered. Also, the standard fairshare target increases the priority of jobs that have not received enough resources.
While the standard fairshare target is the most commonly used, Moab can also specify fairshare ceilings and floors. These targets are like the default target; however, ceilings only adjust priority down when usage is too high and floors only adjust priority up when usage is too low.
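The three target styles can be compared with a short Python sketch. It assumes the suffix convention used in this section's examples (a trailing + for a floor, a trailing - for a ceiling, no suffix for a standard target) and, as a simplifying assumption, models the priority adjustment as target minus usage in percentage points. Moab's actual priority calculation also applies the configured weights, which are omitted here.

```python
def parse_fstarget(value):
    """Return (target_percent, kind) where kind is '+', '-' or ''."""
    if value[-1] in "+-":
        return float(value[:-1]), value[-1]
    return float(value), ""


def priority_delta(target, usage_pct):
    """Unweighted adjustment: positive raises priority, negative lowers it."""
    goal, kind = parse_fstarget(target)
    delta = goal - usage_pct
    if kind == "+" and delta < 0:
        return 0.0  # floor: never adjusts priority down
    if kind == "-" and delta > 0:
        return 0.0  # ceiling: never adjusts priority up
    return delta


print(priority_delta("16.5+", 10.0))  # below a floor: boosted
print(priority_delta("16.5+", 20.0))  # above a floor: unchanged
print(priority_delta("15.0-", 20.0))  # above a ceiling: lowered
print(priority_delta("10", 12.0))     # standard target: lowered
```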
Since fairshare usage information must be integrated with Moab's overall priority mechanism, it is critical that the corresponding fairshare priority weights be set. Specifically, the FSWEIGHT component weight parameter and the target type subcomponent weights (such as FSUSERWEIGHT and FSGROUPWEIGHT) must be specified.
# set relative component weighting
FSUSERWEIGHT  10
FSGROUPWEIGHT 50
FSINTERVAL    12:00:00
FSDEPTH       4
FSDECAY       0.5
FSPOLICY      DEDICATEDPS
# all users should have a FS target of 10%
USERCFG[DEFAULT] FSTARGET=10.0
# user john gets extra cycles
USERCFG[john]    FSTARGET=20.0
# reduce staff priority if group usage exceed 15%
GROUPCFG[staff]  FSTARGET=15.0-
# give group orion additional priority if usage drops below 25.7%
GROUPCFG[orion]  FSTARGET=25.7+
Credential-specific fairshare weights can be set using the FSWEIGHT attribute of the ACCOUNT, GROUP, and QOS credentials as in the following example:
FSWEIGHT 1000
ACCOUNTCFG[orion1] FSWEIGHT=100
ACCOUNTCFG[orion2] FSWEIGHT=200
ACCOUNTCFG[orion3] FSWEIGHT=-100
GROUPCFG[staff]    FSWEIGHT=10
If specified, a per-credential fairshare weight is added to the global component fairshare weight.
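Applied to the preceding example, the additive rule works out as in this Python sketch. The dictionary and function are hypothetical illustrations of the arithmetic only; how Moab combines the resulting weight with the other priority components is not modeled.

```python
GLOBAL_FSWEIGHT = 1000

# Per-credential FSWEIGHT values from the example configuration.
CRED_FSWEIGHT = {
    ("account", "orion1"): 100,
    ("account", "orion2"): 200,
    ("account", "orion3"): -100,
    ("group", "staff"): 10,
}


def effective_fsweight(kind, name):
    """Global fairshare weight plus the per-credential weight, if any."""
    return GLOBAL_FSWEIGHT + CRED_FSWEIGHT.get((kind, name), 0)


for key in CRED_FSWEIGHT:
    print(key, effective_fsweight(*key))
```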
Example 1 represents a university setting where different schools have access to a cluster. The Engineering department has put the most money into the cluster and therefore has greater access to the cluster. The Math, Computer Science, and Physics departments have also pooled their money into the cluster and have reduced relative access. A support group also has access to the cluster, but since they only require minimal compute time and shouldn't block the higher-paying departments, they are constrained to five percent of the cluster. At this time, users Tom and John have specific high-priority projects that need increased cycles.
#global general usage limits - negative priority jobs are considered in scheduling
ENABLENEGJOBPRIORITY TRUE

# site policy - no job can last longer than 8 hours
USERCFG[DEFAULT] MAX.WCLIMIT=8:00:00

# Note: default user FS target only specified to apply default user-to-user balance
USERCFG[DEFAULT] FSTARGET=1

# high-level fairshare config
FSPOLICY   DEDICATEDPS
FSINTERVAL 12:00:00
FSDEPTH    32    #recycle FS every 16 days
FSDECAY    0.8   #favor more recent usage info

# qos config
QOSCFG[inst]    FSTARGET=25
QOSCFG[supp]    FSTARGET=5
QOSCFG[premium] FSTARGET=70

# account config (QoS access and fstargets)
# Note: user-to-account mapping handled via allocation manager
# Note: FS targets are percentage of total cluster, not percentage of QOS
ACCOUNTCFG[cs]   QLIST=inst    FSTARGET=10
ACCOUNTCFG[math] QLIST=inst    FSTARGET=15
ACCOUNTCFG[phys] QLIST=supp    FSTARGET=5
ACCOUNTCFG[eng]  QLIST=premium FSTARGET=70

# handle per-user priority exceptions
USERCFG[tom]  PRIORITY=100
USERCFG[john] PRIORITY=35

# define overall job priority
USERWEIGHT 10   # user exceptions

# relative FS weights (Note: QOS overrides ACCOUNT which overrides USER)
FSUSERWEIGHT    1
FSACCOUNTWEIGHT 10
FSQOSWEIGHT     100

# apply XFactor to balance cycle delivery by job size fairly
# Note: queuetime factor also on by default (use QUEUETIMEWEIGHT to adjust)
XFACTORWEIGHT 100

# enable preemption
PREEMPTPOLICY REQUEUE

# temporarily allow phys to preempt math
ACCOUNTCFG[phys] JOBFLAGS=PREEMPTOR PRIORITY=1000
ACCOUNTCFG[math] JOBFLAGS=PREEMPTEE
Moab supports arbitrary depth hierarchical fairshare based on a share tree. In this model, users, groups, classes, and accounts can be arbitrarily organized and their usage tracked and limited. Moab extends common share tree concepts to allow mixing of credential types, enforcement of ceiling and floor style usage targets, and mixing of hierarchical fairshare state with other priority components.
The FSTREE parameter can be used to define and configure the share tree used in fairshare configuration. This parameter supports the following attributes:
The current tree configuration and monitored usage distribution are available using the mdiag -f -v command.
Moab provides multiple policies to customize how the share tree is evaluated.
Using FS Floors and Ceilings with Hierarchical Fairshare
All standard fairshare facilities including target floors, target ceilings, and target caps are supported when using hierarchical fairshare.
Multi-Partition Fairshare
Moab supports independent, per-partition hierarchical fairshare targets allowing each partition to possess independent prioritization and usage constraint settings. This is accomplished by setting the SHARES attribute of the FSTREE parameter and using the per-partition share specification.
In the following example, partition 1 is shared by the engineering and research departments, all organizations are allowed to use various portions of partition 2, and partition 3 is only accessible by research and sales.
FSTREE[root]     SHARES=10000 MEMBERLIST=eng,research,sales
FSTREE[eng]      SHARES=500@par1,100@par2 MEMBERLIST=user:johnt,user:stevek
FSTREE[research] SHARES=1000@par1,500@par2,2000@par3 MEMBERLIST=user:barry,user:jsmith,user:bf4
FSTREE[sales]    SHARES=500@par2,1000@par3 MEMBERLIST=user:jen,user:lisa
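To see how the per-partition SHARES values in this example relate to one another, the following Python sketch parses them and computes each department's fraction within a partition. Normalizing sibling shares within a partition is an assumption about how a share tree is typically evaluated, not a statement of Moab's exact algorithm, and the helper names are hypothetical.

```python
def parse_shares(spec):
    """'500@par1,100@par2' -> {'par1': 500, 'par2': 100}"""
    shares = {}
    for item in spec.split(","):
        amount, _, partition = item.partition("@")
        shares[partition] = int(amount)
    return shares


# SHARES values of the three departments in the example above.
NODES = {
    "eng": "500@par1,100@par2",
    "research": "1000@par1,500@par2,2000@par3",
    "sales": "500@par2,1000@par3",
}


def partition_fractions(nodes, partition):
    """Fraction of a partition's shares held by each node with access to it."""
    held = {name: parse_shares(spec).get(partition, 0) for name, spec in nodes.items()}
    total = sum(held.values())
    if total == 0:
        return {}
    return {name: value / total for name, value in held.items() if value}


for par in ("par1", "par2", "par3"):
    print(par, partition_fractions(NODES, par))
```

Departments without a share in a partition, such as sales in par1, simply do not appear in that partition's result.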
Dynamically Importing Share Tree Data
Share trees can be centrally defined within a database, flat file, information service, or other system and this information can be dynamically imported and used within Moab by setting the fstree parameter within the Identity Manager Interface. This interface can be used to load current information at startup and periodically synchronize this information with the master source.
Share trees defined within a flat file can be cumbersome to read; consider running tidy for XML to improve readability. Sample usage:
> tidy -i -xml goldy.cfg <filename> <output file>
Sample (truncated) output:
FSTREE[tree]
<fstree>
  <tnode partition="g02" name="root" type="acct" share="100">
    ...
  </tnode>
</fstree>
Specifying Share Tree Based Limits
Limits can be specified on internal nodes of the share tree using standard credential limit semantics as shown in the following example:
FSTREE[sales] SHARES=400 MAXJOB=15 MAXPROC=200 MEMBERLIST=s1,s2,s3
FSTREE[s1]    SHARES=150 MAXJOB=4 MAXPROC=40 MEMBERLIST=user:ben,user:jum3
FSTREE[s2]    SHARES=50 MAXJOB=1 MAXPROC=50 MEMBERLIST=user:carol,user:johnson
FSTREE[s3]    SHARES=200 MAXPS=4000 MAXPROC=150 MEMBERLIST=s3a,s3b,s3c
Other Uses of Share Trees
If a share tree is defined, it can be used for purposes beyond fairshare. These include organizing general usage and performance statistics for reporting purposes (see showstats -T), enforcement of tree node based usage limits, and specification of resource access policies.
Moab can import fairshare data from external sources. Global fairshare data can be imported using the Identity Manager interface. To import global fairshare data, the total global fairshare usage must be imported on the "sched" object through the identity manager in addition to the global fairshare usage and target for particular credentials.
The following example shows a sample moab.cfg file that incorporates fairshare data from an external source and factors it into job priority:
IDCFG[gfs] SERVER="file:///$HOME/tools/id.txt" REFRESHPERIOD=minute
FSPOLICY         DEDICATEDPS
FSWEIGHT         1
FSGUSERWEIGHT    1
FSGGROUPWEIGHT   1
FSGACCOUNTWEIGHT 1
The identity manager file referenced by IDCFG[gfs] (id.txt) contains the following records:
sched globalfsusage=890
user:wightman globalfsusage=8 globalfstarget=100
group:wightman globalfsusage=8 globalfstarget=10
acct:project globalfsusage=24 globalfstarget=50
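The record layout shown above (an object name followed by key=value pairs, one record per line) is simple enough to parse in a few lines of Python. This is a sketch for inspecting such a file outside Moab. The function names are hypothetical, and dividing a credential's globalfsusage by the total on the sched object is an assumption about how the percentage is derived.

```python
SAMPLE = """\
sched globalfsusage=890
user:wightman globalfsusage=8 globalfstarget=100
group:wightman globalfsusage=8 globalfstarget=10
acct:project globalfsusage=24 globalfstarget=50
"""


def parse_id_records(text):
    """Map each object name to a dict of its numeric key=value attributes."""
    records = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        pairs = (field.split("=", 1) for field in fields[1:])
        records[fields[0]] = {key: float(value) for key, value in pairs}
    return records


def global_usage_percent(records, cred):
    """Credential's global usage as a percentage of the 'sched' total (assumed)."""
    return 100.0 * records[cred]["globalfsusage"] / records["sched"]["globalfsusage"]


records = parse_id_records(SAMPLE)
for cred in ("user:wightman", "group:wightman", "acct:project"):
    print(cred, round(global_usage_percent(records, cred), 2))
```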
$ mdiag -p -v
diagnosing job priority information (partition: ALL)
Job                    PRIORITY*     FS(GUser:  GGrp:GAcct)  Serv(QTime)
             Weights   --------       1(    1:     1:    1)     1(    1)
16                157                99.4( 99.9:   9.1: 47.3)   0.6(  1.0)
Percent Contribution   --------      99.4( 63.5:   5.8: 30.1)   0.6(  0.6)
In this example, Moab imports fairshare information from an external source and uses it to calculate a job's priority.
© 2001-2010 Adaptive Computing Enterprises, Inc.