[Mauiusers] fairshare and backfill

Bas van der Vlies basv at sara.nl
Thu Feb 7 01:14:47 MST 2008


On Feb 6, 2008, at 9:55 PM, Steve Young wrote:

> Hi,
>         Ok I guess this is my 3rd and last attempt to ask how to do  
> this.
> All I would really like to know is how to make a person's fairshare =
> 1 / <number of users who have submitted jobs>
> With a fairshare like this, it would seem that everyone would get
> an equal amount of resources.
>
> Aside from running as a FIFO scheduler, this would seem to be the
> next step for most: allowing anyone to run any number of jobs while
> ensuring that each user gets an equal share of resources. This
> example would fit nicely in the Appendix "Case Studies" in the maui
> admin manual. I'd be happy to help put a case study together to show
> this if someone could help explain how to do it =). Thanks,
>
> -Steve
>
>
Steve,

   I do not know if this answers your question. When, for example, a
user has a fairshare target of 25% and has used all of it, he gets a
negative priority. You can then set REJECTNEGPRIOJOBS in maui.cfg:
   http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml#rejectnegpriojobs
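
A minimal sketch of how that could look in maui.cfg (REJECTNEGPRIOJOBS
is the parameter documented on the page above; ENABLENEGJOBPRIORITY is,
to my understanding, needed so priorities can go negative at all, so
treat this as an example rather than a verified production config):

   ENABLENEGJOBPRIORITY  TRUE   # allow a job's priority to drop below zero
   REJECTNEGPRIOJOBS     TRUE   # do not start jobs with a negative priority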


What we have done is give every user a MAXPS of, for example, 600
hours. He can then run:
   * 100 jobs for 6 hours
   * or 10 jobs for 60 hours
   * ...

This is another way to prevent a user from monopolizing the system.
It is also very dynamic: when a user submits 11 jobs with a walltime
of 60 hours, the system will schedule 10 of them, and after 6 hours
the 11th job can run, because the running jobs have by then used
10 * 6 = 60 hours, which frees that much of the user's MAXPS again.
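
In maui.cfg that boils down to something like the following (a sketch,
not our literal config; MAXPS is counted in processor-seconds, so
600 hours = 600 * 3600 = 2160000):

   USERCFG[DEFAULT]   MAXPS=2160000   # roughly 600 processor-hours per user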

At our site we use MAXPS and FAIRSHARE together to balance the system
and to prevent users from monopolizing it.
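
Combined, the relevant pieces look roughly like this (again only a
sketch; the weights and targets are example numbers, and the fairshare
usage only counts towards priority if the fairshare weights are set):

   FSWEIGHT           1
   FSUSERWEIGHT       1
   USERCFG[DEFAULT]   FSTARGET=25.0  MAXPS=2160000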

Regards

>
>
>
> On Jan 31, 2008, at 12:18 PM, Steve Young wrote:
>
>> Hi,
>>       So perhaps my first question didn't make any sense ;-).
>> Basically, I am trying to figure out how to prevent a user from
>> tying up all the resources by submitting a large number of jobs to
>> the queue. If no one else has requested any resources, then as many
>> jobs as can run should. However, if someone else submits jobs, I'd
>> like for them to get their fair share of resources without having
>> to wait for the first user's jobs to complete and free up resources.
>>       Perhaps I've answered my question with "fairshare" and should
>> be looking more closely at that. What I don't understand is, if I
>> were to set a fairshare of, let's say, 25% per user, how would this
>> affect the queue system if no one else was running jobs?
>> Essentially, I'd want this user to get 100% of the resources until
>> someone else were to submit. Here is the policy I'd like to simulate:
>>
>> A user is guaranteed up to 32 CPUs or 32 GB of RAM. Above that,
>> jobs will get backfilled. Backfilled jobs will get suspended as
>> more users utilize the queue.
>>
>> Perhaps, I have this all wrong and should be thinking about it
>> differently. I'd love to see some examples of what others have done
>> to remedy this situation. Thanks in advance,
>>
>> -Steve
>>
>>
>>
>> On Jan 28, 2008, at 2:55 PM, Steve Young wrote:
>>
>>> Hi,
>>>      I am trying to figure out how to make the following work within
>>> our cluster using torque/maui:
>>>
>>>
>>> I'd like a user to be able to submit as many jobs as they like.
>>> However, they should only be allowed up to 32 CPUs or 32 GB of
>>> memory. After that, if there are idle resources, the rest of
>>> their jobs can be backfilled on idle nodes.
>>>
>>> If another user submits jobs, they should get the same policy and
>>> pre-empt any backfilled jobs (if that's required to meet the
>>> 32-CPU or memory limit).
>>>
>>> So basically, I think this should be fairly common. I want to run
>>> as many jobs as possible on idle resources but only guarantee the
>>> jobs that fall under the MAXPROC/MAXMEM policy. I've implemented
>>> the MAXPROC/MAXMEM policy but it appears backfill won't work for
>>> the remaining jobs. So I am assuming backfill has to abide by the
>>> MAXPROC/MAXMEM policy I have in place. Can anyone give me some
>>> pointers to the proper way to implement this? Thanks in advance!
>>>
>>> -Steve
>>>
>>>
>>> [root@ maui]# cat maui.cfg (edited for some content)
>>> # maui.cfg 3.2.6p14
>>>
>>>
>>> # Resource Manager Definition
>>>
>>> RMCFG[JAKE] TYPE=PBS
>>>
>>>
>>> RMPOLLINTERVAL        00:00:30
>>>
>>> SERVERPORT            42559
>>> SERVERMODE            NORMAL
>>>
>>> # Admin: http://clusterresources.com/mauidocs/a.esecurity.html
>>>
>>>
>>> LOGDIR                /var/log/maui
>>> LOGFILE               maui.log
>>> LOGFILEMAXSIZE        100000000
>>> #LOGLEVEL              3
>>> LOGLEVEL              2
>>> LOGFILEROLLDEPTH      5
>>> STATDIR               /var/log/maui/stats
>>> SERVERHOMEDIR         /usr/maui/
>>> TOOLSDIR              /usr/maui/tools/
>>> LOGDIR                /var/log/maui/
>>> STATDIR               /usr/maui/stats/
>>> #LOCKFILE              /usr/maui/maui.pid
>>> SERVERCONFIGFILE      /usr/maui/maui.cfg
>>> CHECKPOINTFILE        /var/log/maui/maui.ck
>>>
>>> # Misc configs
>>>
>>> ENABLEMULTINODEJOBS     TRUE
>>> JOBMAXOVERRUN           00:01:00
>>> #SYSTEMDEFAULTJOBWALLTIME       1:00:00:00
>>> USEMACHINESPEED         ON
>>> #PREEMPTPOLICY          CHECKPOINT
>>> PREEMPTPOLICY           SUSPEND
>>> CREDWEIGHT             1
>>> CLASSWEIGHT            1
>>> QOSWEIGHT              1
>>> RESCTLPOLICY            ANY
>>>
>>> # Job Priority: http://clusterresources.com/mauidocs/5.1jobprioritization.html
>>>
>>> QUEUETIMEWEIGHT       1
>>>
>>> # FairShare: http://clusterresources.com/mauidocs/6.3fairshare.html
>>>
>>> FSPOLICY              DEDICATEDPS
>>> FSDEPTH               7
>>> FSINTERVAL            86400
>>> FSDECAY               0.80
>>>
>>> # Throttling Policies: http://clusterresources.com/mauidocs/6.2throttlingpolicies.html
>>>
>>> # NONE SPECIFIED
>>>
>>> # Backfill: http://clusterresources.com/mauidocs/8.2backfill.html
>>>
>>> BACKFILLPOLICY        BESTFIT
>>> RESERVATIONPOLICY     CURRENTHIGHEST
>>> #RESERVATIONPOLICY    NEVER
>>> RESERVATIONDEPTH      50
>>> RESDEPTH              32
>>>
>>> # Node Allocation: http://clusterresources.com/mauidocs/5.2nodeallocation.html
>>>
>>> NODEACCESSPOLICY        SHARED
>>> #NODEALLOCATIONPOLICY   MINRESOURCE
>>> #NODEALLOCATIONPOLICY   MAXBALANCE
>>> NODEALLOCATIONPOLICY    FASTEST
>>> #NODEAVAILABILITYPOLICY         UTILIZED
>>> NODEAVAILABILITYPOLICY  COMBINED
>>> NODEMAXLOAD             1.0
>>> NODELOADPOLICY          ADJUSTSTATE
>>>
>>>
>>> # QOS: http://clusterresources.com/mauidocs/7.3qos.html
>>>
>>>
>>> QOSCFG[qm] PRIORITY=100 QFLAGS=PREEMPTEE
>>> QOSCFG[md] PRIORITY=100 QFLAGS=PREEMPTEE
>>> QOSCFG[faculty] PRIORITY=1000 QFLAGS=PREEMPTOR
>>> QOSFEATURES[qm] hamilton g03
>>> QOSFEATURES[md] hamilton
>>>
>>> # Standing Reservations: http://clusterresources.com/mauidocs/7.1.3standingreservations.html
>>>
>>> # SRSTARTTIME[test] 8:00:00
>>> # SRENDTIME[test]   17:00:00
>>> # SRDAYS[test]      MON TUE WED THU FRI
>>> # SRTASKCOUNT[test] 20
>>> # SRMAXTIME[test]   0:30:00
>>>
>>> # Creds: http://clusterresources.com/mauidocs/6.1fairnessoverview.html
>>>
>>> # USERCFG[DEFAULT]      FSTARGET=25.0
>>> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
>>> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
>>> #
>>> # Groups
>>> #
>>> GROUPCFG[faculty]       PRIORITY=1000 QLIST=faculty QDEF=faculty
>>> GROUPCFG[hamilton]      PRIORITY=10
>>> GROUPCFG[users]         PRIORITY=10
>>> #
>>> # Classes (queues)
>>> #
>>> #CLASSCFG[main]         QLIST=md:qm
>>> CLASSCFG[main]          QLIST=md:qm:mercury MAXPROC=32,64 MAXMEM=32768,65536
>>> CLASSCFG[hamilton]      QLIST=md:qm
>>>
>>>
>>>
>>> torque config
>>> -------------------
>>>
>>> [root@ maui]# qmgr
>>> Max open servers: 4
>>> Qmgr: print server
>>> #
>>> # Create queues and set their attributes.
>>> #
>>> #
>>> # Create and define queue main
>>> #
>>> create queue main
>>> set queue main queue_type = Execution
>>> set queue main Priority = 100
>>> set queue main resources_default.neednodes = main
>>> set queue main resources_default.walltime = 24:00:00
>>> set queue main enabled = True
>>> set queue main started = True
>>> #
>>> # Create and define queue hamilton
>>> #
>>> create queue hamilton
>>> set queue hamilton queue_type = Execution
>>> set queue hamilton resources_default.neednodes = hamilton
>>> set queue hamilton resources_default.walltime = 24:00:00
>>> set queue hamilton enabled = True
>>> set queue hamilton started = True
>>> #
>>> # Set server attributes.
>>> #
>>> set server scheduling = True
>>> set server default_queue = main
>>> set server log_events = 511
>>> set server mail_from = adm
>>> set server query_other_jobs = True
>>> set server resources_default.ncpus = 1
>>> set server resources_default.walltime = 24:00:00
>>> set server scheduler_iteration = 60
>>> set server node_check_rate = 150
>>> set server tcp_timeout = 6
>>> set server job_nanny = True
>>
>

--
Bas van der Vlies
basv at sara.nl




