[Mauiusers] Preemption not working : job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15044, msg: 'Resource temporarily unavailable MSG=job allocation request exceeds currently available cluster nodes, 1 requested, 0 availab
Andre Gauthier
andre.gauthier at gmail.com
Thu Apr 22 14:53:56 MDT 2010
The following worked for me:
DEFERTIME 00:00:05
But I was told that setting JOBAGGREGATIONTIME to something higher than
the default would fix it too. I haven't tried that yet; right now it's
00:00:00.
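
Separately from the DEFERTIME change, the mismatch Tom points out in the quoted message below (CLASSCFG[admins] sets QDEF=sys, but there is no QOSCFG[sys]) is the kind of thing that can be checked mechanically. Here is a minimal sketch in Python, not a Maui tool. The function name undefined_qdefs and the sample text are made up for illustration, and it assumes one parameter per line with whitespace-separated KEY=VALUE attributes, as in the maui.cfg quoted below. It only looks at QDEF, so QLIST entries are not checked, and any QOS name the scheduler predefines (DEFAULT, for instance) would be reported as well.

```python
import re

# Matches credential lines such as "CLASSCFG[admins] MAXPROC=280 QDEF=sys".
CFG_LINE = re.compile(r"^(QOSCFG|CLASSCFG|GROUPCFG|USERCFG)\[([^\]]+)\]\s+(.*)$")


def undefined_qdefs(cfg_text):
    """Return sorted (kind, name, qos) tuples whose QDEF has no QOSCFG entry."""
    defined = set()
    referenced = []
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        match = CFG_LINE.match(line)
        if not match:
            continue
        kind, name, attrs = match.groups()
        if kind == "QOSCFG":
            defined.add(name)  # this QOS is explicitly configured
            continue
        for attr in attrs.split():
            key, _, value = attr.partition("=")
            if key == "QDEF":
                referenced.append((kind, name, value))
    return sorted(r for r in referenced if r[2] not in defined)


# Sample lines copied from the maui.cfg in the original post.
sample = """
QOSCFG[admins]    QFLAGS=PREEMPTOR PRIORITY=1000
QOSCFG[general]   QFLAGS=PREEMPTEE PRIORITY=100
CLASSCFG[general] QDEF=general PRIORITY=100
CLASSCFG[admins]  MAXPROC=280 QDEF=sys PRIORITY=2001
"""

for kind, name, qos in undefined_qdefs(sample):
    print(f"{kind}[{name}] uses QDEF={qos}, but QOSCFG[{qos}] is not defined")
```

On the sample lines this reports only CLASSCFG[admins], which is the line Tom flagged. Whether that mismatch contributed to the deferral is not something this check can tell you.
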
On Tue, Apr 20, 2010 at 4:45 PM, Tom Rudwick <trudwickiii at apple.com> wrote:
> Hi Andre,
>
> We have preemption working at our site on that version of maui.
>
> We have found that the settings below seem to be necessary for
> it to work at our site. I don't see a SYSCFG in your config,
> and I don't see a GROUPCFG for the admins group? I may be off
> base on these, since I know some bugs have been fixed since we
> got this working, but you may want to try setting those.
>
> On this line you set the "sys" QOS but I don't see it elsewhere...
>
> CLASSCFG[admins] MAXPROC=280 QDEF=sys PRIORITY=2001
>
> I see this "admins" one...
>
> QOSCFG[admins] QFLAGS=PREEMPTOR PRIORITY=1000
>
> Good luck,
>
> Tom
>
> ( this is a fragment of our maui config file ...)
>
> QOSWEIGHT 1
> SYSCFG QLIST=bigmem,integration,interactive,debug,regress,contingent
> QOSCFG[bigmem] PRIORITY=1 QFLAGS=PREEMPTOR,RESTARTPREEMPT
> QOSCFG[integration] PRIORITY=1 QFLAGS=USERESERVED
> QOSCFG[interactive] PRIORITY=2 QFLAGS=PREEMPTOR,RESTARTPREEMPT
> QOSCFG[debug] PRIORITY=1
> QOSCFG[regress] PRIORITY=-1
> QOSCFG[contingent] PRIORITY=-2 QFLAGS=PREEMPTEE
> GROUPCFG[users] QDEF=DEFAULT QLIST=bigmem,integration,interactive,debug,regress,contingent
> CLASSCFG[regress] QDEF=contingent
>
>
>
> Andre Gauthier wrote:
>>
>> Hi, I'm trying to get preemption to work with Maui and Torque. I
>> have a dozen queues; one is defined as a preemptee (general queue &
>> qos) and another as a preemptor (admins queue & qos). I submit a job
>> to the preemptee queue, then a job to the preemptor queue. The
>> preemptor job does not run. Maui version 3.2.6p21, Torque version
>> 2.3.6-1.
>>
>> qstat:
>>
>> Job id                    Name             User            Time Use S Queue
>> ------------------------- ---------------- --------------- -------- - -----
>> 459.hpc-test              sleep.sh         user2           00:00:00 R general
>> 460.hpc-test              sleep.sh         user1                  0 Q admins
>>
>>
>> checkjob 460:
>>
>> checking job 460
>>
>> State: Idle EState: Deferred
>> Creds: user:user1 group:admins class:admins qos:admins
>> WallTime: 00:00:00 of 1:00:00
>> SubmitTime: Tue Apr 20 11:41:28
>> (Time Queued Total: 00:00:02 Eligible: 00:00:01)
>>
>> StartDate: 00:00:00 Tue Apr 20 11:41:30
>> Total Tasks: 8
>>
>> Req[0] TaskCount: 8 Partition: ALL
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>> Dedicated Resources Per Task: PROCS: 1 MEM: 32M
>>
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 1
>> PartitionMask: [ALL]
>> Flags: RESTARTABLE PREEMPTOR
>>
>> job is deferred. Reason: RMFailure (cannot start job - RM failure,
>> rc: 15044, msg: 'Resource temporarily unavailable MSG=job allocation
>> request exceeds currently available cluster nodes, 1 requested, 0
>> available')
>> Holds: Defer (hold reason: RMFailure)
>> PE: 8.00 StartPriority: 3001
>> cannot select job 460 for partition DEFAULT (job hold active)
>>
>>
>> checkjob 459:
>>
>> checking job 459
>>
>> State: Running
>> Creds: user:user2 group:user2 class:general qos:general
>> WallTime: 00:03:05 of 1:00:00
>> SubmitTime: Tue Apr 20 11:41:11
>> (Time Queued Total: 00:00:19 Eligible: 00:00:01)
>>
>> StartTime: Tue Apr 20 11:41:30
>> Total Tasks: 96
>>
>> Req[0] TaskCount: 96 Partition: DEFAULT
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>> Dedicated Resources Per Task: PROCS: 1 MEM: 2M
>> Allocated Nodes:
>> [compute-0-15:8][compute-0-13:8][compute-0-12:8][compute-0-11:8]
>> [compute-0-10:8][compute-0-9:8][compute-0-8:8][compute-0-7:8]
>> [compute-0-6:8][compute-0-5:8][compute-0-4:8][compute-0-3:8]
>>
>>
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 2
>> PartitionMask: [ALL]
>> Flags: RESTARTABLE PREEMPTEE
>> Attr: PREEMPTEE
>>
>> Reservation '459' (-00:03:06 -> 00:56:54 Duration: 1:00:00)
>> PE: 96.00 StartPriority: 200
>>
>>
>>
>>
>>
>> showconfig:
>>
>>
>>
>> [root at hpc-test maui]# showconfig
>> # Maui version 3.2.6p21 (PID: 16046)
>> # global policies
>>
>> REJECTNEGPRIOJOBS[0] FALSE
>> ENABLENEGJOBPRIORITY[0] FALSE
>> ENABLEMULTINODEJOBS[0] TRUE
>> ENABLEMULTIREQJOBS[0] FALSE
>> BFPRIORITYPOLICY[0] [NONE]
>> JOBPRIOACCRUALPOLICY QUEUEPOLICY
>> NODELOADPOLICY ADJUSTSTATE
>> USEMACHINESPEED FALSE
>> USESYSTEMQUEUETIME TRUE
>> USELOCALMACHINEPRIORITY FALSE
>> NODEUNTRACKEDLOADFACTOR 1.2
>> JOBNODEMATCHPOLICY[0]
>>
>> JOBMAXSTARTTIME[0] INFINITY
>>
>> METAMAXTASKS[0] 0
>> NODESETPOLICY[0] [NONE]
>> NODESETATTRIBUTE[0] [NONE]
>> NODESETLIST[0]
>> NODESETDELAY[0] 00:00:00
>> NODESETPRIORITYTYPE[0] MINLOSS
>> NODESETTOLERANCE[0] 0.00
>>
>> BACKFILLPOLICY[0] FIRSTFIT
>> BACKFILLDEPTH[0] 0
>> BACKFILLPROCFACTOR[0] 0
>> BACKFILLMAXSCHEDULES[0] 10000
>> BACKFILLMETRIC[0] PROCS
>>
>> BFCHUNKDURATION[0] 00:00:00
>> BFCHUNKSIZE[0] 0
>> PREEMPTPOLICY[0] REQUEUE
>> MINADMINSTIME[0] 00:00:00
>> RESOURCELIMITPOLICY[0]
>> NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
>> NODEALLOCATIONPOLICY[0] MINRESOURCE
>> TASKDISTRIBUTIONPOLICY[0] DEFAULT
>> RESERVATIONPOLICY[0] NEVER
>> RESERVATIONRETRYTIME[0] 00:00:00
>> RESERVATIONTHRESHOLDTYPE[0] NONE
>> RESERVATIONTHRESHOLDVALUE[0] 0
>>
>> FSPOLICY [NONE]
>> FSPOLICY [NONE]
>> FSINTERVAL 12:00:00
>> FSDEPTH 8
>> FSDECAY 1.00
>>
>>
>>
>> # Priority Weights
>>
>> SERVICEWEIGHT[0] 1
>> TARGETWEIGHT[0] 1
>> CREDWEIGHT[0] 1
>> ATTRWEIGHT[0] 1
>> FSWEIGHT[0] 1
>> RESWEIGHT[0] 1
>> USAGEWEIGHT[0] 1
>> QUEUETIMEWEIGHT[0] 1
>> XFACTORWEIGHT[0] 0
>> SPVIOLATIONWEIGHT[0] 0
>> BYPASSWEIGHT[0] 0
>> TARGETQUEUETIMEWEIGHT[0] 0
>> TARGETXFACTORWEIGHT[0] 0
>> USERWEIGHT[0] 1
>> GROUPWEIGHT[0] 1
>> ACCOUNTWEIGHT[0] 0
>> QOSWEIGHT[0] 1
>> CLASSWEIGHT[0] 1
>> FSUSERWEIGHT[0] 0
>> FSGROUPWEIGHT[0] 0
>> FSACCOUNTWEIGHT[0] 0
>> FSQOSWEIGHT[0] 0
>> FSCLASSWEIGHT[0] 0
>> ATTRATTRWEIGHT[0] 0
>> ATTRSTATEWEIGHT[0] 0
>> NODEWEIGHT[0] 0
>> PROCWEIGHT[0] 0
>> MEMWEIGHT[0] 0
>> SWAPWEIGHT[0] 0
>> DISKWEIGHT[0] 0
>> PSWEIGHT[0] 0
>> PEWEIGHT[0] 0
>> WALLTIMEWEIGHT[0] 0
>> UPROCWEIGHT[0] 0
>> UJOBWEIGHT[0] 0
>> CONSUMEDWEIGHT[0] 0
>> USAGEEXECUTIONTIMEWEIGHT[0] 0
>> REMAININGWEIGHT[0] 0
>> PERCENTWEIGHT[0] 0
>> XFMINWCLIMIT[0] 00:02:00
>>
>>
>> # partition DEFAULT policies
>>
>> REJECTNEGPRIOJOBS[1] FALSE
>> ENABLENEGJOBPRIORITY[1] FALSE
>> ENABLEMULTINODEJOBS[1] TRUE
>> ENABLEMULTIREQJOBS[1] FALSE
>> BFPRIORITYPOLICY[1] [NONE]
>> JOBPRIOACCRUALPOLICY QUEUEPOLICY
>> NODELOADPOLICY ADJUSTSTATE
>> JOBNODEMATCHPOLICY[1]
>>
>> JOBMAXSTARTTIME[1] INFINITY
>>
>> METAMAXTASKS[1] 0
>> NODESETPOLICY[1] [NONE]
>> NODESETATTRIBUTE[1] [NONE]
>> NODESETLIST[1]
>> NODESETDELAY[1] 00:00:00
>> NODESETPRIORITYTYPE[1] MINLOSS
>> NODESETTOLERANCE[1] 0.00
>>
>> # Priority Weights
>>
>> XFMINWCLIMIT[1] 00:00:00
>>
>> RMAUTHTYPE[0] CHECKSUM
>>
>> CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[DEFAULT] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[batch] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[interactive] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[general] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[priya] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[admins] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[sohrab] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[micro] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[altonji] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[easther] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[berry] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[hpcprog] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[macro] DEFAULT.FEATURES=[NONE]
>> QOSPRIORITY[0] 0
>> QOSQTWEIGHT[0] 0
>> QOSXFWEIGHT[0] 0
>> QOSTARGETXF[0] 0.00
>> QOSTARGETQT[0] 00:00:00
>> QOSFLAGS[0]
>> QOSPRIORITY[1] 0
>> QOSQTWEIGHT[1] 0
>> QOSXFWEIGHT[1] 0
>> QOSTARGETXF[1] 0.00
>> QOSTARGETQT[1] 00:00:00
>> QOSFLAGS[1]
>> QOSPRIORITY[2] 100
>> QOSQTWEIGHT[2] 0
>> QOSXFWEIGHT[2] 0
>> QOSTARGETXF[2] 100.00
>> QOSTARGETQT[2] 00:00:00
>> QOSFLAGS[2]
>> QOSPRIORITY[3] -1000
>> QOSQTWEIGHT[3] 0
>> QOSXFWEIGHT[3] 0
>> QOSTARGETXF[3] 0.00
>> QOSTARGETQT[3] 00:00:00
>> QOSFLAGS[3]
>> QOSPRIORITY[4] 1000
>> QOSQTWEIGHT[4] 0
>> QOSXFWEIGHT[4] 0
>> QOSTARGETXF[4] 0.00
>> QOSTARGETQT[4] 00:00:00
>> QOSFLAGS[4] PREEMPTOR
>> QOSPRIORITY[5] 100
>> QOSQTWEIGHT[5] 0
>> QOSXFWEIGHT[5] 0
>> QOSTARGETXF[5] 0.00
>> QOSTARGETQT[5] 00:00:00
>> QOSFLAGS[5] PREEMPTEE
>> # SERVER MODULES: MX
>> SERVERMODE NORMAL
>> SERVERNAME
>> SERVERHOST hpc-test.wss.yale.edu
>> SERVERPORT 42559
>> LOGFILE maui.log
>> LOGFILEMAXSIZE 10000000
>> LOGFILEROLLDEPTH 1
>> LOGLEVEL 3
>> LOGFACILITY fALL
>> SERVERHOMEDIR /opt/maui/
>> TOOLSDIR /opt/maui/tools/
>> LOGDIR /opt/maui/log/
>> STATDIR /opt/maui/stats/
>> LOCKFILE /opt/maui/maui.pid
>> SERVERCONFIGFILE /opt/maui/maui.cfg
>> CHECKPOINTFILE /opt/maui/maui.ck
>> CHECKPOINTINTERVAL 00:05:00
>> CHECKPOINTEXPIRATIONTIME 3:11:20:00
>> TRAPJOB
>> TRAPNODE
>> TRAPFUNCTION
>> RESDEPTH 24
>>
>> RMPOLLINTERVAL 00:00:30
>> NODEACCESSPOLICY SHARED
>> ALLOCLOCALITYPOLICY [NONE]
>> SIMTIMEPOLICY [NONE]
>> ADMIN1 maui root
>> ADMINHOSTS ALL
>> NODEPOLLFREQUENCY 0
>> DISPLAYFLAGS
>> DEFAULTDOMAIN
>> DEFAULTCLASSLIST [DEFAULT:1]
>> FEATURENODETYPEHEADER
>> FEATUREPROCSPEEDHEADER
>> FEATUREPARTITIONHEADER
>> DEFERTIME 1:00:00
>> DEFERCOUNT 24
>> DEFERSTARTCOUNT 1
>> JOBPURGETIME 0
>> NODEPURGETIME 2140000000
>> APIFAILURETHRESHHOLD 6
>> NODESYNCTIME 600
>> JOBSYNCTIME 600
>> JOBMAXOVERRUN 00:10:00
>> NODEMAXLOAD 0.0
>>
>> PLOTMINTIME 120
>> PLOTMAXTIME 245760
>> PLOTTIMESCALE 11
>> PLOTMINPROC 1
>> PLOTMAXPROC 512
>> PLOTPROCSCALE 9
>> SCHEDCFG[] MODE=NORMAL SERVER=hpc-test.wss.yale.edu:42559
>> # RM MODULES: PBS SSS WIKI NATIVE
>> RMCFG[base] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:01:30 TYPE=PBS
>> SIMWORKLOADTRACEFILE workload
>> SIMRESOURCETRACEFILE resource
>> SIMAUTOSHUTDOWN OFF
>> SIMSTARTTIME 0
>> SIMSCALEJOBRUNTIME FALSE
>> SIMFLAGS
>> SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
>> SIMINITIALQUEUEDEPTH 16
>> SIMWCACCURACY 0.00
>> SIMWCACCURACYCHANGE 0.00
>> SIMNODECOUNT 0
>> SIMNODECONFIGURATION NORMAL
>> SIMWCSCALINGPERCENT 100
>> SIMCOMRATE 0.10
>> SIMCOMTYPE ROUNDROBIN
>> COMINTRAFRAMECOST 0.30
>> COMINTERFRAMECOST 0.30
>> SIMSTOPITERATION -1
>> SIMEXITITERATION -1
>>
>>
>>
>> cat maui.cfg:
>>
>>
>> # maui.cfg.tmpl for Maui v3.2.5
>>
>> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
>> # use the 'schedctl -l' command to display current configuration
>>
>> RMPOLLINTERVAL 00:00:30
>>
>> SERVERHOST hpc-test.wss.yale.edu
>> SERVERPORT 42559
>> SERVERMODE NORMAL
>>
>> RMCFG[base] TYPE=PBS TIMEOUT=90
>>
>> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>> # ADMIN1 users have full scheduler control
>>
>> ADMIN1 maui root
>>
>> LOGFILE maui.log
>> LOGFILEMAXSIZE 10000000
>> LOGLEVEL 3
>>
>> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>>
>> QUEUETIMEWEIGHT 1
>>
>> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>>
>> #FSPOLICY PSDEDICATED
>> #FSDEPTH 7
>> #FSINTERVAL 86400
>> #FSDECAY 0.80
>>
>> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>>
>> # NONE SPECIFIED
>>
>> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>>
>> BACKFILLPOLICY FIRSTFIT
>> RESERVATIONPOLICY NEVER # set to never for preemption.
>>
>> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>>
>> NODEALLOCATIONPOLICY MINRESOURCE
>>
>> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>>
>> QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
>> QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>>
>> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>>
>> # SRSTARTTIME[test] 8:00:00
>> # SRENDTIME[test] 17:00:00
>> # SRDAYS[test] MON TUE WED THU FRI
>> # SRTASKCOUNT[test] 20
>> # SRMAXTIME[test] 0:30:00
>>
>> #PREEMPTPOLICY set by AG
>> PREEMPTIONPOLICY REQUEUE
>>
>> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>>
>> USERCFG[DEFAULT] FSTARGET=25.0
>> USERCFG[john] PRIORITY=100 FSTARGET=10.0-
>> GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
>> CLASSCFG[batch] FLAGS=PREEMPTEE
>> CLASSCFG[interactive] FLAGS=PREEMPTOR
>>
>> ### set QOS needed for preemption
>> QOSWEIGHT 1
>> QOSCFG[admins] QFLAGS=PREEMPTOR PRIORITY=1000
>> QOSCFG[general] QFLAGS=PREEMPTEE PRIORITY=100
>>
>> GROUPWEIGHT 1
>> CLASSWEIGHT 1
>> CREDWEIGHT 1
>> USERWEIGHT 1
>>
>>
>> CLASSCFG[general] QDEF=general PRIORITY=100
>>
>> GROUPWEIGHT 1
>> CLASSCFG[DEFAULT] MAXPROC=280 QDEF=general PRIORITY=200
>> CLASSCFG[admins] MAXPROC=280 QDEF=sys PRIORITY=2001