[Mauiusers] performance issues with maui & torque
Denis
denismpa at gmail.com
Wed Oct 17 15:54:18 MDT 2012
2012/10/17 Ian Miller <ianm at uchicago.edu>:
> Thx
> That was the fix.
>
> Ian Miller
> Research Computing Administrator
> ianm at uchicago.edu
> (312) 402-6170
>
You're very welcome.
D.
>
> On 10/17/12 1:26 PM, "Denis" <denismpa at gmail.com> wrote:
>
>>2012/10/17 Ian Miller <ianm at uchicago.edu>:
>>> Hi
>>> I have Maui version 3.3.1 and TORQUE version 2.5.7,
>>> and I seem to have a few nodes sitting idle that should be running jobs.
>>> They have been able to run jobs in the past, but the cluster has never
>>> run at 80-90%.
>>> The output of showq is as follows (I omitted the job lists):
>>>
>>> 119 Active Jobs 130 of 344 Processors Active (37.79%)
>>>
>>> 15 of 35 Nodes Active (42.86%)
>>>
>>> Total Jobs: 467 Active Jobs: 119 Idle Jobs: 0 Blocked Jobs: 348
>>>
>>> When I try to force-run a job, I get:
>>>
>>> root@beast$ qrun 209054
>>>
>>> qrun: Execution server rejected request MSG=cannot send job to mom,
>>> state=PRERUN 209054.beast-net
>>>
>>> 30 of the 34 worker nodes are in one queue (batch), and 2 of those 30
>>> are shared with another queue. Currently 33 of the 467 total jobs are
>>> in a different queue (short) and are running fine; the rest are in the
>>> default queue (batch). My question is: how can I get the idle nodes to
>>> run these jobs?
>>>
>>> What might be the problem?
>>>
>>Try restarting the pbs_mom service on the idle nodes.
>>>
>>>
>>> Qmgr: print queue batch
>>> # Create queues and set their attributes.
>>> #
>>> #
>>> # Create and define queue batch
>>> #
>>> create queue batch
>>> set queue batch queue_type = Execution
>>> set queue batch max_running = 200
>>> set queue batch resources_default.neednodes = batch
>>> set queue batch resources_default.nodes = 1
>>> set queue batch max_user_run = 150
>>> set queue batch keep_completed = 300
>>> set queue batch enabled = True
>>> set queue batch started = True
>>>
>>>
>>> # maui.cfg 3.3.1
>>>
>>> SERVERHOST beast
>>>
>>> # primary admin must be first in list
>>> ADMIN1 root
>>>
>>> # Resource Manager Definition
>>> RMCFG[BEAST] TYPE=PBS
>>>
>>> # Allocation Manager Definition
>>> AMCFG[bank] TYPE=NONE
>>>
>>> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
>>> # use the 'schedctl -l' command to display current configuration
>>>
>>> RMPOLLINTERVAL 00:00:30
>>> SERVERPORT 42559
>>> SERVERMODE NORMAL
>>>
>>> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>>>
>>> LOGFILE maui.log
>>> LOGFILEMAXSIZE 10000000
>>> LOGLEVEL 3
>>>
>>> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>>>
>>> QUEUETIMEWEIGHT 1
>>>
>>> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>>>
>>> #FSPOLICY PSDEDICATED
>>> #FSDEPTH 7
>>> #FSINTERVAL 86400
>>> #FSDECAY 0.80
>>>
>>> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>>>
>>> # NONE SPECIFIED
>>>
>>> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>>>
>>> BACKFILLPOLICY FIRSTFIT
>>> RESERVATIONPOLICY CURRENTHIGHEST
>>>
>>> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>>>
>>> NODEALLOCATIONPOLICY PRIORITY
>>> NODECFG[DEFAULT] PRIORITYF='0.01*AMEM - 2*LOAD'
>>> NODEAVAILABILITYPOLICY COMBINED:MEM
>>>
>>> SRCFG[Reinitz] HOSTLIST=minion1[2-9]
>>> SRCFG[Reinitz] GROUPLIST=Reinitz
>>>
>>> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>>>
>>> # QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
>>> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>>>
>>> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>>>
>>> # SRSTARTTIME[test] 8:00:00
>>> # SRENDTIME[test] 17:00:00
>>> # SRDAYS[test] MON TUE WED THU FRI
>>> # SRTASKCOUNT[test] 20
>>> # SRMAXTIME[test] 0:30:00
>>>
>>> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>>>
>>> USERCFG[DEFAULT] MAXIJOB=2000
>>> # USERCFG[DEFAULT] FSTARGET=25.0
>>> # USERCFG[john] PRIORITY=100 FSTARGET=10.0-
>>> # GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
>>> # CLASSCFG[batch] FLAGS=PREEMPTEE
>>> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>>>
>>> Ian Miller
>>> Research Computing Administrator
>>> ianm at uchicago.edu
>>> (312) 402-6170
>>>
>>>
>>>
>>
>>
>>
>>--
>>Denis Anjos,
>>www.versatushpc.com.br
>
--
Denis Anjos,
www.versatushpc.com.br
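
For anyone hitting the same symptom, here is a minimal sketch (not part of the original exchange) of how the showq summary quoted above could be checked from a script. The function names parse_showq_summary, utilization and looks_stalled are hypothetical, and the regular expressions assume the summary lines look exactly like the ones Ian pasted; other Maui versions may print them differently.

```python
import re

# Summary lines copied from the showq output quoted in this thread.
SHOWQ_SUMMARY = """\
119 Active Jobs 130 of 344 Processors Active (37.79%)
15 of 35 Nodes Active (42.86%)
Total Jobs: 467 Active Jobs: 119 Idle Jobs: 0 Blocked Jobs: 348
"""


def parse_showq_summary(text):
    """Extract processor, node, and job counts from showq summary lines.

    Assumes the format shown in this thread; raises ValueError otherwise.
    """
    procs = re.search(r"(\d+) of (\d+) Processors Active", text)
    nodes = re.search(r"(\d+) of (\d+) Nodes Active", text)
    jobs = re.search(
        r"Total Jobs: (\d+)\s+Active Jobs: (\d+)\s+"
        r"Idle Jobs: (\d+)\s+Blocked Jobs: (\d+)",
        text,
    )
    if not (procs and nodes and jobs):
        raise ValueError("unrecognized showq summary format")
    total, active, idle, blocked = (int(g) for g in jobs.groups())
    return {
        "procs_active": int(procs.group(1)),
        "procs_total": int(procs.group(2)),
        "nodes_active": int(nodes.group(1)),
        "nodes_total": int(nodes.group(2)),
        "jobs_total": total,
        "jobs_active": active,
        "jobs_idle": idle,
        "jobs_blocked": blocked,
    }


def utilization(active, total):
    """Return active/total as a percentage rounded to two decimals."""
    return round(100.0 * active / total, 2)


def looks_stalled(summary):
    """Heuristic (an assumption, not a Maui rule): some nodes are
    inactive while jobs are blocked and none are merely idle."""
    unused_nodes = summary["nodes_total"] - summary["nodes_active"]
    return (
        unused_nodes > 0
        and summary["jobs_blocked"] > 0
        and summary["jobs_idle"] == 0
    )


if __name__ == "__main__":
    s = parse_showq_summary(SHOWQ_SUMMARY)
    print("processor utilization %:", utilization(s["procs_active"], s["procs_total"]))
    print("node utilization %:", utilization(s["nodes_active"], s["nodes_total"]))
    print("looks stalled:", looks_stalled(s))
```

When the heuristic fires, the state of pbs_mom on the inactive nodes is a reasonable first thing to check, since restarting it was what resolved the problem in this thread.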