[Mauiusers] performance issues with maui & torque

Denis denismpa at gmail.com
Wed Oct 17 15:54:18 MDT 2012


2012/10/17 Ian Miller <ianm at uchicago.edu>:
> Thx
> That was the fix.
>
> Ian Miller
> Research Computing Administrator
> ianm at uchicago.edu
> (312) 402-6170
>
You're very welcome.
D.
>
> On 10/17/12 1:26 PM, "Denis" <denismpa at gmail.com> wrote:
>
>>2012/10/17 Ian Miller <ianm at uchicago.edu>:
>>> Hi
>>> I have maui version 3.3.1 and torque version 2.5.7,
>>> and I seem to have a few nodes sitting idle that should be running jobs.
>>> They have been able to run jobs in the past, but the cluster has never
>>> run at 80-90%.
>>> The output of showq is as follows (I omitted the job lists):
>>>
>>> 119 Active Jobs     130 of  344 Processors Active (37.79%)
>>>
>>>                         15 of   35 Nodes Active      (42.86%)
>>>
>>> Total Jobs: 467   Active Jobs: 119   Idle Jobs: 0   Blocked Jobs: 348
>>>
>>> When I try to force-run a job, I get:
>>>
>>> root@beast$ qrun 209054
>>>
>>> qrun: Execution server rejected request MSG=cannot send job to mom,
>>> state=PRERUN 209054.beast-net
>>>
>>> 30 of the 34 worker nodes are in one queue (batch), with 2 of those 30
>>> shared with another queue. Currently 33 of the 467 total jobs are in a
>>> different queue (short) and are running fine; the rest are in the
>>> default queue (batch). My question is: how can I get the idle nodes to
>>> run these jobs?
>>>
>>> What might be the problem?
>>>
>>Try restarting the mom services on the idle nodes.
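A minimal sketch of that suggestion, assuming the idle nodes are reachable over ssh as root and that the TORQUE MOM init script is installed as pbs_mom (the script name and location vary between installations). The node names are hypothetical placeholders, and the helper only prints the commands so they can be reviewed before being run by hand:

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) one restart command per node.
# Assumption: the MOM init script is called pbs_mom on the compute nodes.
print_mom_restart_cmds() {
    for node in "$@"; do
        printf 'ssh root@%s service pbs_mom restart\n' "$node"
    done
}

# Placeholder node names; substitute the nodes that are sitting idle.
print_mom_restart_cmds node01 node02
```

Running `pbsnodes -l` on the server first may help build the node list, since it reports nodes the server considers down, offline, or unknown.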
>>>
>>>
>>> Qmgr: print queue batch
>>> # Create queues and set their attributes.
>>> #
>>> #
>>> # Create and define queue batch
>>> #
>>> create queue batch
>>> set queue batch queue_type = Execution
>>> set queue batch max_running = 200
>>> set queue batch resources_default.neednodes = batch
>>> set queue batch resources_default.nodes = 1
>>> set queue batch max_user_run = 150
>>> set queue batch keep_completed = 300
>>> set queue batch enabled = True
>>> set queue batch started = True
>>>
>>>
>>> # maui.cfg 3.3.1
>>>
>>> SERVERHOST            beast
>>> # primary admin must be first in list
>>> ADMIN1                root
>>>
>>> # Resource Manager Definition
>>> RMCFG[BEAST] TYPE=PBS
>>>
>>> # Allocation Manager Definition
>>> AMCFG[bank]  TYPE=NONE
>>>
>>> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
>>> # use the 'schedctl -l' command to display current configuration
>>> RMPOLLINTERVAL        00:00:30
>>> SERVERPORT            42559
>>> SERVERMODE            NORMAL
>>>
>>> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>>> LOGFILE               maui.log
>>> LOGFILEMAXSIZE        10000000
>>> LOGLEVEL              3
>>>
>>> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>>> QUEUETIMEWEIGHT       1
>>>
>>> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>>> #FSPOLICY              PSDEDICATED
>>> #FSDEPTH               7
>>> #FSINTERVAL            86400
>>> #FSDECAY               0.80
>>>
>>> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>>> # NONE SPECIFIED
>>>
>>> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>>> BACKFILLPOLICY        FIRSTFIT
>>> RESERVATIONPOLICY     CURRENTHIGHEST
>>>
>>> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>>> NODEALLOCATIONPOLICY PRIORITY
>>> NODECFG[DEFAULT] PRIORITYF='0.01*AMEM - 2*LOAD'
>>> NODEAVAILABILITYPOLICY COMBINED:MEM
>>>
>>> SRCFG[Reinitz] HOSTLIST=minion1[2-9]
>>> SRCFG[Reinitz] GROUPLIST=Reinitz
>>>
>>> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>>>
>>> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
>>> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>>>
>>> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>>>
>>> # SRSTARTTIME[test] 8:00:00
>>> # SRENDTIME[test]   17:00:00
>>> # SRDAYS[test]      MON TUE WED THU FRI
>>> # SRTASKCOUNT[test] 20
>>> # SRMAXTIME[test]   0:30:00
>>>
>>> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>>>
>>> USERCFG[DEFAULT]        MAXIJOB=2000
>>> # USERCFG[DEFAULT]      FSTARGET=25.0
>>> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
>>> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
>>> # CLASSCFG[batch]       FLAGS=PREEMPTEE
>>> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>>>
>>>
>>> Ian Miller
>>> Research Computing Administrator
>>> ianm at uchicago.edu
>>> (312) 402-6170
>>>
>>>
>>> _______________________________________________
>>> mauiusers mailing list
>>> mauiusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/mauiusers
>>>
>>
>>
>>
>>--
>>Denis Anjos,
>>www.versatushpc.com.br
>



-- 
Denis Anjos,
www.versatushpc.com.br

