[Mauiusers] performance issues with maui & torque

Denis denismpa at gmail.com
Wed Oct 17 12:26:27 MDT 2012


2012/10/17 Ian Miller <ianm at uchicago.edu>:
> Hi,
> I have Maui version 3.3.1 and Torque version 2.5.7,
> and a few nodes are sitting idle that should be running jobs.
> They have been able to run jobs in the past, but the cluster has never run at
> 80-90% utilization.
> The output of showq is as follows (I omitted the job lists):
>
> 119 Active Jobs     130 of  344 Processors Active (37.79%)
>
>                         15 of   35 Nodes Active      (42.86%)
>
> Total Jobs: 467   Active Jobs: 119   Idle Jobs: 0   Blocked Jobs: 348
>
> When I try to force-run a job, I get:
>
> root@beast$ qrun 209054
>
> qrun: Execution server rejected request MSG=cannot send job to mom,
> state=PRERUN 209054.beast-net
>
> 30 out of the 34 worker nodes are in one queue (batch), with 2 of the 30
> shared with another queue.  Currently 33 of the total jobs (467) are in
> a different queue (short) and are running fine; the rest are in the
> default (batch).  My question is: how can I get the idle nodes to run these
> jobs?
>
> What might be the problem?
>
Try restarting the pbs_mom service on the idle nodes.
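
As a rough sketch of how that could be done from the server: `pbsnodes -l` lists the nodes that pbs_server has marked down, offline or unknown, and the helpers below turn such a list into restart commands. The function names, root ssh access to the nodes, and the `service pbs_mom restart` init command are all assumptions; adjust them to your installation. If the idle nodes are reported as free, feed their names to restart_commands by hand instead.

```shell
# Extract node names from `pbsnodes -l` style input,
# i.e. one "<node> <state>" pair per line on stdin.
problem_nodes() {
    awk 'NF >= 2 { print $1 }'
}

# Print (do not run) one restart command per node name read from stdin.
# "service pbs_mom restart" is an assumption; use your site's init script.
restart_commands() {
    while read -r node; do
        printf 'ssh -n %s service pbs_mom restart\n' "$node"
    done
}

# On the server, review the output before piping it to sh:
#   pbsnodes -l | problem_nodes | restart_commands
```

Printing the commands rather than running them lets you check the node list first.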
>
>
> Qmgr: print queue batch
>
> # Create queues and set their attributes.
>
> #
>
> #
>
> # Create and define queue batch
>
> #
>
> create queue batch
>
> set queue batch queue_type = Execution
>
> set queue batch max_running = 200
>
> set queue batch resources_default.neednodes = batch
>
> set queue batch resources_default.nodes = 1
>
> set queue batch max_user_run = 150
>
> set queue batch keep_completed = 300
>
> set queue batch enabled = True
>
> set queue batch started = True
>
>
> # maui.cfg 3.3.1
>
> SERVERHOST            beast
>
> # primary admin must be first in list
>
> ADMIN1                root
>
> # Resource Manager Definition
>
> RMCFG[BEAST] TYPE=PBS
>
> # Allocation Manager Definition
>
> AMCFG[bank]  TYPE=NONE
>
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
>
> # use the 'schedctl -l' command to display current configuration
>
> RMPOLLINTERVAL        00:00:30
>
> SERVERPORT            42559
>
> SERVERMODE            NORMAL
>
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>
> LOGFILE               maui.log
>
> LOGFILEMAXSIZE        10000000
>
> LOGLEVEL              3
>
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>
> QUEUETIMEWEIGHT       1
>
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>
> #FSPOLICY              PSDEDICATED
>
> #FSDEPTH               7
>
> #FSINTERVAL            86400
>
> #FSDECAY               0.80
>
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>
> # NONE SPECIFIED
>
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>
> BACKFILLPOLICY        FIRSTFIT
> RESERVATIONPOLICY     CURRENTHIGHEST
>
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>
> NODEALLOCATIONPOLICY PRIORITY
> NODECFG[DEFAULT] PRIORITYF='0.01*AMEM - 2*LOAD'
> NODEAVAILABILITYPOLICY COMBINED:MEM
>
> SRCFG[Reinitz] HOSTLIST=minion1[2-9]
> SRCFG[Reinitz] GROUPLIST=Reinitz
>
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>
> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test]   17:00:00
> # SRDAYS[test]      MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test]   0:30:00
>
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>
> USERCFG[DEFAULT]        MAXIJOB=2000
> # USERCFG[DEFAULT]      FSTARGET=25.0
> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch]       FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>
> Ian Miller
> Research Computing Administrator
> ianm at uchicago.edu
> (312) 402-6170
>
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
>



-- 
Denis Anjos,
www.versatushpc.com.br

