[Mauiusers] Problems with maui scalability
Ronny T. Lampert
telecaadmin at gmail.com
Thu Aug 16 02:23:00 MDT 2007
Hi,
Yesterday maui behaved in the worst way possible.
I had around 32K jobs queued (for 7 np=2 nodes).
I know that maui only considers the first 3-4K jobs, which would be
totally fine for me, since my setup amounts to more or less FIFO scheduling.
maui went to 100% CPU, didn't respect the RMPOLLINTERVAL (45s for me)
and completely choked on the load, filling up the logs with:
08/16 09:36:03 WARNING: job buffer overflow (cannot add job '498677')
08/16 09:36:03 ERROR: job buffer is full (ignoring job '498677.SERVER.DOMAIN')
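To get a rough idea of how many distinct jobs are being turned away, the messages above can be counted with a small script. This is only a sketch: the regular expression is derived from the two sample lines quoted here, and the function name rejected_job_ids is made up for illustration, not part of maui.

```python
import re

# Pattern built from the two sample messages only (an assumption, not a
# documented maui log format). \s* also tolerates a line wrap before the ID.
OVERFLOW_RE = re.compile(
    r"(?:job buffer overflow \(cannot add job"
    r"|job buffer is full \(ignoring job)\s*'([^']+)'"
)


def rejected_job_ids(log_text):
    """Return the set of job numbers named in job-buffer messages."""
    ids = set()
    for match in OVERFLOW_RE.finditer(log_text):
        # '498677.SERVER.DOMAIN' and '498677' refer to the same job,
        # so keep only the part before the first dot.
        ids.add(match.group(1).split(".", 1)[0])
    return ids
```

Feeding it the contents of the maui log and taking len() of the result gives the number of distinct jobs that hit the buffer limit.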
It barely kept 5-10 of the 14 CPUs running.
So my questions are:
1) How can I tell maui that it's OK to only consider the first 4K jobs?
2) How can I keep maui playing along nicely?
3) How can I keep my 14 CPUs busy?
I know of this thread:
http://www.supercluster.org/pipermail/mauiusers/2004-August/001303.html
but ramping up maui's footprint to 1G is not feasible.
Here's the maui.cfg, nothing too difficult I think:
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY NEVER
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PRIORITYF='-JOBCOUNT'
QUEUETIMEWEIGHT 1
CREDWEIGHT 1
USERWEIGHT 0
GROUPWEIGHT 0
QOSWEIGHT 3
CLASSWEIGHT 1
USAGEWEIGHT 1
USAGEEXECUTIONTIMEWEIGHT 1
QOSCFG[high] PRIORITY=1000 QFLAGS=PREEMPTOR
QOSCFG[low] PRIORITY=-1000 QFLAGS=PREEMPTEE
CLASSCFG[default] QDEF=low
CLASSCFG[short] QDEF=high MAXNODE=4,14 MAXJOB=4,14
Cheers,
Ronny