[Mauiusers] Backfill and node reservation

Arnau Bria arnaubria at pic.es
Mon Nov 15 09:23:13 MST 2010


On Mon, 15 Nov 2010 13:57:57 -0200
Denis Denis wrote:

Hi,

> > > Could you send your maui.cfg?
> > Sure (I've added a couple of notes between lines).
> >
> >
> > SERVERHOST              NAME
> > ADMIN1                  root
> > ADMIN3                  edginfo rgma edguser monami
> > ADMINHOST               NAME
> > RMCFG[base]             TYPE=PBS TIMEOUT=30
> > SERVERPORT              40559
> > SERVERMODE              NORMAL
> >
> > RMPOLLINTERVAL        00:02:00
> > LOGFILE               /var/log/maui.log
> > LOGFILEMAXSIZE        50000000
> >
> > IDLEJOBDEPTH  300
> > #This comes from a patch
> > #http://www.supercluster.org/pipermail/mauiusers/2009-February/003746.html
> >
> >
> >
> > BACKFILLPOLICY          NONE
> > BACKFILLDEPTH           1
> > LOGLEVEL                1
> >
> > LOGFILEROLLDEPTH        50
> >
> > ENABLENEGJOBPRIORITY true
> > REJECTNEGPRIOJOBS false
> >
> > QUEUETIMEWEIGHT         0
> >
> > XFACTORWEIGHT           0
> >
> >
> > CREDWEIGHT              1
> > GROUPWEIGHT             1
> > USERWEIGHT              1
> > CLASSWEIGHT             1
> >
> > NODEALLOCATIONPOLICY    CPULOAD
> >
> > DEFERTIME               00:00:00
> >
> > CLASSCFG[long]          MAXPROC=100
> > CLASSCFG[medium]        MAXPROC=100
> > GROUPCFG[dteam]         MAXPROC=40 PRIORITY=10
> > GROUPCFG[dtsgm]         MAXPROC=2 PRIORITY=100000
> > GROUPCFG[dtprd]         MAXPROC=20 PRIORITY=100000
> > GROUPCFG[ops]           MAXPROC=20 PRIORITY=100000
> > GROUPCFG[pilotops]      MAXPROC=20 PRIORITY=100000
> > USERCFG[arnaubria]      PRIORITY=1000
> >
> > SRCFG[picsgm_64]        GROUPLIST=atsgm,sgmcm,lhsgm,masgm,ctasgm,dtsgm,misgm,pasgm,picvosgm,sgmibergrid
> > SRCFG[picsgm_64]        RESOURCES=PROCS:4
> > SRCFG[picsgm_64]        PRIORITY=1000
> > SRCFG[picsgm_64]        HOSTLIST=tditaller021
> > SRCFG[picsgm_64]        STARTTIME=0:00:00 ENDTIME=24:00:00
> > SRCFG[picsgm_64]        PERIOD=INFINITY
> >
> > FSWEIGHT                1
> > FSUSERWEIGHT            2
> > FSGROUPWEIGHT           10
> > FSQOSWEIGHT             100
> >
> > FSDEPTH                 4
> > FSINTERVAL              12:00:00
> > FSDECAY                 0.5
> > FSPOLICY                DEDICATEDPS%
> >
> >
> >
> > GROUPCFG[masgm]         FSTARGET=10  QDEF=magic MAXPROC=2
> > GROUPCFG[maprd]         FSTARGET=10  QDEF=magic
> > GROUPCFG[magic]         FSTARGET=10  QDEF=magic
> > QOSCFG[magic]           FSTARGET=5.79
> > [....]
> >
> > OTHER QOS CONF
> > [...]
> >
> >
> What does diagnose -p report?
I don't have jobs running and my testing nodes are down (except my
torque-test server).

But I can tell you that my jobs were on top (I'm the arnaubria user, so my
prio is 100000...)

> Is it possible that the jobs which are running before your highest
> priority job are not being backfilled but having a higher priority
> instead due to the weights of the other metrics?
> I see that the CREDWEIGHT is set to 1 while QOS for example is set to
> 100.
No, that's impossible.
Other users' prios are based on FS. Their prios go from negative values to
a prio of 200. I've never seen a prio higher than that.

> Also there are some groups with priority really high ( 100000)
Those are very special groups (not the ones causing problems) and
myself.
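
To illustrate the weight arithmetic being discussed: as far as I understand
the Maui documentation, a job's priority is a sum of components, each one a
component weight multiplied by the weighted sum of its subcomponents. Below is
a minimal Python sketch of the credential (CRED) component only, assuming that
scheme and using the values from the maui.cfg excerpt quoted above. The
function and dictionary names are made up for illustration, and the fairshare
and QOS components are left out.

```python
# Sketch of the credential (CRED) component of a Maui job priority.
# Assumption: component = CREDWEIGHT * sum(subcomponent weight * value).
# Weights and PRIORITY values are copied from the quoted maui.cfg excerpt;
# all Python names here are hypothetical and not part of Maui.

CREDWEIGHT = 1
USERWEIGHT = 1
GROUPWEIGHT = 1
CLASSWEIGHT = 1

USER_PRIORITY = {"arnaubria": 1000}
GROUP_PRIORITY = {
    "dteam": 10,
    "dtsgm": 100000,
    "dtprd": 100000,
    "ops": 100000,
    "pilotops": 100000,
}
CLASS_PRIORITY = {}  # the excerpt sets no PRIORITY on any CLASSCFG entry


def cred_priority(user, group, job_class=None):
    """Return the CRED contribution for a job; unknown credentials count as 0."""
    return CREDWEIGHT * (
        USERWEIGHT * USER_PRIORITY.get(user, 0)
        + GROUPWEIGHT * GROUP_PRIORITY.get(group, 0)
        + CLASSWEIGHT * CLASS_PRIORITY.get(job_class, 0)
    )


if __name__ == "__main__":
    # "someuser" is a placeholder for any user without a USERCFG PRIORITY.
    for user, group in [("arnaubria", "dteam"), ("someuser", "dtsgm"),
                        ("someuser", "magic")]:
        print(user, group, cred_priority(user, group))
```

With all these weights at 1, a group entry with PRIORITY=100000 contributes
100000 on its own, far above the FS-driven prios of at most 200 mentioned
above.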


Let me try to quickly reproduce a case in my prod cluster.

I'll come back in a few.

Cheers and thanks for your replies,
Arnau

