[Mauiusers] MAXNODE limit

Josh Butikofer josh at clusterresources.com
Wed Mar 28 13:57:48 MDT 2007


Lennart,

After investigating this bug (and its alternate description in Bugzilla), it appears that you need to
use MAXPROC instead of MAXNODE when JOBNODEMATCHPOLICY is set to EXACTNODE. (The Maui documentation
mentions this as well at
http://www.clusterresources.com/products/maui/docs/6.2throttlingpolicies.shtml under MAXNODE.)

The Bugzilla post mentions that you already tried MAXPROC and that specifying -l nodes=90:ppn=1
still allows the job to run. In my tests, however, Moab successfully blocks the job with
MAXPROC=4 set for the user/group 'josh':

PE:  90.00  StartPriority:  11001
cannot select job 81 for partition DEFAULT (job 81 violates active HARD MAXPROC limit of 4 for user
josh  (R: 90, U: 0)
)
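
The check reported in that message can be sketched roughly as follows. This is only an
illustration in Python, not Moab's or Maui's actual code: the function name
violates_hard_limit and its arguments are hypothetical, and it assumes that R is the number
of processors the job requests and U is the number already in use by the user.

```python
def violates_hard_limit(requested: int, in_use: int, limit: int) -> bool:
    """Return True if starting the job would push usage past the limit."""
    return in_use + requested > limit


# Values taken from the log line above: R: 90, U: 0, MAXPROC limit of 4.
# The meaning of R and U (requested / in use) is an assumption.
blocked = violates_hard_limit(requested=90, in_use=0, limit=4)
print("blocked" if blocked else "eligible")
```

Under these assumptions a job is rejected whenever its request plus current usage exceeds
the limit, which is why a 90-processor request fails even with nothing running.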

I also tried setting the policy on a QoS and it too worked as expected. Could you please send me a
scenario that shows how MAXPROC was failing for you? If the job does run, could you
also send me the "checkjob -v <JOB>" output?

Thanks,

-- 
Joshua Butikofer
Cluster Resources, Inc.

josh at clusterresources.com
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


Lennart Karlsson wrote:
> Josh,
> 
> You wrote:
>> I would recommend trying out the patch 19 snapshot and see if you
>> experience any problems. We hope to get the official release out over
>> the next few days, and this release would eradicate all known bugs. 
> 
> 
> My most critical Maui bug is logged in your bugzilla as number 141.
> (There is also bug number 83, which looks similar.)
> 
> Please include it among the "all known bugs" that you are fixing now! I would
> really appreciate that.
> 
> The MAXNODE configuration parameter does not work.
> 
> It should be easy for you to repeat the problem on your systems:
> 
> 1/ Start with a simple Maui configuration like (I skip the
> SERVER*/ADMIN/RMCFG/RMPOLLINTERVAL/LOG* preambles):
> 
> QUEUETIMEWEIGHT         10 
> XFACTORWEIGHT           1
> QOSWEIGHT               1
> 
> FSPOLICY                [NONE]
> 
> BACKFILLPOLICY          BESTFIT
> NODEALLOCATIONPOLICY    LASTAVAILABLE
> RESERVATIONPOLICY       CURRENTHIGHEST
> RESERVATIONDEPTH        10
> 
> JOBPRIOACCRUALPOLICY    FULLPOLICY
> 
> NODEACCESSPOLICY        SINGLEJOB
> JOBNODEMATCHPOLICY      EXACTNODE
> 
> QOSCFG[DEFAULT]  PRIORITY=10000  XFWEIGHT=1000 QTWEIGHT=4
> 
> 2/ Add MAXNODE lines for a user and the group of that user, like:
> 
> USERCFG[lka]    MAXNODE=5
> GROUPCFG[nsc]   MAXNODE=5
> 
> 3/ Submit a lot of jobs as that user and wait until her/his jobs run on
> a total of at least five nodes.
> 
> 4/ Run a 'showq' and look at all the jobs of that user that should be
> 'blocked' but actually are 'idle' (the demonstration is done on a system
> where each node has only one processor, so here MAXNODE could be
> replaced with MAXPROC, but most of our systems have more than one
> processor on each node):
> 
>  # showq
> ACTIVE JOBS--------------------
> JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
> 
> 55818                   lka    Running     5    00:05:24  Thu Feb 15 13:26:04
> 55819                   lka    Running     1    00:06:01  Thu Feb 15 13:26:41
> 55820                   lka    Running     1    00:06:02  Thu Feb 15 13:26:42
> 55821                   lka    Running     1    00:06:33  Thu Feb 15 13:27:13
> 55822                   lka    Running     1    00:06:34  Thu Feb 15 13:27:14
> 55823                   lka    Running     1    00:06:35  Thu Feb 15 13:27:15
> 55824                   lka    Running     1    00:06:35  Thu Feb 15 13:27:15
> 55807               andersb    Running    20 11:08:46:33  Wed Feb 14 11:07:13
> 
>      8 Active Jobs      31 of   31 Processors Active (100.00%)
> 
> IDLE JOBS----------------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
> 
> 55825                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:15
> 55826                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:16
> 55827                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:16
> 55828                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:17
> 55829                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:17
> 55830                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:17
> 
> 6 Idle Jobs
> 
> BLOCKED JOBS----------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
> 
> 5/ Only job number 55818 should be running; the other 'lka' jobs should
> be 'blocked', neither 'running' nor 'idle'.
> 
> 
> The demo was run with Maui version 3.2.6p19-snap.1171482917.
> 
> I would at least like the MAXNODE parameter to work for GROUP, QOS or
> CLASS, but of course it would be nice to have it working also on USER,
> please.
> 
> Best regards,
> -- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
>    National Supercomputer Centre in Linkoping, Sweden
>    http://www.nsc.liu.se
> 
> 
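
The outcome expected in step 5 of the quoted report can be sketched in Python. This is a
simplified model, not Maui's scheduling logic: classify_jobs is a hypothetical helper, jobs
are considered strictly in queue order, and each processor is assumed to occupy one node, as
on the single-processor-per-node demo system.

```python
from typing import Dict, List, Tuple


def classify_jobs(jobs: List[Tuple[str, int]], max_nodes: int) -> Dict[str, str]:
    """Label each job 'running' or 'blocked' under a per-user node limit."""
    in_use = 0
    states: Dict[str, str] = {}
    for job_id, nodes in jobs:
        if in_use + nodes <= max_nodes:
            # The job fits under the limit, so it may start.
            in_use += nodes
            states[job_id] = "running"
        else:
            # Starting the job would exceed the limit.
            states[job_id] = "blocked"
    return states


# The 'lka' jobs from the showq listing, with PROC taken as the node count.
lka_jobs = [("55818", 5)] + [(str(n), 1) for n in range(55819, 55831)]
states = classify_jobs(lka_jobs, max_nodes=5)
running = [job for job, state in states.items() if state == "running"]
print(running)
```

With MAXNODE=5, job 55818 alone already uses the full allowance in this model, so every
later 'lka' job is classified as blocked.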

