[Mauiusers] MAXNODE limit

Josh Butikofer josh at clusterresources.com
Mon Apr 2 14:35:06 MDT 2007


Lennart,

> I am a little confused that you say that "Moab" successfully does the blocking,
> but I presume that you actually have used Maui.

Yes, I did use Maui to do my tests--it was a slip of the fingers when I said "Moab."

After looking more closely at your Bugzilla report, I see what you mean. I'll hopefully be able to
take a closer look at this over the course of the week.
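
For anyone following the thread, here is a minimal Python sketch of the arithmetic as I currently understand it from the report. It is not Maui code: the function names are made up for illustration, and the idea that a SINGLEJOB node ties up all of its processors for the one job is my assumption, not something I have confirmed in the scheduler source.

```python
def requested_procs(nodes: int, ppn: int) -> int:
    """Processors the job asks for, e.g. -l nodes=90:ppn=1 gives 90."""
    return nodes * ppn


def dedicated_procs(nodes: int, procs_per_node: int) -> int:
    """Processors tied up if every allocated node is dedicated to one job
    (assumed effect of NODEACCESSPOLICY SINGLEJOB plus EXACTNODE)."""
    return nodes * procs_per_node


def passes_limit(count: int, maxproc: int) -> bool:
    """Hypothetical limit check: the job passes while count <= limit."""
    return count <= maxproc


# Lennart's scenario: two-processor nodes, MAXPROC=100.
req = requested_procs(nodes=90, ppn=1)
used = dedicated_procs(nodes=90, procs_per_node=2)
print(f"requested={req} dedicated={used}")
print("check on requested procs passes:", passes_limit(req, 100))
print("check on dedicated procs passes:", passes_limit(used, 100))
```

With MAXPROC=4, as in my own test, the requested count of 90 already exceeds the limit, which would explain why I saw the job blocked while the MAXPROC=100 case on two-processor nodes slips through.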

-- 
Joshua Butikofer
Cluster Resources, Inc.

josh at clusterresources.com
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


Lennart Karlsson wrote:
> Josh,
> 
> Yes, your MAXPROC=4 configuration successfully blocks your "-l nodes=90:ppn=1"
> job. I agree on that.
> 
> What I say in the Bugzilla post is that for two-processor nodes, a MAXPROC=100
> does not block a "-l nodes=90:ppn=1" job, although it will allocate 90 nodes,
> i.e. 180 processors.
> 
> Because of that, MAXPROC is not the correct tool and I need MAXNODE
> to work.
> 
> Does your MAXPROC=4 configuration successfully block an "-l nodes=3:ppn=1"
> job, when JOBNODEMATCHPOLICY is set to EXACTNODE and NODEACCESSPOLICY is
> set to SINGLEJOB? For me, on two-processor nodes, it does not and I see no way
> to use MAXPROC to emulate a non-working MAXNODE.
> 
> In less technical terms, it seems like Maui does not understand how many
> processors a job will allocate, until the job is running.
> 
> So please, I would like MAXNODE to work in Maui.
> 
> Best regards,
> -- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
>    National Supercomputer Centre in Linkoping, Sweden
>    http://www.nsc.liu.se
> 
> 
> Joshua Butikofer wrote:
>> After investigating this bug (and its alternate description in Bugzilla) it appears that you need to
>> use MAXPROC instead of MAXNODE when JOBNODEMATCHPOLICY is set to EXACTNODE. (The Maui documentation
>> mentions this as well @
>> http://www.clusterresources.com/products/maui/docs/6.2throttlingpolicies.shtml under MAXNODE.)
>>
>> The Bugzilla post mentions that you already tried MAXPROC and that specifying -l nodes=90:ppn=1
>> still allows the job to run. In my tests, however, Moab successfully blocks the job with my
>> MAXPROC=4 for the user/group 'josh':
>>
>> PE:  90.00  StartPriority:  11001
>> cannot select job 81 for partition DEFAULT (job 81 violates active HARD MAXPROC limit of 4 for user
>> josh  (R: 90, U: 0)
>> )
>>
>> I also tried setting the policy on a QoS and it too worked as expected. Could you please send me a
>> scenario to show me how the MAXPROC was failing for you? If the job succeeds in running, could you
>> also send me a "checkjob -v <JOB>" output?
>>
>> Thanks,
>>
>> -- 
>> Joshua Butikofer
>> Cluster Resources, Inc.
>>
>> josh at clusterresources.com
>> Voice: (801) 717-3707
>> Fax:   (801) 717-3738
>> --------------------------
>>
>>
>> Lennart Karlsson wrote:
>>> Josh,
>>>
>>> You wrote:
>>>> I would recommend trying out the patch 19 snapshot and see if you
>>>> experience any problems. We hope to get the official release out over
>>>> the next few days, and this release would eradicate all known bugs. 
>>>
>>> My most critical Maui bug is logged in your bugzilla as number 141.
>>> (There is also bug number 83, which looks similar.)
>>>
>>> Please include it within "all known bugs", that you are fixing now! I would
>>> really appreciate that.
>>>
>>> The MAXNODE configuration parameter does not work.
>>>
>>> It should be easy for you to repeat the problem on your systems:
>>>
>>> 1/ Start with a simple Maui configuration like (I skip the
>>> SERVER*/ADMIN/RMCFG/RMPOLLINTERVAL/LOG* preambles):
>>>
>>> QUEUETIMEWEIGHT         10 
>>> XFACTORWEIGHT           1
>>> QOSWEIGHT               1
>>>
>>> FSPOLICY                [NONE]
>>>
>>> BACKFILLPOLICY          BESTFIT
>>> NODEALLOCATIONPOLICY    LASTAVAILABLE
>>> RESERVATIONPOLICY       CURRENTHIGHEST
>>> RESERVATIONDEPTH        10
>>>
>>> JOBPRIOACCRUALPOLICY    FULLPOLICY
>>>
>>> NODEACCESSPOLICY        SINGLEJOB
>>> JOBNODEMATCHPOLICY      EXACTNODE
>>>
>>> QOSCFG[DEFAULT]  PRIORITY=10000  XFWEIGHT=1000 QTWEIGHT=4
>>>
>>> 2/ Add MAXNODE lines for a user and the group of that user, like:
>>>
>>> USERCFG[lka]    MAXNODE=5
>>> GROUPCFG[nsc]   MAXNODE=5
>>>
>>> 3/ Submit a lot of jobs as that user and wait until her/his jobs run on
>>> a total of at least five nodes.
>>>
>>> 4/ Run a 'showq' and look at all the jobs of that user that should be
>>> 'blocked' but are actually 'idle' (the demonstration was done on a system
>>> where each node has only one processor, so here MAXNODE could be
>>> replaced by MAXPROC, but most of our systems have more than one
>>> processor on each node):
>>>
>>>  # showq
>>> ACTIVE JOBS--------------------
>>> JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
>>>
>>> 55818                   lka    Running     5    00:05:24  Thu Feb 15 13:26:04
>>> 55819                   lka    Running     1    00:06:01  Thu Feb 15 13:26:41
>>> 55820                   lka    Running     1    00:06:02  Thu Feb 15 13:26:42
>>> 55821                   lka    Running     1    00:06:33  Thu Feb 15 13:27:13
>>> 55822                   lka    Running     1    00:06:34  Thu Feb 15 13:27:14
>>> 55823                   lka    Running     1    00:06:35  Thu Feb 15 13:27:15
>>> 55824                   lka    Running     1    00:06:35  Thu Feb 15 13:27:15
>>> 55807               andersb    Running    20 11:08:46:33  Wed Feb 14 11:07:13
>>>
>>>      8 Active Jobs      31 of   31 Processors Active (100.00%)
>>>
>>> IDLE JOBS----------------------
>>> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>>>
>>> 55825                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:15
>>> 55826                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:16
>>> 55827                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:16
>>> 55828                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:17
>>> 55829                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:17
>>> 55830                   lka       Idle     1     1:00:00  Thu Feb 15 13:27:17
>>>
>>> 6 Idle Jobs
>>>
>>> BLOCKED JOBS----------------
>>> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>>>
>>> 5/ Only job number 55818 should be running; the other 'lka' jobs should
>>> be 'blocked', neither 'running' nor 'idle'.
>>>
>>>
>>> The demo was run with Maui version 3.2.6p19-snap.1171482917.
>>>
>>> I would at least like the MAXNODE parameter to work for GROUP, QOS or
>>> CLASS, but of course it would be nice to have it working also on USER,
>>> please.
>>>
>>> Best regards,
>>> -- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
>>>    National Supercomputer Centre in Linkoping, Sweden
>>>    http://www.nsc.liu.se
>>>
>>>
> 
> 
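
A postscript on the showq listing quoted in the original report: the over-limit condition can be checked mechanically. The sketch below is only an illustration. The active-job rows are copied from the report, the column positions are assumed from that listing, the helper name is hypothetical, and processors are treated as equal to nodes because the demonstration system has one processor per node.

```python
from collections import defaultdict

# Active-job rows copied from the showq output in the report.
ACTIVE_ROWS = """\
55818                   lka    Running     5    00:05:24  Thu Feb 15 13:26:04
55819                   lka    Running     1    00:06:01  Thu Feb 15 13:26:41
55820                   lka    Running     1    00:06:02  Thu Feb 15 13:26:42
55821                   lka    Running     1    00:06:33  Thu Feb 15 13:27:13
55822                   lka    Running     1    00:06:34  Thu Feb 15 13:27:14
55823                   lka    Running     1    00:06:35  Thu Feb 15 13:27:15
55824                   lka    Running     1    00:06:35  Thu Feb 15 13:27:15
55807               andersb    Running    20 11:08:46:33  Wed Feb 14 11:07:13
"""

# Per-user limit from the report (USERCFG[lka] MAXNODE=5).
LIMITS = {"lka": 5}


def active_procs_by_user(rows: str) -> dict:
    """Sum the PROC column (4th field) per USERNAME (2nd field)."""
    totals = defaultdict(int)
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        totals[fields[1]] += int(fields[3])
    return dict(totals)


totals = active_procs_by_user(ACTIVE_ROWS)
# Users whose active usage exceeds their configured limit.
over_limit = {u: n for u, n in totals.items() if n > LIMITS.get(u, n)}
print(totals)
print(over_limit)
```

On these rows user lka holds 11 processors (and therefore 11 single-processor nodes) against a limit of 5, which is the behaviour the report describes.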


