[Mauiusers] MAUI miscounts total running user specific jobs ... user MAXJOB limit is affected

Åke Sandgren ake.sandgren at hpc2n.umu.se
Wed Nov 30 00:08:38 MST 2005


On Tue, 2005-11-29 at 15:15 -0600, Richard Walsh wrote:
> All,
> 
> I am running a snapshot of Maui from October and have user MAXJOB
> limits specified as:
>
>     USERCFG[rbw]      PRIORITY=100 MAXPROC=128 MAXJOB=40 QDEF=default
>
> which should allow 40 jobs to run as user 'rbw', but Maui allows only
> half that number because it FALSELY counts the active/running jobs at
> 40 when only 20 are running! If I double MAXJOB, twice as many jobs
> run, but still only half the number specified. Somehow running jobs
> are being double counted. With only 20 running, checkjob on the 21st
> job (19475) yields:
>
>     job 19475 violates active SOFT MAXJOB limit of 40 for user rbw (R: 1, U: 40)
> 
> The line above is produced in MJob.c after a call to MPolicyCheckLimit():
>
>     sprintf(Message,"job %s violates %s %s %s limit of %d for %s %s %s (R: %d, U: %d)\n",
>       J->Name,
>       MType[mindex],
>       MPolicyMode[PLevel],
>       MPolicyType[PList[pindex]],
>       tmpI,
>       MXO[OList2[oindex2]],
>       (OID != NULL) ? OID : NONE,
>       (oindex1 != mxoNONE) ? MXO[OList1[oindex1]] : "",
>       JUsage[PList[pindex]],
>       OP->Usage[PList[pindex]][0]);   /* *** U: the wrong value */
> 
> OP->Usage[PList[pindex]][0] contains the wrong value, but finding
> where this running-job count is accumulated is not easy with all the
> pointer aliasing. I think OP is set above, in a 'case' statement tree,
> to:
>
>     OP = &J->Cred.U->L.AP;
>
> but I am not sure. In what routine are the running-job totals for a
> given user accumulated? What is the name of the structure.element
> where this number is held (???->Usage[0][0])?
> 
> We have a workaround, but I would like to fix the bug and recompile ...
> 
> Thanks for your thoughts ...


This patch fixes the problem, but it is so far unofficial since I
haven't gotten any comments from above.

Jobs aren't the only thing it manages to count twice, btw.

diff -bBwru site/src/moab/MPolicy.c x86_deb30/src/moab/MPolicy.c
--- site/src/moab/MPolicy.c     2005-09-12 18:48:36.000000000 +0200
+++ x86_deb30/src/moab/MPolicy.c        2005-09-16 20:33:08.000000000 +0200
@@ -709,7 +709,7 @@

   if (UpdateStats == TRUE)
     {
-    MStatClearUsage(0,(1 << mlIdle),FALSE);
+    MStatClearUsage(0,(1 << mlActive)|(1 << mlIdle),FALSE);
     }
   else if (TrackIdle == TRUE)
     {


Dave, can you get someone to take a look at this? I complained about it
some weeks ago together with someone from NSC.

