[Mauiusers] MAUI miscounts total running user specific jobs
... user MAXJOB limit is affected
Åke Sandgren
ake.sandgren at hpc2n.umu.se
Wed Nov 30 00:08:38 MST 2005
On Tue, 2005-11-29 at 15:15 -0600, Richard Walsh wrote:
> All,
>
> I am running a snapshot of Maui from October and have user MAXJOB
> limits specified as:
>
> >>USERCFG[rbw] PRIORITY=100 MAXPROC=128 MAXJOB=40 QDEF=default
>
> which should allow 40 jobs to run as user 'rbw', but allows only half
> that number because Maui FALSELY counts active/running jobs at 40 when
> there are only 20 running! If I double MAXJOB, twice as many will run,
> but still only half the number specified. Somehow running jobs are
> being double counted. With only 20 running, checkjob on the 21st job
> (19475) yields:
>
> >>job 19475 violates active SOFT MAXJOB limit of 40 for user rbw (R: 1, U: 40)
>
> The line above is reported from MJob.c after an MPolicyCheckLimit() call:
>
> sprintf(Message,"job %s violates %s %s %s limit of %d for %s %s %s (R: %d, U: %d)\n",
> J->Name,
> MType[mindex],
> MPolicyMode[PLevel],
> MPolicyType[PList[pindex]],
> tmpI,
> MXO[OList2[oindex2]],
> (OID != NULL) ? OID : NONE,
> (oindex1 != mxoNONE) ? MXO[OList1[oindex1]] : "",
> JUsage[PList[pindex]],
> *** OP->Usage[PList[pindex]][0]);
>
> OP->Usage[PList[pindex]][0] contains the wrong value, but finding where
> this running job count is accumulated is not easy with all the pointer
> aliasing. I think OP is set above in a 'case' statement tree to:
>
> OP = &J->Cred.U->L.AP;
>
> but am not sure. In what routine are running job totals for a given
> user accumulated? What is the name of the structure.element where this
> number is held (???->Usage[0][0])?
>
> We have a workaround, but I would like to fix and recompile ...
>
> Thanks for your thoughts ...
This patch fixes the problem, but it is so far unofficial since I haven't
gotten any comments from above.
Jobs aren't the only thing it manages to count twice, btw.
diff -bBwru site/src/moab/MPolicy.c x86_deb30/src/moab/MPolicy.c
--- site/src/moab/MPolicy.c 2005-09-12 18:48:36.000000000 +0200
+++ x86_deb30/src/moab/MPolicy.c 2005-09-16 20:33:08.000000000 +0200
@@ -709,7 +709,7 @@
if (UpdateStats == TRUE)
{
- MStatClearUsage(0,(1 << mlIdle),FALSE);
+ MStatClearUsage(0,(1 << mlActive)|(1 << mlIdle),FALSE);
}
else if (TrackIdle == TRUE)
{
Dave, can you get someone to take a look at this? I complained about it
some weeks ago together with someone from NSC.