[Mauiusers] Clarification on MAXPROC

Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Wed Sep 13 00:26:57 MDT 2006


Jerry,

(R: 64, U: 128) means that you have requested 64 more processors
and are already using 128.

Maui does not allow you to use 64+128 = 192 processors at the same
time, because there is a limit of 132 processors, so Maui puts your
job in the Hold queue. As you write, if the second job had been
within the MAXPROC limit, it would have been put into the Idle
queue or started already.
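
As a rough illustration only (plain Python, not Maui source code; the
function name violates_maxproc is made up for this example), the
arithmetic behind the checkjob message amounts to something like this:

```python
def violates_maxproc(requested: int, used: int, limit: int) -> bool:
    """Return True if starting the job would push usage past the limit."""
    return requested + used > limit


MAXPROC = 132  # value from USERCFG[DEFAULT] MAXPROC=132

# Values Maui reports in the checkjob message: R: 64, U: 128
print(violates_maxproc(64, 128, MAXPROC))  # True, since 192 > 132

# Values if only the one running 64-processor job were counted once
print(violates_maxproc(64, 64, MAXPROC))   # False, since 128 <= 132
```

With the usage Maui reports, the second job exceeds the limit; with
the usage you expect, it would not.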

You believe that you are running on only 64 processors right now,
but Maui counts 128.

You may have been hit by bug number 98, which I reported to the
Maui bugzilla last October: "More than one Maui 3.2.6p14
snapshots, e.g. maui-3.2.6p14-snap.1127934075, does double count
resources used by running jobs".

CRI has fixed it in later Maui versions, and there is also a
3.2.6p14 patch, written by Ake.Sandgren at hpc2n.umu.se, that seems
to solve the problem. I use the Ake Sandgren patch on one cluster
and earlier (p11) or later (p16-snap) Maui patch versions on other
clusters, with good results.

Best regards,
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
   National Supercomputer Centre in Linkoping, Sweden
   http://www.nsc.liu.se


Jerry Smith wrote:
> Good Afternoon all,
> 
> 
> Using maui 3.2.6p14 with Torque 2.0.0p8
> 
> Dual Processor boxes configured with np=2  in $PBSHOME/nodes
> 
> We are setting a limit on users to a maximum of 132 processors total
> allocated at any given time using:
> 
> USERCFG[DEFAULT] MAXPROC=132  #66 nodes
> 
> What we are seeing is a user submits a 32 node / 64 proc job and that job is
> allocated and running.
> 
> The user then submits a second job with the exact same requirements.  The
> second job is then placed in the Deferred  state, with the following error
> message from: checkjob -v <jobid>
> 
> cannot select job 112642 for partition DEFAULT (job 112642 violates active
> HARD MAXPROC limit of 132 for user userA  (R: 64, U: 128)
> 
> 
> My question is first what is the (R: 64, U: 128) referencing?
> 
> R:64 ( 64 running PROCS for this user? )
> U:128 ( NO IDEA ) it looks like exactly double allocated procs
> 
> 
> My thought was that 2 jobs at 64 procs = 128 total processors which is
> Less than the 132 procs set by MAXPROC.
> 
> Why is the job deferred?   Even with not enough available nodes
>   189 of  198 Nodes Active      (95.45%)
> 
> Shouldn't this job be in the IDLE section of showq and not blocked/deferred?
> 
> 
> Thanks in advance for any help.
> 
> Jerry Smith 
> -------------------------------------
> Infrastructure Computing
> Sandia National labs
> 
> 
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
> 



