[Mauiusers] problem with maxnode & partitions
jacksond at clusterresources.com
Sat Apr 9 19:14:25 MDT 2005
I cannot be certain this is the root cause of what you are seeing,
but the MAXNODE policy is difficult to enforce internally, particularly
on shared-node, multi-processor systems. The difficulty lies in the fact
that the scheduler may not know a job's task distribution until it
attempts to start the job. This means a job may ask for 16 processors and
be mapped to anywhere from 8 to 16 nodes, depending on node availability.
Consequently, jobs which explicitly specify a processor count rather
than a node count may have a node count of 0 until they are started and
may thus escape enforcement of the MAXNODE policy.
Is there a chance this is what is happening? If so, I believe we can
make the MAXNODE policy smarter by incorporating information about the
cluster configuration or node access policy to calculate a probable node
mapping before the job is started, and so better enforce MAXNODE. If this
is the cause of the problem, let us know your node processor layout and
your node access policy and we can get started on making this smarter.
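The mapping ambiguity described above can be sketched numerically. The following is a hypothetical illustration (not Maui code): given a pure processor request and a uniform processors-per-node count, the job's eventual node footprint can fall anywhere between a fully packed and a fully scattered layout.

```python
import math

def node_count_range(procs_requested, procs_per_node):
    """Return (min_nodes, max_nodes) a processor-only request could map to.

    Illustrative sketch only: assumes identical nodes with
    `procs_per_node` processors each and shared node access.
    """
    # Best case: every allocated node contributes all of its processors.
    min_nodes = math.ceil(procs_requested / procs_per_node)
    # Worst case: each allocated node contributes only one free processor,
    # so the job scatters across as many nodes as processors requested.
    max_nodes = procs_requested
    return min_nodes, max_nodes

# A 16-processor request on dual-processor nodes: 8 to 16 nodes.
print(node_count_range(16, 2))  # (8, 16)
```

Until the job actually starts, a MAXNODE limit has no single node count to check against, which is why a probable mapping (e.g. the packed lower bound under a SHARED access policy) would help.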
On Fri, 2005-04-08 at 11:51 -0400, Andrew J Caird wrote:
> I have a similar question; maybe some general clarification on MAXNODE
> and MAXPROC would help both Darrian and me.
> I have a configuration that looks like:
> USERCFG[DEFAULT] PLIST=GENERAL FSTARGET=10 MAXPROC=32
> CLASSCFG[long] MAXNODE=96 PLIST=GENERAL PRIORITY=40
> CLASSCFG[short] MAXNODE=64 PLIST=GENERAL PRIORITY=30
> CLASSCFG[private1] PLIST=PRIVATE1 PRIORITY=60
> CLASSCFG[private2] PLIST=PRIVATE2 PRIORITY=60
> NODECFG[n001] PARTITION=GENERAL
> NODECFG[n100] PARTITION=GENERAL
> NODECFG[n101] PARTITION=PRIVATE1
> NODECFG[n150] PARTITION=PRIVATE1
> NODECFG[n151] PARTITION=PRIVATE2
> NODECFG[n200] PARTITION=PRIVATE2
> In English, I'm trying to say: for the 100 nodes in the GENERAL
> partition, no more than 96 can be used for long jobs, and no more than
> 64 for short jobs; there are no such restrictions on PRIVATE1 or
> PRIVATE2; overall no one can run on more than 32 processors. This seems
> to be working for me.
> What I'd like to say, however, is that no one user can run on more than
> 32 processors in the GENERAL partition, regardless of what they are
> doing in a PRIVATE partition, and that there are no such limits in the
> PRIVATE partitions. In fact, does specifying PLIST=GENERAL in the
> USERCFG line mean I'm getting this today? I haven't had a job mix that
> would test this. Perhaps this is what Darrian needs?
> I tried eliminating the MAXPROC=32 from the USERCFG line and adding this:
> CLASSCFG[long] MAXNODE=96 PLIST=GENERAL PRIORITY=40 MAXPROC[USER]=32
> CLASSCFG[short] MAXNODE=64 PLIST=GENERAL PRIORITY=30 MAXPROC[USER]=32
> but it didn't seem to work. The docs at
> seem to imply that it should.
> Thanks a lot for any insight.
> Darrian Hale wrote:
> > Hello, I'm having a problem where the maxnode directive seems to be ignored if
> > I have multiple partitions set up.
> > For example, if I have the following in my config file, users can only use a
> > maximum of 8 nodes.
> > USERCFG[DEFAULT] MAXNODE=8
> > SYSCFG[base] PLIST=DEFAULT
> > However, if I add something like this to the config file, users can use
> > however many nodes they want.
> > NODECFG[cn1] PARTITION=test
> > ...
> > NODECFG[cnX] PARTITION=test
> > CLASSCFG[test] PLIST=DEFAULT,test
> > I've also tried adding USERCFG[test] MAXNODE=8 to the config file, but the
> > results are the same.
> > Thanks for your help,
> > Darrian
> > _______________________________________________
> > mauiusers mailing list
> > mauiusers at supercluster.org
> > http://supercluster.org/mailman/listinfo/mauiusers