[Mauiusers] Re: [torqueusers] Maui/torque problems with NodeSet
Bas van der Vlies
basv at sara.nl
Mon Jan 3 03:19:24 MST 2005
Dave Jackson wrote:
> Bas,
>
> The problem you were seeing stemmed from the fact that Maui's NodeSet
> feature only enforced task based job constraints. It was correctly
> finding enough tasks, but not enough nodes to run the job. We have made
> some extensions to the NodeSet algorithm which should allow it to
> enforce both node and task based requirements. This is available in the
> latest snapshot release. We will continue to test this release
> internally but please let us know if this resolves your issue.
>
Dave, I will test the snapshot and let you know if it solves our problem.
Thanks for the effort.
>
> On Thu, 2004-12-23 at 21:17 +0100, Bas van der Vlies wrote:
> > Bas van der Vlies wrote:
> > > Bas van der Vlies wrote:
> > >
> > >> We have a 256-node cluster with 4 switches, each with 64 nodes connected to it.
> > >> In our nodes file of torque:
> > >> gb-r1n1 np=2 switchA
> > >> ...
> > >> gb-r4n7 np=2 switchB
> > >>
> > >> gb-r4n8 np=2 switchB
> > >> ....
> > >> gb-r7n14 np=2 switchB
> > >>
> > >> --- Maui config ---
> > >>
> > >> NODESETPOLICY ONEOF
> > >> NODESETATTRIBUTE FEATURE
> > >> NODESETDELAY 10
> > >> NODESETLIST switchA switchB switchC switchD
> > >>
> > >> -------------------
> > >>
> > >> I only want to run jobs that are restricted to nodes that belong to
> > >> one switch (therefore NODESETDELAY ).
> > >>
> > >> When I run a job with: qsub -I -lnodes=40
> > >> It is scheduled to switchD
> > >>
> > >> Now I start another qsub -I -lnodes=40:
> > >> scheduled to switchC
> > >>
> > >> The third one waits forever. If one of the first two jobs
> > >> completes, then this job runs. I expected that it would run on switchA
> > >> or switchB. Is this a bug?
> > >>
> > >> I can accomplish it with the following command:
> > >> qsub -I -lnodes=40:switchA
> > >>
> > >> Can this also be done automatically?
> > >>
> > >> Regards
> > >>
> > >>
> > >> torque 1.1.0p4
> > >> maui 3.2.6p10
> > >>
> > >>
> > >>
> > >
> > > I have also tried:
> > > qsub -I -lnodes=40 -l walltime=10:00:00 -W \
> > > x=\"NODESET=ONEOF:FEATURE:switchA:switchB:switchC:switchD\"
> > >
> > > With this qsub command I only get two jobs running, on switchD and
> > > switchC, and the third one waits forever.
> > >
> > > Regards and Thanks for any help
> >
> > I think I have found the problem. Each nodeset has 64 nodes. When I
> > submit jobs that require 40 nodes, they run on:
> > switchD
> > switchC
> > .... and the third hangs forever. This is because there are 24 nodes
> > free on switchD and also on switchC. This means 48 free nodes in total,
> > so the job could run, but the "NODESETDELAY" prevents this.
> >
> >
> > To prove this I submitted jobs that require 50 nodes, and these run as
> > expected:
> > - switchA
> > - switchB
> > - switchC
> > - switchD
> >
> > I have also upgraded maui to 3.2.6p11 and that has the same problem ;-(
> >
> >
>
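For reference, the numbers in this thread can be checked with a minimal sketch. It assumes, following Dave's explanation, that the NodeSet check counted tasks (free nodes times np) rather than nodes. This is an illustrative model only, not Maui's actual selection code, and the function names are made up for the sketch.

```python
# Illustrative model only; not Maui's real NodeSet implementation.
NODES_PER_SWITCH = 64   # each switch (node set) has 64 nodes
PROCS_PER_NODE = 2      # np=2 in the torque nodes file


def free_nodes_after(jobs_on_switch, nodes_per_job):
    """Nodes still free on one switch after placing some jobs on it."""
    return NODES_PER_SWITCH - jobs_on_switch * nodes_per_job


def task_check(free_nodes, requested):
    """Assumed old behaviour: compare available tasks with the request."""
    return free_nodes * PROCS_PER_NODE >= requested


def node_check(free_nodes, requested):
    """Node-based requirement: compare free nodes with the request."""
    return free_nodes >= requested


for requested in (40, 50):
    free = free_nodes_after(1, requested)
    print(requested, free, task_check(free, requested), node_check(free, requested))
```

Under these assumptions, a switch already running one 40-node job passes the task check (24 free nodes give 48 tasks) but fails the node check, which matches the third 40-node job waiting. With 50-node jobs only 14 nodes (28 tasks) remain, so both checks fail and the next switch is chosen.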
--
********************************************************************
* *
* Bas van der Vlies e-mail: basv at sara.nl *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************