[Mauiusers] Re: [torqueusers] Maui/torque problems with NodeSet

Bas van der Vlies basv at sara.nl
Mon Jan 24 10:04:05 MST 2005


Dave Jackson wrote:
> Bas,
> 
>   The problem you were seeing stemmed from the fact that Maui's NodeSet
> feature only enforced task based job constraints.  It was correctly
> finding enough tasks, but not enough nodes to run the job.  We have made
> some extensions to the NodeSet algorithm which should allow it to
> enforce both node and task based requirements.  This is available in the
> latest snapshot release.  We will continue to test this release
> internally but please let us know if this resolves your issue.
> 
> Thanks,
> Dave
> 
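
As a rough illustration of the distinction Dave describes, here is a small
Python sketch. It is not Maui's code and the function names are made up.
It assumes that every node offers np=2 tasks (as in the nodes file quoted
below), that each switch has 64 nodes, and that a -lnodes=N request needs
N separate nodes.

```python
# Hypothetical illustration only; this is not Maui's actual implementation.
NODES_PER_SWITCH = 64   # from the thread: 4 switches with 64 nodes each
PROCS_PER_NODE = 2      # np=2 in the torque nodes file


def fits_by_tasks(free_nodes: int, requested: int) -> bool:
    """Task-only check: counts free processors and ignores the node count."""
    return free_nodes * PROCS_PER_NODE >= requested


def fits_by_nodes(free_nodes: int, requested: int) -> bool:
    """Node-aware check: every requested node needs its own free node."""
    return free_nodes >= requested


free_after_40 = NODES_PER_SWITCH - 40   # 24 nodes left on a switch
free_after_50 = NODES_PER_SWITCH - 50   # 14 nodes left on a switch

# A second 40-node job on a switch that already runs a 40-node job:
print(fits_by_tasks(free_after_40, 40), fits_by_nodes(free_after_40, 40))
# prints: True False

# A second 50-node job on a switch that already runs a 50-node job:
print(fits_by_tasks(free_after_50, 50), fits_by_nodes(free_after_50, 50))
# prints: False False
```

Under these assumptions a partly filled switch still passes a task-only
check for another 40-node request although it lacks the nodes, while a
50-node request fails both checks there. That is consistent with the
50-node jobs spreading over all four switches.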

Will the problem be fixed in a new release/snapshot of Maui? I have
tried the latest snapshot of Maui, but that did not solve the problem.
Thanks for the effort.

	Regards


> On Thu, 2004-12-23 at 21:17 +0100, Bas van der Vlies wrote:
>  > Bas van der Vlies wrote:
>  > > Bas van der Vlies wrote:
>  > >
>  > >> We have a 256-node cluster with 4 switches, each with 64 nodes
>  > >> connected to it. In our torque nodes file:
>  > >>  gb-r1n1 np=2 switchA
>  > >>  ...
>  > >>  gb-r4n7 np=2 switchB
>  > >>
>  > >>  gb-r4n8 np=2 switchB
>  > >>  ....
>  > >>  gb-r7n14 np=2 switchB
>  > >>
>  > >> --- Maui config ---
>  > >>
>  > >> NODESETPOLICY           ONEOF
>  > >> NODESETATTRIBUTE        FEATURE
>  > >> NODESETDELAY            10
>  > >> NODESETLIST             switchA switchB switchC switchD
>  > >>
>  > >> -------------------
>  > >>
>  > >> I only want jobs to run on nodes that all belong to a single
>  > >> switch (hence NODESETDELAY).
>  > >>
>  > >> When I run a job with: qsub -I -lnodes=40
>  > >>  it is scheduled on switchD.
>  > >>
>  > >> Now I start another qsub -I -lnodes=40:
>  > >>    scheduled to switchC
>  > >>
>  > >> The third one waits forever. It only runs once one of the first
>  > >> two jobs has completed. I expected that it would run on switchA
>  > >> or switchB. Is this a bug?
>  > >>
>  > >> I can accomplish it with the following command:
>  > >>     qsub -I -lnodes=40:switchA
>  > >>
>  > >> Can this also be done automatically?
>  > >>
>  > >>         Regards
>  > >>
>  > >>
>  > >> torque 1.1.0p4
>  > >> maui 3.2.6p10
>  > >>
>  > >>
>  > >>
>  > >
>  > > I have also tried:
>  > > qsub -I -lnodes=40 -l walltime=10:00:00 -W  \
>  > >    x=\"NODESET=ONEOF:FEATURE:switchA:switchB:switchC:switchD\"
>  > >    
>  > > With this qsub command I only get two jobs running, on switchD and
>  > > switchC, and the third one waits forever.
>  > >
>  > >         Regards and Thanks for any help
>  >
>  > I think I have found the problem. Each nodeset has 64 nodes. When I
>  > submit jobs that require 40 nodes, they run on:
>  >    switchD
>  >    switchC
>  >    .... hangs forever. This is because switchD has 24 nodes
>  >         free, and so does switchC. That makes 48 free nodes, so the
>  >         job could run, but the "NODESETDELAY" prevents this.
>  >
>  >
>  > To prove this I submitted jobs that require 50 nodes, and these run as
>  > expected:
>  >   - switchA
>  >   - switchB
>  >   - switchC
>  >   - switchD
>  >
>  > I have also upgraded maui to 3.2.6p11 and that has the same problem ;-(
>  >
>  >
> 


-- 
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************

