[torqueusers] MPI jobs not tied to nodes/ppn configuration

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Fri Oct 16 17:41:17 MDT 2009


Hi All, sorry for the top-post - dumb web mail interface...

Naturally this has been discussed before.  The scheduler decides how to interpret the resource specs and tells torque where to start the job.  If you are using maui/moab, the JOBNODEMATCHPOLICY setting determines the interpretation of nodes=N:ppn=M.  With nothing set, nodes=N (no ppn) will give you N cores - but you have to fake the number of nodes available, because nodes now really means cores (not nice, but workable).  With EXACTNODE, you get sets of M cores on each of N nodes (though it may actually put multiple sets of M on a node if they fit - I'm not sure).  There's also an EXACTPROC setting.
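For reference, this is set in the maui config file (maui.cfg, or the moab equivalent); a minimal sketch, assuming a stock maui install - check your own config path:

```shell
# maui.cfg -- controls how nodes=N:ppn=M requests are interpreted.
# Pick one of the following; maui must be restarted after a change.

# Default (parameter unset): the nodes= count is treated as a
# task/core count, not a count of distinct physical nodes.
#JOBNODEMATCHPOLICY

# Require exactly N distinct nodes, with M procs allocated on each.
JOBNODEMATCHPOLICY      EXACTNODE

# Only match nodes whose configured proc count equals the ppn request.
#JOBNODEMATCHPOLICY     EXACTPROC
```

With EXACTNODE set, `qsub -l nodes=16:ppn=2` then behaves the way Dan describes below: it waits for 16 distinct nodes with 2 free cores each.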

Support for ncpus is not good and I think it is mostly unused.  It would be hard to fix - though the ANU seem to be doing all right with their ANUPBS fork, where they have strong control over both the batch system and their own scheduler.

I've suggested before that a new resource, say ncores, would be a good idea.  That way one could cleanly choose between saying 'I just want N cores' and 'I want N nodes with M cores each', without having to make that choice once at a global level.  It would require both torque and scheduler changes, which makes it unlikely to happen.
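To illustrate the proposal - ncores is hypothetical here, no released torque or scheduler understands it - the two request styles would look something like:

```shell
# Hypothetical new resource: scatter 32 cores across however many
# nodes happen to have capacity (ncores does not exist today).
qsub -l ncores=32 myjob.sh

# Existing syntax: exactly 16 nodes with 2 cores on each
# (under maui/moab with JOBNODEMATCHPOLICY EXACTNODE).
qsub -l nodes=16:ppn=2 myjob.sh
```

The point is that both forms could coexist per-job, instead of one global policy deciding for everyone.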

-- Gareth
________________________________________
From: rozelak at volny.cz [rozelak at volny.cz]
Sent: Friday, 16 October 2009 5:43 PM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] MPI jobs not tied to nodes/ppn configuration

> This is dependent on how the scheduler is set up (if you allow
> multiple jobs on a single node etc).
>
> But I believe you can use:
> qsub -l ncpus=X ...  where X is the total number of cpus you need.
>
> Jerry

AFAIK, it does not work with PBSpro on a cluster with multi-core/processor
nodes (I think it works on SMP machines, but I did not try it, as that
is not my case ...)

Dan

>
> rozelak at volny.cz wrote:
> > Hello,
> >
> > I have access to heterogeneous clusters with many multi-core/processor
> > nodes, where PBSPro is installed. When I want to start an MPI job, I
> > need to specify how many nodes and how many CPUs per node I want.
> > E.g., when I require 32 MPI processes, I need to run it as:
> >
> > qsub -l nodes=16:ppn=2 ...
> >
> > The problem is that PBS will wait until there are at least 16 nodes,
> > each with 2 cores free, even if there are more than 32 cores free
> > (e.g. 15 nodes with 2 free cores each + 2 or more nodes with one
> > free core, giving 32+ free cores available). The same happens for
> > any nodes/ppn combination, e.g.:
> >
> > qsub -l nodes=32:ppn=1 ...
> >
> > will not be started on 31 nodes with 4+ free cores (having 124 cores
> > free!). What I need is just to say -- I need XY cores/processors for
> > my job in the cluster, and I do not care how many nodes it will be
> > started on, while each node may allocate a different number of cores.
> >
> > So, the question is: is 'torque' able to handle such cases? And how?
> > If so, I will talk about it with our clusters' admins, as I remember
> > that they considered migrating from PBS and they are open to our
> > (users') wishes.
> >
> > Thank you very much for your answer,
> > Dan T.
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
>




