[torqueusers] procs resource advice and docs
siegert at sfu.ca
Wed Jan 20 18:50:24 MST 2010
On Thu, Jan 21, 2010 at 12:30:38PM +1100, Gareth.Williams at csiro.au wrote:
> Thanks Martin and Roman,
> > We are using this together with the
> > JOBNODEMATCHPOLICY EXACTNODE
> > in moab. This way the -l nodes=n:ppn=m specification is interpreted the
> > way users expect it to work and at the same time we have a method
> > available to users who do not care how their processes are distributed
> > across nodes.
> > After gaining some experience with this we now recommend that users
> > use -l procs=N unless they have a specific reason to use the
> > -l nodes=n:ppn=m syntax: the waiting time in the queue with -l procs=N
> > is much, much shorter.
> Yes, we prefer JOBNODEMATCHPOLICY EXACTNODE too.
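For anyone following along, here is a minimal sketch of the two request
forms in a job script. The process count, the walltime and the 8-core node
size are made-up values for illustration, not a recommendation:

```shell
#!/bin/sh
# Hypothetical job script: 16 processes, placed wherever cores are free.
#PBS -l procs=16
#PBS -l walltime=01:00:00
# Alternative when the layout matters (should give 2 nodes with 8 cores
# each under JOBNODEMATCHPOLICY EXACTNODE); use in place of the procs line:
#   -l nodes=2:ppn=8

# TORQUE lists one line per allocated slot in $PBS_NODEFILE.
# Outside a job the variable is unset, so fall back to an empty file.
slots=$(wc -l < "${PBS_NODEFILE:-/dev/null}")
echo "allocated slots: $slots"
```

Run outside of a job this simply reports 0 slots; inside a job the count
should match the number of processes requested.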
> > > Is there a downside apart from the lack of documentation?
> > None. Only benefits. The usage percentage of the cluster increases
> > dramatically.
> > > I see it's documented in the pbs_resources_unicos8 man page so I guess
> > > it was developed by or for CRAY, possibly long ago, but it seems to work
> > > fine on our linux systems. The pbs_resources_unicos8 man page does not
> > > mention that it conflicts with the nodes resource syntax but I guess
> > > this is obvious.
> > Actually, -l procs is fairly new - we requested that feature :-)
> > Thus, I do not believe that whatever is mentioned under CRAY applies.
> Who did the work? It would be good to see what they have to say about documenting the change. The unicos8 man page just has the following line:
> procs Maximum number of processes in the job. Units: unitary
> which might be enough, except perhaps to note the relationship or conflict with the nodes resource.
All the work was done by the folks at Clusterresources.
> > > moab just uses the nodes resource request if it is present (ignoring
> > > the procs resource)
> > Actually it is the other way round: if you specify procs then nodes are
> > ignored, see:
> > http://www.clusterresources.com/products/mwm/docs/13.3rmextensions.shtml#procs
> Interesting all round :-). I misinterpreted my admittedly not-extensive tests. I can now confirm your assertion but also see that procs jobs are not packing into nodes on at least one of our clusters - the procs are being partly distributed one per node and partly paired up. Strange but not critical as we can use the nodes syntax when more control is needed.
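To check how a procs job was actually laid out, something like the
following can go into the job script. This is only a sketch: the function
name slots_per_host is made up, and it assumes the usual TORQUE behaviour
of writing one line per allocated slot to $PBS_NODEFILE:

```shell
#!/bin/sh
# Print "<count> <host>" for every host in a node file
# (the file is expected to hold one line per allocated slot).
slots_per_host() {
    sort "$1" | uniq -c | awk '{print $1, $2}'
}

# TORQUE sets PBS_NODEFILE inside a running job; outside a job
# the variable is unset and nothing is printed.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    slots_per_host "$PBS_NODEFILE"
fi
```

A host that shows up with a count of 1 got a single process, a count of 2
means two processes were paired on it, and so on.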
I'd actually like to be able to use a combination of procs and nodes, e.g.,
i.e., request one complete node (we have 8-core nodes) and no restriction
for the remaining processors. The reason: think of a master-slave
program: the master is idle most of the time, the slaves do all the
work. Thus it is a waste of resources to reserve one core for the
master. But evil sysadmins (like myself) do not allow oversubscription
of processors (cores), if the job does not claim all processors on a
node. The syntax above would allow me to run 9 processes on the first
node, i.e., the master and 8 slaves. Admittedly such cases are rare
and not hugely important.
> > (I am confused here: that page existed a few days ago, but now it is
> > gone).
> > > and I guess maui would too but that is a function of the scheduler
> > > and doesn't really have a place in the torque docs per se.
> > I am not sure whether maui has support for procs.
> > > Can the linux pbs_resources man page be updated to include
> > > the procs resource?
> > There is one more issue: if you are using OSC's mpiexec you need
> > to patch the code in get_hosts.c to add support for procs. I can
> > email you the patch, if you need it.
> We're using openmpi and the torque integration is working just fine, but thanks for the offer. (also SGI's MPT/MPI)
We are mostly using openmpi as well. Once in a while somebody wants
to use Intel MPI in which case they must use OSC's mpiexec to start