[torqueusers] Questions from a new Torque/Maui admin regarding user proxies, sparse mom instances, and node/resource scheduling

Chris Samuel csamuel at vpac.org
Mon Dec 7 17:43:04 MST 2009

----- "Douglas Wade Needham" <dneedham at cmu.edu> wrote:

> Greetings all,


> I apologize in advance for the length of this message and the
> noob-ishness of the questions, but I am stuck between a learning
> curve and time constraints.


> Now, for my first problem/question.  There is the request by those
> making the decisions to try to keep from propagating all the user
> accounts and home directories across the cluster.  And so, I am
> looking at the use of the '-u' option.
>         12/02 14:08:25 MJobSetHold(36,8,00:00:00,(null),job not
> authorized to use proxy credentials)

That looks like a Maui issue; I'm not sure whether Maui
supports proxy users at all.  An email to the mauiusers
list about this in October last year (from someone in
Russia) got no response. :-(
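For what it's worth, proxy submission also has to be enabled on the
Torque server before the scheduler ever sees the job.  A sketch only -
check the qmgr docs for your version, I believe the attribute is
allow_proxy_user:

```
# On the pbs_server host, as root:
qmgr -c 'set server allow_proxy_user = True'

# Then users can submit on behalf of another account:
qsub -u otheruser job.sh
```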

> My second question is this.  The profs want to end up with the
> non-homogeneous situation where only a couple of nodes run the client
> mom, and MPI is used to run things on the other nodes (sort of the
> reverse of the case for Appendix H).

You really don't want to do this.  You want to be
running pbs_mom on all the nodes and (ideally) using
an MPI stack like Open-MPI that supports the PBS TM
API, or an MPI launcher like the OSC mpiexec
replacement, which also talks TM and can be used
with many other MPI stacks.

If you don't run pbs_mom's on those nodes then Torque
will think the nodes are down and you'll never get any
jobs scheduled onto them, not to mention you lose the
ability to run jobs in cpusets, use the health check
scripts, etc.

> [...] But I am wondering if there is there a way for
> these users to submit their job in such a way that they
> get resources/nodes on the same switch, but not restrict
> themselves to a specific switch (e.g. they will take N nodes on
> either switch 1 or switch 3 but not split across both)?  

In Moab you can allocate nodes to partitions, and jobs
then won't span partitions.  I know people running a
cluster spread over different rooms, with completely
separate IB switches, who used partitions to solve
exactly that problem.

Maui might well support this too, but I've never had occasion to use it there.
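A config sketch of that, with hypothetical node names for two
switches (the NODECFG/PARTITION syntax is Moab's; Maui's config
is similar, if it supports this at all):

```
# moab.cfg - put each switch's nodes in their own partition;
# by default a single job will not span partitions.
NODECFG[node001] PARTITION=switch1
NODECFG[node002] PARTITION=switch1
NODECFG[node017] PARTITION=switch3
NODECFG[node018] PARTITION=switch3
```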

> Lastly, I know some jobs will want to use MPI between nodes
> in a dynamic set of nodes (perhaps handed out by Torque/Maui),
> but internally do things like run multiple processes (via
> system()/exec()) on the nodes, and have all processors
> dedicated to that job.  [...]

Just submit jobs that request all the CPUs on the
nodes (say nodes=10:ppn=8) and once the job starts
what you do is up to you - you should have all those
cores dedicated to yourself.
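A minimal job script along those lines; the nodes=10:ppn=8 request
is from above, everything else (job name, walltime) is just
illustrative:

```shell
#!/bin/sh
#PBS -N whole-nodes
#PBS -l nodes=10:ppn=8
#PBS -l walltime=1:00:00

# Inside the job, PBS_NODEFILE lists one line per allocated
# core (80 here); outside a job we fall back to /dev/null so
# the sketch still runs.
NCORES=$(wc -l < "${PBS_NODEFILE:-/dev/null}")
echo "allocated $NCORES cores"
```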

You can use pbsdsh to start processes, or you could
use the OSC mpiexec with -comm=none or you could even
just do SSH yourself, looking at $PBS_NODEFILE to see
what you've been allocated.
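For the do-it-yourself SSH route, something like this sketch
(my_worker is a stand-in for whatever the job actually runs, and
outside a real job we fake up a small nodefile so the loop has
something to chew on):

```shell
#!/bin/sh
# PBS_NODEFILE has one line per allocated core; outside a
# job, fake one up (2 nodes x 2 cores) for demonstration.
if [ -z "$PBS_NODEFILE" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node1\nnode1\nnode2\nnode2\n' > "$PBS_NODEFILE"
fi

# Collapse the per-core lines down to one entry per host.
HOSTS=$(sort -u "$PBS_NODEFILE")

# One process per host; a real script would drop the echo.
for h in $HOSTS; do
    echo "would run: ssh $h my_worker"
done
```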

Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
