Bug 93 - Resource management semantics of Torque need to be well defined
Status: NEW
Product: TORQUE
Component: pbs_server
Version: 3.0.0-alpha
Hardware: PC Linux
Importance: P5 critical
Assigned To: Glen
Reported: 2010-10-25 06:39 MDT by Simon Toth
Modified: 2010-12-07 09:26 MST

Description Simon Toth 2010-10-25 06:39:18 MDT
Currently Torque includes inconsistent resource management semantics. These
semantics need to be redefined.

* External Schedulers *

From what I have been told (I only work with plain Torque), external schedulers
(Moab, Maui) send a very specific nodespec, or directly an exec_host list, in
their run requests.

If this is not true, then we need to consider what semantics external
schedulers expect from Torque.

If this is true, then these schedulers can be safely ignored (as far as
resource semantics go).

* Process (ppn) semantics *

Process semantics should be dumped completely. The only thing they are useful
for right now is limiting vmem in a per-process manner.

The number of processes isn't limited by Torque (not 100% sure here), and with
the liberal approach towards forking in most Linux software, limiting it
wouldn't be a good idea either.

* Per-job, per-node, per-process resources *

Even when the process semantics are dumped, we still need to distinguish
between per-node and per-job resources.

For example, mem should definitely be a per-node resource, while the number of
Matlab licenses should definitely be a per-job resource.

* Configurable with pre-set defaults or strict *

I would definitely prefer a configurable approach. Setting flags in the
resource definition (as done in my bug 67) is probably not the best approach,
so we need to come up with something more sane. In either case we need to
define a set of fully internally supported resources.

This is a list of resources I consider essential:
- ncpus
- mem
- vmem
- GPU
- walltime
- cputime

Plus we need some generic resources that are checked (i.e., if a job requires 4
kitchen-sinks and a node only has 2 available, then the job cannot run there),
but that don't have any special semantics.

Support without semantics:
- generic per-node counted resource (counted/enforced only on server)
- generic per-job counted resource (counted/enforced only on server)
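
A minimal sketch of the server-side check described above; "kitchen-sinks" and
the counts are the made-up example from this report, not anything in the Torque
code:

```shell
# Generic counted resource: the server only compares requested vs. available.
node_avail=2   # what the node advertises for the resource
job_req=4      # what the job requests

if [ "$job_req" -gt "$node_avail" ]; then
    verdict="rejected"
else
    verdict="accepted"
fi
echo "$verdict: job requests $job_req, node has $node_avail"
```

The point is that the server needs no knowledge of what the resource *means*;
it only counts.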

* Cgroups - Linux specific *

I have been digging through the cgroups docs, and the good news is that we can
replace a lot of the Linux-specific code with cgroups, which should work
reliably.

Stuff that cgroups can do:
- memory (mem, vmem, oom killer configuration)
- cpusets
- devices (limiting access)
  - should work well for GPUs or any generic HW requiring dedicated access
- frozen containers
- accounting
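
As an illustration, a per-job memory limit under cgroup v1 could look something
like the following sketch. The torque/job42 group name is hypothetical (not an
existing pbs_mom layout), and the privileged steps are shown as comments since
they require root and a mounted cgroup-v1 memory controller:

```shell
# Hypothetical per-job cgroup path; not an existing pbs_mom directory layout.
cg=/sys/fs/cgroup/memory/torque/job42
# A 2 GiB "mem" limit expressed in bytes, as memory.limit_in_bytes expects.
limit=$((2 * 1024 * 1024 * 1024))
echo "limit=$limit bytes for $cg"
# As root, with the cgroup-v1 memory controller mounted, one would then:
#   mkdir -p "$cg"
#   echo "$limit" > "$cg/memory.limit_in_bytes"   # enforce the memory cap
#   echo "$JOB_TASK_PID" > "$cg/tasks"            # attach the job's processes
```

memory.limit_in_bytes and tasks are the standard cgroup-v1 memory-controller
files, and accounting comes essentially for free via memory.usage_in_bytes.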
Comment 1 Chris Samuel 2010-10-27 23:14:59 MDT
External schedulers - I think you're right for both Moab and Maui, they both
set exec_host.

PPN = processors per node (according to the manual page), really virtual processors
as you can overcommit if you are not using cpusets.  I've seen plenty of
commercial software out there that uses them, so I don't think it can go away. 
The pvmem limits which you mention are vital to us.

Different resource limits - I think the current per-process and per-job limits
make enough sense; they're easy for users to understand.  The only real issue is
that you cannot set a proactively enforced (i.e. malloc fails) limit across a
job as a whole.  But that's enforced by the scheduler anyway (at least with
Maui and Moab).

Resources we need:

pvmem
procs and tpn
walltime
software
nodes and ppn (for commercial software which supports PBS)

Cgroups - I reckon it's a good plan for the future but we need to realise that
it's not going to really arrive for most clusters until RHEL6/CentOS6 starts
getting deployed. Also you cannot have both cpusets and cgroups mounted at the
same time so the current code needs to be refactored/abstracted to be able to
cope with either one being present.

It cannot depend on a feature of cgroups being present but should give you the
benefits if it is.
Comment 2 Simon Toth 2010-10-28 03:56:57 MDT
> External schedulers - I think you're right for both Moab and Maui, they both
> set exec_host.

That would be great.

> PPN = processors per node (according to manual page), really virtual processors
> as you can overcommit if you are not using cpusets.  I've seen plenty of
> commercial software out there that uses them, so I don't think it can go away. 
> The pvmem limits which you mention are vital to us.

Well, that's the problem: the manual page says processors per node, but that's
not how Torque works (this is exactly the reason why I created this bug). They
are processes per node. I'm not saying we should get rid of ppn, but that we
should get rid of the process semantics, so that ppn will actually mean
processors, not processes. pvmem can actually stay, although I think pmem and
pvmem can easily be superseded by mem and vmem.

Plus when you request -l nodes=2:ppn=2:pvmem=2G how much memory do you expect
to get? In the current Torque semantics it is 2*2*2G.
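
The arithmetic behind that answer can be spelled out as a quick sketch; under
the current semantics, the per-process pvmem applies to each of the
nodes * ppn processes:

```shell
# -l nodes=2:ppn=2:pvmem=2G under current Torque semantics:
# pvmem is per process, so the job's aggregate is nodes * ppn * pvmem.
nodes=2; ppn=2; pvmem_gb=2
total_gb=$((nodes * ppn * pvmem_gb))
echo "${total_gb}G"
```

Whether a user asking for "2G" expects an 8G aggregate is exactly the
ambiguity at issue.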

> Different resource limits - I think the current per process and per job limits
> make enough sense, it's easy for users to understand.  The only real issue is
> that you cannot set a proactively enforced (i.e. malloc fails) limit across a
> job as a whole.  But that's enforced by the scheduler anyway (at least with
> Maui and Moab).

The issue is that it is enforced internally by the schedulers. My target is to
make all this work even with qrun. That implies that basic resources like mem,
cpus, etc. must have well-defined semantics inside Torque itself.

> Cgroups - I reckon it's a good plan for the future but we need to realise that
> it's not going to really arrive for most clusters until RHEL6/CentOS6 starts
> getting deployed. Also you cannot have both cpusets and cgroups mounted at the
> same time so the current code needs to be refactored/abstracted to be able to
> cope with either one being present.
> 
> It cannot depend on a feature of cgroups being present but should give you the
> benefits if it is.

Actually my idea was to create a new cgroups platform (new folder in
src/resmom/).
Comment 3 Ken Nielson 2010-10-28 09:25:06 MDT
> 
> > PPN = processors per node (according to manual page), really virtual processors
> > as you can overcommit if you are not using cpusets.  I've seen plenty of
> > commercial software out there that uses them, so I don't think it can go away. 
> > The pvmem limits which you mention are vital to us.
> 
> Well, that's the problem, then manual page says processors per node, but that's
> not how Torque works (this is exactly the reason why I created this bug). They
> are processes per node. I'm not saying to get rid of ppn, but to get rid of the
> processes semantics, therefore ppn will be actually processors not processes.
> pvmem can actually stay, although I think pmem and pvmem can be easily
> superseded by mem and vmem.

I understand the frustration with ppn not really meaning processors per node.
However, the current behavior of ppn is widely used and expected. We need to
live with this. Changing this behavior will break too many people.
Comment 4 David Singleton 2010-10-28 14:24:05 MDT
(In reply to comment #3)
> > 
> > > PPN = processors per node (according to manual page), really virtual processors
> > > as you can overcommit if you are not using cpusets.  I've seen plenty of
> > > commercial software out there that uses them, so I don't think it can go away. 
> > > The pvmem limits which you mention are vital to us.
> > 
> > Well, that's the problem, then manual page says processors per node, but that's
> > not how Torque works (this is exactly the reason why I created this bug). They
> > are processes per node. I'm not saying to get rid of ppn, but to get rid of the
> > processes semantics, therefore ppn will be actually processors not processes.
> > pvmem can actually stay, although I think pmem and pvmem can be easily
> > superseded by mem and vmem.
> 
> I understand the frustration with ppn not really meaning processors per node.
> However, the current behavior of ppn is widely used and expected. We need to
> live with this. Changing this behavior will break too many people.

In what way are they using it as processes?  Are they requesting the MOM call
setrlimit(RLIMIT_NPROC)?  Are they killing jobs if jobs are detected as having
more than that many processes running on a node?  None of these make any sense
whatsoever (unless some large forkbomb limit is applied - but that should be a
system limit, not a user resource request).  

Is the ppn value being used to impose pvmem or pmem limits somehow? I don't see
that in the Torque code. By external schedulers? How?

I suspect "processes per node" only really appears in flawed and misleading
documentation, not in real code.
Comment 5 David Beer 2010-10-28 14:40:54 MDT
(In reply to comment #4)
> (In reply to comment #3)
> > > 
> > > > PPN = processors per node (according to manual page), really virtual processors
> > > > as you can overcommit if you are not using cpusets.  I've seen plenty of
> > > > commercial software out there that uses them, so I don't think it can go away. 
> > > > The pvmem limits which you mention are vital to us.
> > > 
> > > Well, that's the problem, then manual page says processors per node, but that's
> > > not how Torque works (this is exactly the reason why I created this bug). They
> > > are processes per node. I'm not saying to get rid of ppn, but to get rid of the
> > > processes semantics, therefore ppn will be actually processors not processes.
> > > pvmem can actually stay, although I think pmem and pvmem can be easily
> > > superseded by mem and vmem.
> > 
> > I understand the frustration with ppn not really meaning processors per node.
> > However, the current behavior of ppn is widely used and expected. We need to
> > live with this. Changing this behavior will break too many people.
> 
> In what way are they using it as processes?  Are they requesting the MOM call
> setrlimit(RLIMIT_NPROC)?  Are they killing jobs if jobs are detected as having
> more than that many processes running on a node?  None of these make any sense
> whatsoever (unless some large forkbomb limit is applied - but that should be a
> system limit, not a user resource request).  
> 
> Is the ppn value being used to impose pvmem or pmem limits some how? I dont see
> that in the Torque code?  By external schedulers?  How?
> 
> I suspect "processes per node" only really appears in flawed and misleading
> documentation, not in real code.

Processes per node is often how it is explained, although you are right, it
isn't restricted in any way to actually limit the number of processes that can
be run. It may have originally been intended to be processors per node, but now
almost all processors intended for computing have multiple cores, making
processors per node completely ambiguous and therefore not very useful.

However, it is in the code in a few ways:

ppn is the number of times that nodename will appear in the $PBS_NODEFILE. This
is intended to be read by the program's MPI scripts, which then spawn that many
processes. There is nothing in TORQUE that stops the scripts from spawning more
processes though.
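
A sketch of that mechanism, assuming hypothetical hosts node01/node02 and a
request of -l nodes=2:ppn=2 (this simulates the file's contents; it is not
pbs_server code):

```shell
# Simulated $PBS_NODEFILE for -l nodes=2:ppn=2:
# each hostname is repeated ppn times, and MPI launchers read the file
# to decide how many processes to start on each host.
nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$nodefile"

ranks=$(wc -l < "$nodefile" | tr -d ' ')    # total ranks a launcher would start
hosts=$(sort -u "$nodefile" | wc -l | tr -d ' ')  # distinct hosts
echo "ranks=$ranks hosts=$hosts"
rm -f "$nodefile"
```

Nothing here enforces a process count; the file is purely advisory to the
launcher, which is the crux of the "processes vs. processors" confusion.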

ppn is left completely configurable per node, and so the notion that it is tied
to the actual hardware is false. Often in production systems, ppn becomes cores
per node, because that's how many the system admin wants for optimal use. 

The fact of the matter is that ppn hasn't been clearly defined over time, and
what it has become in practice is probably best described as processes per
node. At any rate, changing this behavior would greatly disrupt life for *very*
many TORQUE users.
Comment 6 David Singleton 2010-10-28 15:10:25 MDT
(In reply to comment #5)
> Processes per node is often how it is explained, although you are right, it
> isn't restricted in any way to actually limit the number of processes that can
> be run. It may have originally been intended to be processors per node, but now
> almost all processors intended for computing have multiple cores, making
> processors per node completely ambiguous and therefore not very useful.
> 
> However, it is in the code in a few ways:
> 
> ppn is the number of times that nodename will appear in the $PBS_NODEFILE. This
> is intended to be read by the mpi scripts on the program to then make that many
> processes. There is nothing in TORQUE that stops the scripts from spawning more
> processes though.
> 
> ppn is left completely configurable per node, and so the notion that it is tied
> to the actual hardware is false. Often in production systems, ppn becomes cores
> per node, because that's how many the system admin wants for optimal use. 
> 
> The fact of the matter is that ppn hasn't been clearly defined over time, and
> what it has become in practice is probably best described as processes per
> node. At any rate, changing this behavior would greatly disrupt life for *very*
> many TORQUE users.

As Chris Samuel pointed out, the "p" in "ppn" meant "virtual processors".  A
"virtual processor" can mean a core - for most of us that is exactly what it
means.  It can mean an "execution slot" for those sites that set node np
greater than the number of physical cores (or hyperthread contexts).  The
important thing is that it is a characteristic of the hardware/system/site.  It
is not a property of the job.  The number of processes in a job is a property
of a job.  In general there is no alignment. 

If I was to run a 16 thread OpenMP job, what value of ppn do I use?  The OpenMP
app will have 1 process.  But then there will be 2 shells in the job so its
likely to be 3 processes.  So ppn=3 ?  What I actually want is 16 bits of
hardware that each can run a thread without conflict (as much as possible),
i.e. I want 16 virtual processors.  
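
Under that reading, the job script requests 16 virtual processors and matches
the thread count to the request rather than to the process count. A
hypothetical sketch (openmp_app is a made-up binary):

```shell
#!/bin/sh
# Hypothetical job script: 16 virtual processors for a 16-thread OpenMP run.
# The job is 1 app process (plus a couple of shells), but 16 vps of hardware.
#PBS -l nodes=1:ppn=16
#PBS -l walltime=1:00:00
export OMP_NUM_THREADS=16   # threads matched to the requested virtual processors
./openmp_app                # made-up application name
```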

Yes, the use of the term "processor" needs to be spelt out as above. But at
least it can be made technically accurate. The use of the term "process" cannot
unless you want to turn it into a property of the system.
Comment 7 David Singleton 2010-10-28 15:29:36 MDT
(In reply to comment #6)
> (In reply to comment #5)
> > Processes per node is often how it is explained, although you are right, it
> > isn't restricted in any way to actually limit the number of processes that can
> > be run. It may have originally been intended to be processors per node, but now
> > almost all processors intended for computing have multiple cores, making
> > processors per node completely ambiguous and therefore not very useful.
> > 
> > However, it is in the code in a few ways:
> > 
> > ppn is the number of times that nodename will appear in the $PBS_NODEFILE. This
> > is intended to be read by the mpi scripts on the program to then make that many
> > processes. There is nothing in TORQUE that stops the scripts from spawning more
> > processes though.
> > 
> > ppn is left completely configurable per node, and so the notion that it is tied
> > to the actual hardware is false. Often in production systems, ppn becomes cores
> > per node, because that's how many the system admin wants for optimal use. 
> > 
> > The fact of the matter is that ppn hasn't been clearly defined over time, and
> > what it has become in practice is probably best described as processes per
> > node. At any rate, changing this behavior would greatly disrupt life for *very*
> > many TORQUE users.
> 
> As Chris Samuel pointed out, the "p" in "ppn" meant "virtual processors".  A
> "virtual processor" can mean a core - for most us that is exactly what it
> means.  It can mean an "execution slot" for those sites that set node np
> greater than the number of physical cores (or hyperthread contexts).  The
> important thing is that it is a characteristic of the hardware/system/site.  It
> is not a property of the job.  The number of processes in a job is a property
> of a job.  In general there is no alignment. 
> 
> If I was to run a 16 thread OpenMP job, what value of ppn do I use?  The OpenMP
> app will have 1 process.  But then there will be 2 shells in the job so its
> likely to be 3 processes.  So ppn=3 ?  What I actually want is 16 bits of
> hardware that each can run a thread without conflict (as much as possible),
> i.e. I want 16 virtual processors.  
> 
> Yes, the use of the term "processor" needs to be spelt out as above. But at
> least it can be made technically accurate. The use of the term "process" cannot
> unless you want to turn it into a property of the system.

I'm not sure what change Simon wanted but, just to be clear, this looks like a
purely documentation issue to me. The only thing that has changed since the
"good ol' PBS days" is that someone started documenting "virtual processors" as
"processes" which is very confusing.  As far as I am concerned the behaviour is
OK, just the terminology is totally wrong.  Simon will have to explain what he
sees as the problem.

Note: I am not a Torque user, merely someone who would not like to see
confusion amongst users when using variants of PBS.
Comment 8 Ken Nielson 2010-10-28 15:35:17 MDT
(In reply to comment #6)
> (In reply to comment #5)
> > Processes per node is often how it is explained, although you are right, it
> > isn't restricted in any way to actually limit the number of processes that can
> > be run. It may have originally been intended to be processors per node, but now
> > almost all processors intended for computing have multiple cores, making
> > processors per node completely ambiguous and therefore not very useful.
> > 
> > However, it is in the code in a few ways:
> > 
> > ppn is the number of times that nodename will appear in the $PBS_NODEFILE. This
> > is intended to be read by the mpi scripts on the program to then make that many
> > processes. There is nothing in TORQUE that stops the scripts from spawning more
> > processes though.
> > 
> > ppn is left completely configurable per node, and so the notion that it is tied
> > to the actual hardware is false. Often in production systems, ppn becomes cores
> > per node, because that's how many the system admin wants for optimal use. 
> > 
> > The fact of the matter is that ppn hasn't been clearly defined over time, and
> > what it has become in practice is probably best described as processes per
> > node. At any rate, changing this behavior would greatly disrupt life for *very*
> > many TORQUE users.
> 
> As Chris Samuel pointed out, the "p" in "ppn" meant "virtual processors".  A
> "virtual processor" can mean a core - for most us that is exactly what it
> means.  It can mean an "execution slot" for those sites that set node np
> greater than the number of physical cores (or hyperthread contexts).  The
> important thing is that it is a characteristic of the hardware/system/site.  It
> is not a property of the job.  The number of processes in a job is a property
> of a job.  In general there is no alignment. 
> 
> If I was to run a 16 thread OpenMP job, what value of ppn do I use?  The OpenMP
> app will have 1 process.  But then there will be 2 shells in the job so its
> likely to be 3 processes.  So ppn=3 ?  What I actually want is 16 bits of
> hardware that each can run a thread without conflict (as much as possible),
> i.e. I want 16 virtual processors.  
> 
> Yes, the use of the term "processor" needs to be spelt out as above. But at
> least it can be made technically accurate. The use of the term "process" cannot
> unless you want to turn it into a property of the system.

Maybe it should be called vppn. Believe me, I understand the frustration with
the ambiguity of the name. In essence it comes down to the number of "ppns"
that will be allowed to be scheduled on the node. Come up with another name for
ppn that adequately represents the scheduling limit imposed by the attribute
and we could use that in the documentation. But I think the term ppn and its
syntax are here to stay.
Comment 9 Simon Toth 2010-10-28 17:29:33 MDT
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > Processes per node is often how it is explained, although you are right, it
> > > isn't restricted in any way to actually limit the number of processes that can
> > > be run. It may have originally been intended to be processors per node, but now
> > > almost all processors intended for computing have multiple cores, making
> > > processors per node completely ambiguous and therefore not very useful.
> > > 
> > > However, it is in the code in a few ways:
> > > 
> > > ppn is the number of times that nodename will appear in the $PBS_NODEFILE. This
> > > is intended to be read by the mpi scripts on the program to then make that many
> > > processes. There is nothing in TORQUE that stops the scripts from spawning more
> > > processes though.
> > > 
> > > ppn is left completely configurable per node, and so the notion that it is tied
> > > to the actual hardware is false. Often in production systems, ppn becomes cores
> > > per node, because that's how many the system admin wants for optimal use. 
> > > 
> > > The fact of the matter is that ppn hasn't been clearly defined over time, and
> > > what it has become in practice is probably best described as processes per
> > > node. At any rate, changing this behavior would greatly disrupt life for *very*
> > > many TORQUE users.
> > 
> > As Chris Samuel pointed out, the "p" in "ppn" meant "virtual processors".  A
> > "virtual processor" can mean a core - for most us that is exactly what it
> > means.  It can mean an "execution slot" for those sites that set node np
> > greater than the number of physical cores (or hyperthread contexts).  The
> > important thing is that it is a characteristic of the hardware/system/site.  It
> > is not a property of the job.  The number of processes in a job is a property
> > of a job.  In general there is no alignment. 
> > 
> > If I was to run a 16 thread OpenMP job, what value of ppn do I use?  The OpenMP
> > app will have 1 process.  But then there will be 2 shells in the job so its
> > likely to be 3 processes.  So ppn=3 ?  What I actually want is 16 bits of
> > hardware that each can run a thread without conflict (as much as possible),
> > i.e. I want 16 virtual processors.  
> > 
> > Yes, the use of the term "processor" needs to be spelt out as above. But at
> > least it can be made technically accurate. The use of the term "process" cannot
> > unless you want to turn it into a property of the system.
> 
> I'm not sure what change Simon wanted but, just to be clear, this looks like a
> purely documentation issue to me. The only thing that has changed since the
> "good ol' PBS days" is that someone started documenting "virtual processors" as
> "processes" which is very confusing.  As far as I am concerned the behaviour is
> OK, just the terminology is totally wrong.  Simon will have to explain what he
> sees as the problem.
> 
> Note: I am not a Torque user, merely someone who would not like to see
> confusion amongst users when using variants of PBS.

It would be awesome if it were just a documentation issue. In particular, the
node interprets ppn as processes. If you look into the code of the server, it
doesn't really make any difference there, but the server still creates a
sub-node for each process.

One problem with using ppn as cpus/cores is that when you request pmem or pvmem
or p-anything, you will get ppn*amount, which can be counterintuitive.

I personally don't think that per-process resources make much sense these days
(since the number of processes isn't limited by Torque anyway); that includes
the per-process limits like pmem and pvmem.

But again either way is OK for me, I just think we should define which way it
works.
Comment 10 Chris Samuel 2010-10-31 23:27:23 MDT
If you are using cpusets then it is processors per node in that your job is
constrained to just the cpus you requested.
Comment 11 Simon Toth 2010-11-01 05:16:10 MDT
(In reply to comment #10)
> If you are using cpusets then it is processors per node in that your job is
> constrained to just the cpus you requested.

Yeah. I'm not talking about what is achievable with the current semantics.
Sure, you can do pretty much everything with the current semantics (and that is
something that has to be maintained). This is more about cleanup and
clarification.
Comment 12 Ken Nielson 2010-12-06 12:15:04 MST
We have made at least the first step in clearing up the confusion around the
meaning of ppn. We have updated the documentation in a couple of places. 

http://www.clusterresources.com/products/torque/docs/1.5nodeconfig.shtml

Also in section 2.1.2 under nodes and ppn:
http://www.clusterresources.com/products/torque/docs/2.1jobsubmission.shtml

I took my definition from David Singleton's comments. They seemed to be the
best explanation.
Comment 13 Glen 2010-12-06 12:31:56 MST
(In reply to comment #5)

> Processes per node is often how it is explained,
...
> The fact of the matter is that ppn hasn't been clearly defined over time, and
> what it has become in practice is probably best described as processes per
> node.

Describing it as "processes per node" is very misleading and completely
inaccurate.  Take for example a multi-threaded program.  I routinely run
multi-threaded code on our cluster.  We have 32 cores per node, and if I run a
_single process_ that uses 32 threads, I request ppn=32.  If that meant
_processes_ I would request ppn=1 because, after all, my multi-threaded program
is still a single process. It is, however, using multiple cores.

Virtual processors per node is the correct definition of ppn - the number of
virtual processors will typically be set to the total number of cores on a
node. Redefining it as processes per node will lead to problems.
Comment 14 Ken Nielson 2010-12-06 12:41:50 MST
(In reply to comment #13)
> (In reply to comment #5)
> 
> > Processes per node is often how it is explained,
> ...
> > The fact of the matter is that ppn hasn't been clearly defined over time, and
> > what it has become in practice is probably best described as processes per
> > node.
> 
> Describing it as "processes per node" is very misleading and completely
> inaccurate.  Take for example a multi-threaded program.  I routinely run
> multi-threaded code on our cluster.  We have 32 cores per node, and if I run a
> _single process_ that uses 32 threads, I request ppn=32.  If that meant
> _processes_ I would request ppn=1 because, after all, my mult-threaded program
> is still a single process. It is, however, using multiple-cores.
> 
> virtual processor per node is the correct definition of ppn - the number of
> virtual processors will typically be set to the total number of cores on a
> node. redefining it as processes per node will lead to problems.

Glen,

I double checked the documentation online and I did use the phrase virtual
processor. I tried to be careful not to use the word process or processes.
Comment 15 David Singleton 2010-12-06 13:18:45 MST
(In reply to comment #12)
> We have made at least the first step in clearing up the confusion around the
> meaning of ppn. We have updated the documentation in a couple of places. 
> 
> http://www.clusterresources.com/products/torque/docs/1.5nodeconfig.shtml
> 
> Also in section 2.1.2 under nodes and ppn:
> http://www.clusterresources.com/products/torque/docs/2.1jobsubmission.shtml
> 
> I took my definition from David Singleton's comments. They seemed to be the
> best explaination.

I realised later that what I wrote was not sufficiently precise and it has
turned into this incorrect line in
http://www.clusterresources.com/products/torque/docs/2.1jobsubmission.shtml :

 "The ppn value is a characteristic of the hardware, system, and site, and its
value is to be determined by the administrator."

np is a resource attribute of the system and *its* value is determined by the
administrator.  ppn is a user request (determined by the user) for a quantity
of that system resource attribute.  I think you can just leave out that line.
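
The np-versus-ppn split can be sketched as a toy feasibility check (the numbers
are made up and this is not the server's actual code; it only illustrates which
side sets which value):

```shell
np=8      # system attribute: set by the administrator, e.g. "node01 np=8"
ppn=4     # user request against that capacity, e.g. qsub -l nodes=1:ppn=4
in_use=6  # hypothetical slots already allocated on the node

# The request fits only if the remaining capacity covers it.
if [ $((in_use + ppn)) -le "$np" ]; then fits=yes; else fits=no; fi
echo "fits=$fits"
```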
Comment 16 David Singleton 2010-12-06 13:27:41 MST
(In reply to comment #12)
> 
> Also in section 2.1.2 under nodes and ppn:
> http://www.clusterresources.com/products/torque/docs/2.1jobsubmission.shtml

Hopefully this:

"By default, the node resource is mapped to a virtual node (that is, directly
to a processor, not a full physical compute node). "

is not true.  Hopefully, this is more true (its how our scheduler works at
least):

"By default, the node resource is mapped to a virtual node (that is, multiple
virtual nodes (from one or more jobs) may be allocated to the same physical
node (host) provided that all other resource requests can be satisfied by the
shared physical host). "
Comment 17 Ken Nielson 2010-12-06 14:32:19 MST
(In reply to comment #16)
> (In reply to comment #12)
> > 
> > Also in section 2.1.2 under nodes and ppn:
> > http://www.clusterresources.com/products/torque/docs/2.1jobsubmission.shtml
> 
> Hopefully this:
> 
> "By default, the node resource is mapped to a virtual node (that is, directly
> to a processor, not a full physical compute node). "
> 
> is not true.  Hopefully, this is more true (its how our scheduler works at
> least):
> 
> "By default, the node resource is mapped to a virtual node (that is, multiple
> virtual nodes (from one or more jobs) may be allocated to the same physical
> node (host) provided that all other resource requests can be satisfied by the
> shared physical host). "

I started to add your corrections to the documentation when I realized we have
another item we need to define: the host. In the context of nodes as a
resource, a node is not the same as a host. When we are configuring the nodes
file we are actually configuring execution hosts. When we are requesting nodes
we are requesting parts of each host.

Please add any comments you think are appropriate.
Comment 18 Simon Toth 2010-12-07 02:24:56 MST
> provided that all other resource requests can be satisfied by the
> shared physical host). "

I would skip this part, since there are no other resource requests in Torque.

I would say that the wording is confusing. What about calling it a "slot"? The
administrator defines how many slots the machine has, and each job can request
multiple slots on multiple machines. This safely throws away any implied
semantics.
Comment 19 Glen 2010-12-07 07:44:51 MST
I posted this in the torquedev thread generated by bugzilla for this bug:


2010/12/6 Michel Béland <michel.beland@rqchp.qc.ca>:

> Later, they introduced -lselect and deprecated -lnodes altogether. Now
> one can ask for -lselect=10:ncpus=8:mpiprocs=2:ompthread=4 to get the
> same result, if I remember correctly, but I think that I liked ppn and
> cpp better...


I remember there was talk from some of the TORQUE developers at
adaptive about adding a "select" statement to TORQUE.  Whatever
happened to that?  I think it would be great if we could add in a
select feature that is compatible (or at least mostly compatible) with
the PBS Pro select.  Maybe Šimon Tóth's work could get us partially
there.
Comment 20 Ken Nielson 2010-12-07 09:03:26 MST
(In reply to comment #18)
> > provided that all other resource requests can be satisfied by the
> > shared physical host). "
> 
> I would skip this part, since there are no other resource requests in Torque.
> 
> I would say that the wording is confusing. What about calling it a "slot".
> Administrator defines how many slots the machine has and each job can request
> multiple slots on multiple machines. This safely throws away any implied
> semantics.

I like the idea of calling this an execution slot. It is generic but also
descriptive of the purpose of np in the nodes file and ppn in a job request.
Comment 21 Ken Nielson 2010-12-07 09:26:54 MST
(In reply to comment #19)
> I posted this in the torquedev thread generated by bugzilla for this bug:
> 
> 
> 2010/12/6 Michel Béland <michel.beland@rqchp.qc.ca>:
> 
> > Later, they introduced -lselect and deprecated -lnodes altogether. Now
> > one can ask for -lselect=10:ncpus=8:mpiprocs=2:ompthread=4 to get the
> > same result, if I remember correctly, but I think that I liked ppn and
> > cpp better...
> 
> 
> I remember there was talk from some of the TORQUE developers at
> adaptive about adding a "select" statement to TORQUE.  Whatever
> happened to that?  I think it would be great if we could add in a
> select feature that is compatible (or at least mostly compatible) with
> the PBS Pro select.  Maybe Šimon Tóth's work could get us partially
> there.

I was reminded of this at SC'10. Over the summer, when we started looking at
putting select in the TORQUE resource manager, we realized this would actually
be best handled by the scheduler. Even so, there seems to be a need for some
basic support for select at the RM level.