[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined

bugzilla-daemon at supercluster.org
Thu Oct 28 15:10:25 MDT 2010


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=93

--- Comment #6 from David Singleton <David.Singleton at anu.edu.au> 2010-10-28 15:10:25 MDT ---
(In reply to comment #5)
> Processes per node is often how it is explained, although you are right, it
> isn't restricted in any way to actually limit the number of processes that can
> be run. It may have originally been intended to be processors per node, but now
> almost all processors intended for computing have multiple cores, making
> processors per node completely ambiguous and therefore not very useful.
> 
> However, it is in the code in a few ways:
> 
> ppn is the number of times that nodename will appear in the $PBS_NODEFILE. This
> is intended to be read by the mpi scripts on the program to then make that many
> processes. There is nothing in TORQUE that stops the scripts from spawning more
> processes though.
> 
> ppn is left completely configurable per node, and so the notion that it is tied
> to the actual hardware is false. Often in production systems, ppn becomes cores
> per node, because that's how many the system admin wants for optimal use. 
> 
> The fact of the matter is that ppn hasn't been clearly defined over time, and
> what it has become in practice is probably best described as processes per
> node. At any rate, changing this behavior would greatly disrupt life for *very*
> many TORQUE users.

As Chris Samuel pointed out, the "p" in "ppn" meant "virtual processors".  A
"virtual processor" can mean a core - for most of us that is exactly what it
means.  It can mean an "execution slot" for those sites that set node np
greater than the number of physical cores (or hyperthread contexts).  The
important thing is that it is a characteristic of the hardware/system/site.  It
is not a property of the job.  The number of processes in a job is a property
of the job.  In general the two do not line up.

If I were to run a 16-thread OpenMP job, what value of ppn do I use?  The OpenMP
app will have 1 process.  But then there will be 2 shells in the job, so it's
likely to be 3 processes.  So ppn=3?  What I actually want is 16 bits of
hardware that can each run a thread without conflict (as much as possible),
i.e. I want 16 virtual processors.
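
To make that concrete, here is a rough sketch, assuming the file named by
$PBS_NODEFILE lists each hostname once per allocated virtual processor (as the
quoted comment describes).  The function names are hypothetical and are not
part of TORQUE; the idea is only that the thread count comes from the number
of slots granted, not from the number of processes in the job.

```python
import os
from collections import Counter


def slots_per_host(nodefile_text):
    """Count how many times each hostname appears in node-file text.

    Assumption: one line per allocated virtual processor.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


def openmp_threads(nodefile_text):
    """Thread count for a single-node OpenMP job: one thread per slot."""
    counts = slots_per_host(nodefile_text)
    if len(counts) != 1:
        raise ValueError("expected an allocation on exactly one node")
    return next(iter(counts.values()))


if __name__ == "__main__":
    # Inside a job, PBS_NODEFILE points at the node file; outside, do nothing.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            print(openmp_threads(fh.read()))
```

A job script could export the printed value as OMP_NUM_THREADS.  For a request
of one node with ppn=16 the count would be 16, even though the job itself has
only a handful of processes.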

Yes, the use of the term "processor" needs to be spelt out as above. But at
least it can be made technically accurate. The term "process" cannot, unless
you want to turn it into a property of the system.
