[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined

bugzilla-daemon at supercluster.org
Wed Oct 27 23:14:59 MDT 2010


Chris Samuel <chris at csamuel.org> changed:

           What    |Removed                     |Added
                 CC|                            |chris at csamuel.org

--- Comment #1 from Chris Samuel <chris at csamuel.org> 2010-10-27 23:14:59 MDT ---
External schedulers - I think you're right for both Moab and Maui; they both
set exec_host.

PPN = processors per node (according to the manual page), but really virtual
processors, as you can overcommit if you are not using cpusets.  I've seen
plenty of commercial software out there that uses it, so I don't think it can
go away.  The pvmem limits which you mention are vital to us.

Different resource limits - I think the current per-process and per-job limits
make enough sense; they're easy for users to understand.  The only real issue
is that you cannot set a proactively enforced (i.e. malloc fails) limit across
a job as a whole.  But a job-wide limit is enforced by the scheduler anyway (at
least with Maui and Moab).
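
To illustrate what "proactively enforced (i.e. malloc fails)" means for a
per-process limit, here is a minimal sketch using an address-space rlimit on
Linux. That this is the mechanism behind pvmem is an assumption on my part, and
the helper name allocation_survives and the sizes used are purely
illustrative. The limit applies to each process separately, which is why it
cannot cap a job as a whole.

```python
import subprocess
import sys

# Child script: cap its own address space, then try one large allocation.
# RLIMIT_AS is a per-process limit, so every process in a job would get
# its own copy of the cap rather than sharing one job-wide budget.
CHILD = """
import resource, sys
limit, size = int(sys.argv[1]), int(sys.argv[2])
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
try:
    buf = bytearray(size)   # fails up front if it would exceed the cap
except MemoryError:
    sys.exit(1)
sys.exit(0)
"""


def allocation_survives(limit_bytes, alloc_bytes):
    """Return True if a child capped at limit_bytes can allocate alloc_bytes.

    Hypothetical helper for illustration only; assumes Linux semantics
    for RLIMIT_AS.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(alloc_bytes)]
    )
    return result.returncode == 0
```

With a 1 GiB cap, allocation_survives(1 << 30, 1 << 20) should succeed while
allocation_survives(1 << 30, 2 << 30) should fail at allocation time, rather
than the process being killed after the fact.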

Resources we need:

procs and tpn
nodes and ppn (for commercial software which supports PBS)
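
As a rough sketch of how the nodes/ppn form maps onto a processor count, the
following adds up the virtual processors requested by the value of a
nodes=... resource (e.g. "2:ppn=4"). The function name total_procs is
hypothetical, node properties are simply ignored, and ppn is assumed to
default to 1 when omitted.

```python
def total_procs(nodes_spec):
    """Count virtual processors requested by a nodes spec such as '2:ppn=4'.

    Chunks are separated by '+'. Each chunk starts with a node count or a
    hostname (counted as one node) and may carry a ':ppn=N' part. Any other
    ':property' parts are ignored in this sketch.
    """
    total = 0
    for chunk in nodes_spec.split("+"):
        parts = chunk.split(":")
        # A leading number is a node count; anything else is one named node.
        count = int(parts[0]) if parts[0].isdigit() else 1
        ppn = 1  # assumed default when ppn is not given
        for part in parts[1:]:
            if part.startswith("ppn="):
                ppn = int(part[len("ppn="):])
        total += count * ppn
    return total
```

For example, total_procs("2:ppn=4") gives 8, the same processor count as
procs=8 but with the layout pinned to two nodes of four processors each.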

Cgroups - I reckon it's a good plan for the future but we need to realise that
it's not going to really arrive for most clusters until RHEL6/CentOS6 starts
getting deployed. Also you cannot have both cpusets and cgroups mounted at the
same time so the current code needs to be refactored/abstracted to be able to
cope with either one being present.

The code cannot depend on a feature of cgroups being present but should give
you the benefits if it is.
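
One way the "cope with either one being present" check could look is a probe
of the mount table. This is only a sketch: the function name detect_backend
and its return values are made up for illustration, and it assumes the usual
/proc/mounts layout (device, mount point, filesystem type, options) with a
cgroup mount listing its controllers in the options field.

```python
def detect_backend(mounts_text):
    """Classify a /proc/mounts dump as 'cgroup', 'cpuset' or 'none'.

    'cgroup' means a cgroup filesystem with the cpuset controller attached,
    'cpuset' means the legacy standalone cpuset filesystem, and 'none'
    means neither was found, so the caller should run without confinement.
    """
    has_cpuset_fs = False
    has_cgroup_cpuset = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        fstype = fields[2]
        options = fields[3].split(",")
        if fstype == "cpuset":
            has_cpuset_fs = True
        elif fstype == "cgroup" and "cpuset" in options:
            has_cgroup_cpuset = True
    if has_cgroup_cpuset:
        return "cgroup"
    if has_cpuset_fs:
        return "cpuset"
    return "none"
```

A caller would pass in the contents of /proc/mounts and pick its code path
from the result, falling back to no confinement on "none" so that nothing
depends on cgroups being there.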
