[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined

bugzilla-daemon at supercluster.org
Thu Oct 28 14:24:06 MDT 2010


David Singleton <David.Singleton at anu.edu.au> changed:

           What    |Removed                     |Added
                 CC|                            |David.Singleton at anu.edu.au

--- Comment #4 from David Singleton <David.Singleton at anu.edu.au> 2010-10-28 14:24:05 MDT ---
(In reply to comment #3)
> > 
> > > PPN = processors per node (according to manual page), really virtual processors
> > > as you can overcommit if you are not using cpusets.  I've seen plenty of
> > > commercial software out there that uses them, so I don't think it can go away. 
> > > The pvmem limits which you mention are vital to us.
> > 
> > Well, that's the problem, the manual page says processors per node, but that's
> > not how Torque works (this is exactly the reason why I created this bug). They
> > are processes per node. I'm not saying to get rid of ppn, but to get rid of the
> > processes semantics, therefore ppn will be actually processors not processes.
> > pvmem can actually stay, although I think pmem and pvmem can be easily
> > superseded by mem and vmem.
> I understand the frustration with ppn not really meaning processors per node.
> However, the current behavior of ppn is widely used and expected. We need to
> live with this. Changing this behavior will break too many people.

In what way are they using it as processes?  Are they requesting the MOM call
setrlimit(RLIMIT_NPROC)?  Are they killing jobs if jobs are detected as having
more than that many processes running on a node?  None of these make any sense
whatsoever (unless some large forkbomb limit is applied - but that should be a
system limit, not a user resource request).  
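
For illustration, here is a minimal sketch of what "treating ppn as a process
limit" via setrlimit(RLIMIT_NPROC) would amount to. It is not taken from the
Torque source; the function names clamp_nproc and apply_nproc_limit are
hypothetical, and Python's standard resource module stands in for the C call.

```python
import resource


def clamp_nproc(requested, current_hard):
    """Compute the (soft, hard) pair for a requested process cap.

    The hard limit is left untouched; the soft limit is set to the
    request, but never above a finite hard limit.
    """
    if current_hard == resource.RLIM_INFINITY:
        new_soft = requested
    else:
        new_soft = min(requested, current_hard)
    return (new_soft, current_hard)


def apply_nproc_limit(requested):
    """Hypothetical enforcement step: lower RLIMIT_NPROC for this process.

    Assumption: this would run in the job's parent process before the
    job script is started, so that children inherit the limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    new_limits = clamp_nproc(requested, hard)
    resource.setrlimit(resource.RLIMIT_NPROC, new_limits)
    return new_limits
```

Note that RLIMIT_NPROC counts all processes owned by the real user ID, not the
processes of one job, so a limit equal to a small ppn value would also count
the user's other jobs and login sessions on the same node.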

Is the ppn value being used to impose pvmem or pmem limits somehow?  I don't
see that in the Torque code.  By external schedulers?  How?

I suspect "processes per node" only really appears in flawed and misleading
documentation, not in real code.
