[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined
bugzilla-daemon at supercluster.org
Thu Oct 28 14:24:06 MDT 2010
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=93
David Singleton <David.Singleton at anu.edu.au> changed:
What    |Removed    |Added
----------------------------------------------------------------------------
CC      |           |David.Singleton at anu.edu.au
--- Comment #4 from David Singleton <David.Singleton at anu.edu.au> 2010-10-28 14:24:05 MDT ---
(In reply to comment #3)
> >
> > > PPN = processors per node (according to manual page), really virtual processors
> > > as you can overcommit if you are not using cpusets. I've seen plenty of
> > > commercial software out there that uses them, so I don't think it can go away.
> > > The pvmem limits which you mention are vital to us.
> >
> > Well, that's the problem: the manual page says processors per node, but that's
> > not how Torque works (this is exactly the reason why I created this bug). They
> > are processes per node. I'm not saying to get rid of ppn, but to get rid of the
> > processes semantics, therefore ppn will be actually processors not processes.
> > pvmem can actually stay, although I think pmem and pvmem can be easily
> > superseded by mem and vmem.
>
> I understand the frustration with ppn not really meaning processors per node.
> However, the current behavior of ppn is widely used and expected. We need to
> live with this. Changing this behavior will break too many people.
In what way are they using it as processes? Are they requesting that the MOM call
setrlimit(RLIMIT_NPROC)? Are they killing jobs that are detected as having
more than that many processes running on a node? Neither of these makes any sense
whatsoever (unless some large forkbomb limit is applied - but that should be a
system limit, not a user resource request).
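To make the first question concrete, here is a minimal sketch (in Python rather than the MOM's C) of what "ppn as a process limit" would amount to if it were enforced through setrlimit(RLIMIT_NPROC). The function names are hypothetical and nothing here is taken from the Torque source; it only illustrates the system call. Note that RLIMIT_NPROC is accounted per real user ID, not per job, which is one more reason it is a poor fit for a per-job resource request.

```python
import resource


def nproc_limit_for_ppn(ppn, current_hard):
    """Return the (soft, hard) RLIMIT_NPROC pair a launcher would request
    if it (hypothetically) treated ppn as a process count."""
    if ppn < 1:
        raise ValueError("ppn must be a positive integer")
    # An unprivileged process cannot raise its hard limit, so clamp to it.
    if current_hard != resource.RLIM_INFINITY:
        ppn = min(ppn, current_hard)
    return (ppn, current_hard)


def apply_nproc_limit(ppn):
    """Lower the soft RLIMIT_NPROC of the calling process to ppn.

    The limit is inherited by children, so a launcher would call this
    just before exec'ing the job. Hypothetical helper, not Torque code.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    new_limits = nproc_limit_for_ppn(ppn, hard)
    resource.setrlimit(resource.RLIMIT_NPROC, new_limits)
    return new_limits
```

With ppn=4 this would make fork() fail once the user owns four processes on the node in total, counting any other jobs and login shells of the same user, which is clearly not what anyone requesting ppn=4 intends.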
Is the ppn value being used to impose pvmem or pmem limits somehow? I don't see
that in the Torque code. By external schedulers? How?
I suspect "processes per node" only really appears in flawed and misleading
documentation, not in real code.
--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.