[torqueusers] consensus on memory enforcement?

David Golden dgolden at cp.dias.ie
Tue Jun 6 04:28:51 MDT 2006


On 2006-06-02 16:59:16 -0400, garrick at speculation.org wrote:
> I've got users that are abusing memory usage on Linux nodes and I'd
> like to clear this up in TORQUE.
> 
> What's the consensus on what changes should be made?  Linux should use
> RLIMIT_AS and nothing else?  How should the definitions of mem, pmem,
> pvmem, and vmem be clarified?  Ake seems to be our resident expert on
> these matters.
>

Well, on a slightly related note, can I raise the
stack thing?  Until fairly recently, stack limits were adjusted
as part of the mem stuff, IIRC.  That was probably "wrong",
but it allowed a workaround for stack-hogs without code
modification (e.g. Fortran compiled with a certain compiler).

It is possible to work around this with an in-job setrlimit [1],
but an in-job, user-reset ulimit only applies to the mother
superior's daughter unless you take pains to do [1] in every
parallel process (things like mpiexec spawn via TM).  Setting the
ulimits of the moms themselves, for inheritance by children,
applies to all jobs, not just known-stackhoggy ones.  So:

How about a separate stackmem / pstackmem resource that would
change RLIMIT_STACK per job?  Note that e.g. aborting the job on
overrun might well be the Wrong Thing to do, though: that might
be overloading resource tracking (note that stack usage already
counts towards RLIMIT_AS) for what should be just a job
attribute.
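
For anyone unsure what the in-job setrlimit workaround amounts to,
here is a minimal sketch in Python using the standard "resource"
module.  It is only an illustration, not what TORQUE or mpiexec
does: the helper name set_stack_soft_limit and its wanted_bytes
parameter are made up for this example.  It changes the soft
RLIMIT_STACK of the calling process only, and never goes above the
hard limit.

```python
import resource


def set_stack_soft_limit(wanted_bytes=None):
    """Set this process's soft RLIMIT_STACK (hypothetical helper).

    wanted_bytes=None asks for the highest value the hard limit
    allows.  The hard limit itself is left untouched.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if wanted_bytes is None:
        new_soft = hard                    # go as high as permitted
    elif hard == resource.RLIM_INFINITY:
        new_soft = wanted_bytes            # no hard ceiling to respect
    else:
        new_soft = min(wanted_bytes, hard) # clamp to the hard limit
    resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)
```

Since rlimits are inherited across fork/exec, a small wrapper that
calls something like this and then execs the real binary would have
to be what each parallel process starts, which is exactly the
"take pains" part.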

[1] http://email.osc.edu/pipermail/mpiexec/2006/000686.html



