[torqueusers] node memory limiting
Jerry Smith
jdsmit at sandia.gov
Thu Oct 29 10:32:11 MDT 2009
Tony,
Put the ulimit command in the init script that starts pbs_mom, and the mom
will inherit those limits (as will the jobs it launches).
--Jerry
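Below is a minimal sketch of that approach. It assumes a SysV-style init script that starts pbs_mom. The variable MOM_DATA_LIMIT_KB, the function start_mom, and the pbs_mom path are hypothetical names used only for illustration. Resource limits set with ulimit are inherited across fork and exec, so a limit applied just before pbs_mom starts also applies to every job it spawns. One caveat: on Linux kernels before 4.7, the data-size limit (ulimit -d, RLIMIT_DATA) does not cover memory that glibc's malloc obtains through mmap, which it uses for large allocations. On a kernel of that era, ulimit -v (address space) may be the limit that actually stops a runaway job.

```shell
#!/bin/sh
# Hypothetical excerpt from a pbs_mom init script (e.g. /etc/init.d/pbs_mom).
# Limits set here with ulimit are inherited by pbs_mom and, in turn, by the
# job processes it spawns.

# Max data segment size in KB; 64000000 KB is just under the 64 GB of RAM.
# Assumption: swap in 'ulimit -v' if RLIMIT_DATA does not cover mmap-backed
# allocations on your kernel.
MOM_DATA_LIMIT_KB=64000000

start_mom() {
    # Run in a subshell so the limit applies only to the launched command,
    # not to the shell running the init script.
    (
        ulimit -d "$MOM_DATA_LIMIT_KB" || exit 1
        exec "$@"
    )
}

# In the real init script this would be something like (hypothetical path):
#   start_mom /usr/local/sbin/pbs_mom
```

Running `ulimit -d` inside an interactive job on the node afterwards should report the configured value instead of "unlimited".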
Tony Schreiner wrote:
> Torque 2.1.10; the cluster consists of nodes with 64 GB of RAM running
> Fedora 10.
>
> A user has recently been running a job that dynamically allocates more
> and more memory over time until all the memory on the node is taken. I
> haven't talked to the developer, but I don't think it's a bug (at least
> not an inadvertent one). Either way, at that point the node becomes
> completely unresponsive to both Torque and ssh.
>
> I thought I would set the maximum data size in /etc/security/limits.conf
> to 64000000 KB, just below the physical memory size.
>
> This works for ssh logins, but sessions started through Torque don't seem
> to honor it. If I start an interactive job on the node and run ulimit -d,
> it shows "unlimited". I've rebooted for good measure.
>
> Do I have other options here?
>
> Thanks
> Tony Schreiner
> Boston College