[torqueusers] node memory limiting
Tony Schreiner
schreian at bc.edu
Thu Oct 29 08:43:34 MDT 2009
Torque 2.1.10; the cluster consists of nodes with 64 GB of RAM, running
Fedora 10.
A user has recently been running a job that dynamically allocates more
and more memory over time until all the memory on the node is taken. I
haven't talked to the developer, but I don't think it's a bug (at least
not an inadvertent one). In any case, at that point the node becomes
completely unresponsive to both Torque and ssh.
I thought I would set the maximum data size in /etc/security/limits.conf
to 64000000 KB, just below the physical memory size.
This works for ssh logins, but Torque connections don't seem to honor
it: if I start an interactive job on the node and run ulimit -d, it
shows "unlimited". I've rebooted for good measure.
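To see exactly which data-size limit a process inside a job inherits, a
small check like the one below can be run from the job script. It is a
sketch, not part of Torque: it reads RLIMIT_DATA with Python's resource
module and prints it the way ulimit -d does (KB, or "unlimited"). The
helper names are hypothetical. A likely explanation for the ssh/Torque
difference is that limits.conf is applied by pam_limits only to PAM
sessions such as ssh logins, while jobs are started by pbs_mom and
inherit whatever limits pbs_mom itself was started with.

```python
import resource

def describe_limit(value: int) -> str:
    """Format an rlimit value like `ulimit -d`: kilobytes, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value // 1024)

def data_limits() -> tuple[str, str]:
    """Return the (soft, hard) RLIMIT_DATA of this process, formatted."""
    soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    return describe_limit(soft), describe_limit(hard)

soft, hard = data_limits()
print(f"RLIMIT_DATA soft={soft} hard={hard}")
```

Running it both from an ssh login and from an interactive job on the
same node should show whether the two environments really differ.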
Do I have other options here?
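One option, sketched here under stated assumptions: limits.conf works
because pam_limits calls setrlimit() when a session starts, so a
launcher can do the same thing itself before exec'ing the real program.
The wrapper below is hypothetical (not a Torque feature) and reuses the
64000000 KB figure from above; the job's command would have to be
started through it.

```python
import resource
import subprocess

# Figure from the post: 64000000 KB, just under the 64 GB of physical RAM.
DATA_LIMIT_KB = 64_000_000
DATA_LIMIT_BYTES = DATA_LIMIT_KB * 1024

def cap_data_segment() -> None:
    """Runs in the child between fork() and exec(); caps RLIMIT_DATA."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    # An unprivileged process cannot raise its hard limit, so stay under it.
    if hard == resource.RLIM_INFINITY:
        limit = DATA_LIMIT_BYTES
    else:
        limit = min(DATA_LIMIT_BYTES, hard)
    # Set soft and hard together so the job cannot raise the soft limit.
    resource.setrlimit(resource.RLIMIT_DATA, (limit, limit))

def run_capped(argv: list[str]) -> int:
    """Run argv with the data-size cap applied; return its exit status."""
    return subprocess.run(argv, preexec_fn=cap_data_segment).returncode
```

For example, run_capped(["/bin/sh", "-c", "ulimit -d"]) should print
64000000 when the inherited hard limit is unlimited. One caveat: on
Linux kernels before 4.7 (Fedora 10 ships a 2.6 kernel), RLIMIT_DATA
does not count memory obtained through mmap(), which glibc's malloc uses
for large allocations, so a data-size cap alone may not stop a job like
this one; RLIMIT_AS (ulimit -v) is the usual alternative.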
Thanks
Tony Schreiner
Boston College