[torqueusers] node memory limiting

Tony Schreiner schreian at bc.edu
Thu Oct 29 08:43:34 MDT 2009


Torque 2.1.10; the cluster consists of nodes with 64 GB of RAM, running
Fedora 10.

A user has recently been running a job that dynamically allocates
increasing amounts of memory over time until all the memory on the node
is consumed. I haven't talked to the developer, but I don't think it's
a bug (at least not an inadvertent one). In any case, at that point the
node becomes totally unresponsive to Torque and to ssh.

I thought I would set the max data size in /etc/security/limits.conf
to 64000000 kB, i.e. just below the physical memory size.
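Concretely, the entry is along these lines (I'm assuming here that it
should apply to every user; the value for the "data" item in
limits.conf is in kB):

    # domain  type  item  value
    *         hard  data  64000000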

This is effective for ssh logins, but Torque connections don't seem to
honor it. If I start an interactive job on the node and run ulimit -d,
it shows "unlimited". I've rebooted for good measure.

Do I have other options here?

Thanks
Tony Schreiner
Boston College

