[torqueusers] node memory limiting

Jerry Smith jdsmit at sandia.gov
Thu Oct 29 10:32:11 MDT 2009


Tony,

Put the ulimit command in your init script for the mom, and the mom will 
inherit those limits.
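This works because resource limits are inherited across fork() and exec(). /etc/security/limits.conf, by contrast, is applied by pam_limits when a PAM session such as an ssh login is opened, which is presumably why Torque jobs never saw it. The Python sketch below imitates the init-script approach. It is an illustration only, not Torque code, and the helper name run_with_data_limit is hypothetical. It caps RLIMIT_DATA in a child process just before exec and checks that a grandchild shell reports the inherited value.

```python
import resource
import subprocess


def run_with_data_limit(limit_kb, cmd):
    """Run cmd in a child whose RLIMIT_DATA is capped at limit_kb kilobytes.

    The limit is applied in the forked child just before exec (the same
    point where an init script's `ulimit -d` takes effect before the mom
    daemon starts), so the calling process keeps its own limits unchanged.
    """
    limit_bytes = limit_kb * 1024  # RLIMIT_DATA is in bytes; `ulimit -d` uses KB
    _, hard = resource.getrlimit(resource.RLIMIT_DATA)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process may lower, but never raise, its hard limit.
        limit_bytes = min(limit_bytes, hard)

    def apply_limit():
        # Runs in the child after fork() and before exec(); both the soft
        # and hard limits are set, and every descendant inherits them.
        resource.setrlimit(resource.RLIMIT_DATA, (limit_bytes, limit_bytes))

    result = subprocess.run(cmd, preexec_fn=apply_limit,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # The outer shell stands in for the mom and the inner shell for a job
    # it launches: the inner `ulimit -d` reports the inherited value.
    print(run_with_data_limit(64_000_000, ["sh", "-c", "sh -c 'ulimit -d'"]))
```

In a real setup the equivalent is a single `ulimit -d` line placed in the mom's init script before the daemon is started; the script's location varies by distribution.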

--Jerry

Tony Schreiner wrote:
> Torque 2.1.10; the cluster consists of nodes with 64 GB of RAM, running
> Fedora 10.
>
> A user has recently been running a job that dynamically allocates
> more and more memory over time until all the memory on the node is
> taken. I haven't talked to the developer, but I don't think it's a
> bug, at least not an inadvertent one. In any case, at that point the
> node becomes totally unresponsive to Torque and to ssh.
>
> I thought I would set the maximum data size in /etc/security/limits.conf
> to 64000000 KB, just below the physical memory size.
>
> This is effective for ssh logins, but jobs started by Torque don't seem
> to honor it. If I run an interactive job on the node and run ulimit -d,
> it shows "unlimited". I've rebooted for good measure.
>
> Do I have other options here?
>
> Thanks
> Tony Schreiner
> Boston College
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers