[torqueusers] node memory limiting

Tony Schreiner schreian at bc.edu
Thu Oct 29 11:21:13 MDT 2009


Somebody else also suggested this off-list. I have tried it, but
without success so far.

The existing script already has a ulimit -n 32768, which does override
the default value, but when I add either ulimit -d 63000000 or
ulimit -m 63000000, neither value takes effect in the PBS session when
I connect; both remain unlimited.
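
For concreteness, this is roughly where I'm putting them: a sketch of
a SysV-style start function, since the exact layout of the pbs_mom
init script varies by distro (63000000 here is just physical RAM minus
some headroom, in kB):

    # excerpt from the pbs_mom init script (sketch; layout varies)
    start() {
        ulimit -n 32768       # max open files; this one is honored
        ulimit -d 63000000    # max data segment size, in kB
        ulimit -m 63000000    # max resident set size, in kB
        daemon pbs_mom        # the mom's children inherit these limits
    }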

On Oct 29, 2009, at 12:32 PM, Jerry Smith wrote:

> Tony,
>
> Put the ulimit command in your init script for the mom, and the mom  
> will inherit those limits.
>
> --Jerry
>
> Tony Schreiner wrote:
>> Torque 2.1.10; the cluster consists of nodes with 64 GB of RAM,
>> running Fedora 10.
>>
>> A user has been running a job recently that dynamically allocates
>> more and more memory over time until all the memory on the node is
>> consumed. I haven't talked to the developer, but I don't think it's
>> a bug (at least not an inadvertent one). In any case, at that point
>> the node becomes totally unresponsive to Torque and to ssh.
>>
>> I thought I would set the max data size in /etc/security/limits.conf
>> to 64000000 kB, just below the physical memory size (the entry is
>> sketched in the P.S. below).
>>
>> This is effective for ssh logins, but Torque connections don't seem
>> to honor it. If I start an interactive job on the node and run
>> ulimit -d, it shows "unlimited". I've rebooted for good measure.
>>
>> Do I have other options here?
>>
>> Thanks
>> Tony Schreiner
>> Boston College
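
P.S. For the archives, the limits.conf entry I mentioned looks like
the sketch below (the wildcard domain and the exact value are just my
choices; "data" is the maximum data segment size, in kB). My
understanding, which may be off, is that pam_limits applies these only
to PAM-mediated logins such as ssh, while pbs_mom spawns job processes
without going through PAM, which would explain why interactive jobs
still show "unlimited".

    # /etc/security/limits.conf  (sketch)
    # <domain>  <type>  <item>   <value>   ("data" is in kB)
    *           hard    data     64000000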


