[torqueusers] node memory limiting
Tony Schreiner
schreian at bc.edu
Thu Oct 29 11:21:13 MDT 2009
Somebody else also offered this off-list. I have tried it, but
without success so far.
The existing init script has a ulimit -n 32768, which does seem to be
overriding the default value. But when I add either ulimit -d 63000000
or ulimit -m 63000000, neither value seems to be in effect in the PBS
session when I connect; both remain unlimited.
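For anyone who wants to check the same thing, here is a minimal sketch of how the inheritance could be verified. The function names are hypothetical, it assumes bash and a Linux /proc filesystem, and it is not taken from the Torque init script itself. The first function sets a soft data limit in a subshell and asks a child process what it sees; the second prints the relevant rows of /proc/<pid>/limits for a running process such as pbs_mom.

```shell
#!/bin/bash
# Sketch only: function names and values are illustrative assumptions.

# Set a soft data-segment limit (in kB) inside a subshell, then ask a
# child process what it inherited. The calling shell's limits are untouched.
inherited_data_limit() {
    (
        ulimit -S -d "$1" || exit 1   # fails if the hard limit is lower
        bash -c 'ulimit -S -d'        # child reports its inherited limit
    )
}

# Print the data, RSS and open-files rows from /proc/<pid>/limits (Linux only).
show_limits() {
    grep -E '^Max (data size|resident set|open files)' "/proc/$1/limits"
}

inherited_data_limit 63000000   # the value I tried in the init script
show_limits "$$"                # replace $$ with the pbs_mom PID on a node
```

If show_limits on the pbs_mom PID reports the new values but the job session still shows unlimited, the limits are presumably being reset somewhere between the mom and the job shell. Separately, as far as I know, current Linux kernels do not enforce the resident set limit (ulimit -m), so ulimit -v may be worth testing as well.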
On Oct 29, 2009, at 12:32 PM, Jerry Smith wrote:
> Tony,
>
> Put the ulimit command in your init script for the mom, and the mom
> will inherit those limits.
>
> --Jerry
>
> Tony Schreiner wrote:
>> Torque 2.1.10, cluster consists of nodes with 64 GB of RAM,
>> running Fedora 10.
>>
>> There is a job that a user is running recently, that dynamically
>> allocates increasing memory over time until all the memory on the
>> node is taken. I haven't talked to the developer, but I don't
>> think it's a bug (at least inadvertently). But anyway, at that
>> point the node becomes totally unresponsive to Torque or to ssh.
>>
>> I thought I would set the max data size in /etc/security/limits.conf
>> to 64000000 KB, or just below the physical size.
>>
>> This is effective for ssh logins, but torque connections don't seem
>> to honor it. If I do an interactive job on the node and run ulimit
>> -d it shows "unlimited". I've rebooted for good measure.
>>
>> Do I have other options here?
>>
>> Thanks
>> Tony Schreiner
>> Boston College