[torqueusers] job exceed memory limit without been killed
ant.starikov at gmail.com
Thu Mar 18 08:15:52 MDT 2010
Actually, setting this policy in MAUI does kill jobs in my case. But I think PBS_MOM should enforce these limits itself, shouldn't it?
On Mar 18, 2010, at 2:38 PM, Sabuj Pattanayek wrote:
> I would like to know the same thing. Maui has similar problems even
> with the setting:
> RESOURCELIMITPOLICY MEM:ALWAYS:CANCEL
> In the log file it says that the job is being canceled, but the job
> doesn't get canceled.
> On Thu, Mar 18, 2010 at 8:26 AM, Anton Starikov <ant.starikov at gmail.com> wrote:
>> The job is scheduled on a node with 64GB RAM. The memory limit for the job is 60GB. At some point the job exceeds the memory limit and crashes the node. It would be understandable if this happened in between two checks by PBS_MOM, but after the crash I checked what the server knows about the job and I see:
>> Resource_List.mem = 60gb
>> resources_used.mem = 65557856kb
>> This means that PBS_MOM had already registered memory usage above the limit and even updated this information on the server, but didn't react and kill the job.
>> What could be wrong? Am I missing something in the config?
>> torqueusers mailing list
>> torqueusers at supercluster.org
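For reference, the comparison between the two values quoted above (Resource_List.mem and resources_used.mem) can be checked by hand or with a few lines of Python. This is only a minimal sketch: it assumes the size suffixes are binary (1gb = 1024*1024 kb), and the helper names `to_kb` and `over_limit` are made up for illustration, they are not part of TORQUE or Maui.

```python
import re

# Multipliers to kilobytes. Assumption: units are binary (1024-based).
_UNIT_TO_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2, "tb": 1024 ** 3}


def to_kb(size):
    """Convert a size string such as '60gb' or '65557856kb' to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb|tb)", size.strip().lower())
    if match is None:
        raise ValueError("unrecognized size string: %r" % size)
    return int(match.group(1)) * _UNIT_TO_KB[match.group(2)]


def over_limit(limit, used):
    """Return True if the reported usage is strictly above the limit."""
    return to_kb(used) > to_kb(limit)


if __name__ == "__main__":
    limit = "60gb"       # Resource_List.mem from the message above
    used = "65557856kb"  # resources_used.mem from the message above
    print("limit (kb):", to_kb(limit))
    print("used  (kb):", to_kb(used))
    print("over limit:", over_limit(limit, used))
    print("excess (kb):", to_kb(used) - to_kb(limit))
```

Under that assumption the limit is 62914560 kb, so the reported usage is 2643296 kb (roughly 2.5 GB) above it, which matches the observation that the usage above the limit had already been recorded on the server.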