[torqueusers] job exceeds memory limit without being killed

Anton Starikov ant.starikov at gmail.com
Thu Mar 18 08:15:52 MDT 2010


Actually, setting this policy in Maui does kill jobs in my case. But I think PBS_MOM should enforce these limits itself, shouldn't it?


On Mar 18, 2010, at 2:38 PM, Sabuj Pattanayek wrote:

> I would like to know the same thing. Maui has similar problems even
> with the setting:
> 
> RESOURCELIMITPOLICY             MEM:ALWAYS:CANCEL
> 
> In the log file it says that the job is being canceled, but the job
> doesn't get canceled.
> 
> On Thu, Mar 18, 2010 at 8:26 AM, Anton Starikov <ant.starikov at gmail.com> wrote:
>> The job is scheduled on a node with 64GB of RAM. The memory limit for the job is 60GB. At some point the job exceeds the memory limit and crashes the node. This would be understandable if it happened between two checks by PBS_MOM, but after the crash I checked what the server knows about the job and I see:
>> 
>>    Resource_List.mem = 60gb
>>    resources_used.mem = 65557856kb
>> 
>> This means that PBS_MOM had already registered memory usage above the limit and even updated this information on the server, but didn't react and kill the job.
>> 
>> What could be wrong? Am I missing something in the config?
>> 
>> Anton
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>> 
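For reference, the two values quoted above can be put in the same unit to see how far over the limit the job was. This is only a rough sketch in Python, not part of TORQUE or Maui: the helper names (size_to_kb, parse_mem_fields) are made up for illustration, and it assumes the size suffixes (kb, mb, gb, tb) are powers of 1024.

```python
import re

# Kilobytes per unit. Assumption: suffixes are powers of 1024.
_KB_PER_UNIT = {"kb": 1, "mb": 1024, "gb": 1024**2, "tb": 1024**3}


def size_to_kb(text):
    """Convert a size string such as '60gb' or '65557856kb' to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb|tb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return int(value) * _KB_PER_UNIT[unit]


def parse_mem_fields(job_text):
    """Pull Resource_List.mem and resources_used.mem (in kb) out of job attribute text."""
    wanted = ("Resource_List.mem", "resources_used.mem")
    fields = {}
    for line in job_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in wanted:
            fields[key.strip()] = size_to_kb(value)
    return fields


# The two lines reported in the original message.
sample = """
    Resource_List.mem = 60gb
    resources_used.mem = 65557856kb
"""

fields = parse_mem_fields(sample)
limit_kb = fields["Resource_List.mem"]
used_kb = fields["resources_used.mem"]
overshoot_kb = used_kb - limit_kb

print(f"limit: {limit_kb} kb, used: {used_kb} kb, over by: {overshoot_kb} kb")
```

Under that assumption the 60gb limit is 62914560kb, so the reported usage of 65557856kb is about 2.5GB above the limit, well past anything that could be explained by rounding.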