[torqueusers] job exceed memory limit without been killed
ant.starikov at gmail.com
Thu Mar 18 09:12:25 MDT 2010
OK, I've found a bug.
Normally the "mem" limit is checked in job_over_limit(). But if there is only one node assigned to the job (which is my case: 1 node, 16 processes), then it calls mom_over_limit() instead and exits.
And mom_over_limit() doesn't check the "mem" limit, for obvious reasons.
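To illustrate the control flow I mean, here is a rough Python sketch. It is not the actual TORQUE source (which is C); the Job fields and the function bodies are my own assumptions, and only the names job_over_limit() and mom_over_limit() come from the code I was reading.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical fields, for illustration only.
    node_count: int
    mem_limit_kb: int
    mem_used_kb: int


def mom_over_limit(job: Job) -> bool:
    """Stand-in for the per-node check; it does not look at "mem"."""
    return False


def job_over_limit(job: Job) -> bool:
    """Stand-in for the per-job check, with the early exit described above."""
    if job.node_count == 1:
        # Single-node job: delegate to mom_over_limit() and exit,
        # so the "mem" comparison below is never reached.
        return mom_over_limit(job)
    return job.mem_used_kb > job.mem_limit_kb
```

With this sketch, a job on one node that uses 5000 kB against a 1000 kB limit is not reported as over limit, while the same job spread over two nodes is.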
On Mar 18, 2010, at 3:35 PM, Anton Starikov wrote:
> The problem here is that, if I understand correctly, MAUI gathers this information once per scheduling interval, which is normally significantly longer than the polling interval of PBS_MOM. And PBS_MOM has to kill the job within the polling interval.
> On Mar 18, 2010, at 3:18 PM, Sabuj Pattanayek wrote:
>> pbs_mom just reports it to your scheduler. Then the scheduler (maui in
>> my case) has to cancel the job, then pbs_mom kills it. Which doesn't work
>> in my case even though maui says it's canceling the job.
>> On Thu, Mar 18, 2010 at 9:15 AM, Anton Starikov <ant.starikov at gmail.com> wrote:
>>> Actually, setting this policy in MAUI kills jobs in my case. But I think PBS_MOM has to deal with these limits itself, isn't that the case?