[torqueusers] job exceed memory limit without been killed

David Chin chindw at wfu.edu
Wed Oct 13 09:10:48 MDT 2010


I am just looking through the source for torque-2.5.2, and it does not
seem as if this patch made it in.
torque-2.5.2/src/resmom/linux/mom_mach.c mom_over_limit() only checks
these resources: cput, pcput, vmem, pvmem, walltime.
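
To make the reported control flow concrete, here is a minimal sketch of it in Python. It is a simplified model, not the TORQUE source: only the function names and the resource names come from this thread. The dictionaries, the num_nodes argument and the single-node shortcut as written are assumptions based on the description quoted below.

```python
# Hypothetical, simplified model of the behaviour described in this thread.
# This is NOT the TORQUE source; only the function names and the resource
# names are taken from the discussion.

# Resources the per-node check looks at (per the list above).
NODE_CHECKED = ("cput", "pcput", "vmem", "pvmem", "walltime")


def mom_over_limit(limits, usage):
    """Per-node check: only examines the resources in NODE_CHECKED."""
    return any(
        name in limits and usage.get(name, 0) > limits[name]
        for name in NODE_CHECKED
    )


def job_over_limit(limits, usage, num_nodes):
    """Job-wide check, including the single-node shortcut reported below."""
    if num_nodes == 1:
        # Shortcut: defer to the per-node check and return its answer.
        # "mem" is not in NODE_CHECKED, so it is never examined on this path.
        return mom_over_limit(limits, usage)
    # Multi-node path (assumed): the job-wide "mem" limit is compared here.
    return "mem" in limits and usage.get("mem", 0) > limits["mem"]


limits = {"mem": 1024, "vmem": 8192}
usage = {"mem": 4096, "vmem": 4096}
print(job_over_limit(limits, usage, num_nodes=1))  # False: "mem" overrun missed
print(job_over_limit(limits, usage, num_nodes=2))  # True: "mem" overrun caught
```

In this model the same job, four times over its "mem" limit, is flagged only when more than one node is assigned, which matches the symptom reported in the thread.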

Cheers,
--Dave

David Chin, Ph.D.
chindw at wfu.edu                  High Performance Computing Systems Analyst
Office: 336-758-2964            Wake Forest University
Mobile: 336-608-0793            Winston-Salem, NC
Email-to-txt: 3366080793 at mms.att.net
Google Talk: chindw at wfu.edu
Web: http://www.wfu.edu/~chindw/
http://www.google.com/profiles/chindw.wfu



On Thu, Mar 18, 2010 at 11:51, Anton Starikov <ant.starikov at gmail.com> wrote:
> Patch to fix this bug is attached.
>
>
>
>
> On Mar 18, 2010, at 4:12 PM, Anton Starikov wrote:
>
>> OK, I've found a bug.
>>
>> Normally the "mem" limit is checked in job_over_limit(). But if only one node is assigned to the job (which is my case: 1 node, 16 processes), then it just calls mom_over_limit() and returns.
>> And mom_over_limit() doesn't check the "mem" limit, for obvious reasons.
>>
>>
>> On Mar 18, 2010, at 3:35 PM, Anton Starikov wrote:
>>
>>> The problem here, if I understand correctly, is that MAUI gathers this information once per scheduling interval, which is normally significantly longer than the polling interval of PBS_MOM. And PBS_MOM has to kill the job within its polling interval.
>>>
>>>
>>> On Mar 18, 2010, at 3:18 PM, Sabuj Pattanayek wrote:
>>>
>>>> pbs_mom just reports it to your scheduler. Then the scheduler (maui in
>>>> my case) has to cancel the job, and then pbs_mom kills it. That doesn't
>>>> work in my case, even though maui says it's canceling the job.
>>>>
>>>> On Thu, Mar 18, 2010 at 9:15 AM, Anton Starikov <ant.starikov at gmail.com> wrote:
>>>>> Actually, setting this policy in MAUI kills jobs in my case. But I think PBS_MOM has to deal with these limits itself, isn't that the case?
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>
>>
>

