[torqueusers] job exceed memory limit without been killed

Sabuj Pattanayek sabujp at gmail.com
Thu Mar 18 07:38:27 MDT 2010


I would like to know the same thing. Maui has a similar problem, even
with this setting:

RESOURCELIMITPOLICY             MEM:ALWAYS:CANCEL

The log file says that the job is being canceled, but the job
never actually gets canceled.
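
To put the two figures quoted below (Resource_List.mem and
resources_used.mem) in the same unit, here is a quick sketch of the
arithmetic. It assumes that the size suffixes are binary, i.e.
1kb = 1024 bytes and 1gb = 1024^3 bytes; the helper name to_kb is made
up for this example and is not part of TORQUE or Maui.

```python
# Assumption: size suffixes are binary (1 kb = 1024 bytes, 1 gb = 1024**3 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def to_kb(size):
    """Convert a size string such as '60gb' to whole kilobytes (hypothetical helper)."""
    s = size.strip().lower()
    # Try two-letter suffixes before the bare 'b' so 'kb' is not read as 'b'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if s.endswith(suffix):
            return int(s[:-len(suffix)]) * UNITS[suffix] // 1024
    raise ValueError("unrecognized size: %r" % size)


limit_kb = to_kb("60gb")        # Resource_List.mem
used_kb = to_kb("65557856kb")   # resources_used.mem
over_kb = used_kb - limit_kb

print("limit:", limit_kb, "kb")
print("used: ", used_kb, "kb")
print("over: ", over_kb, "kb")
```

Under that assumption the limit is 62914560kb, so the reported usage is
2643296kb (roughly 2.5GB) over the limit.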

On Thu, Mar 18, 2010 at 8:26 AM, Anton Starikov <ant.starikov at gmail.com> wrote:
> The job is scheduled on a node with 64GB RAM. The memory limit for the job is 60GB. At some point the job exceeds the memory limit and crashes the node. That would be understandable if it happened between two checks by PBS_MOM, but after the crash I checked what the server knows about the job and I see:
>
>    Resource_List.mem = 60gb
>    resources_used.mem = 65557856kb
>
> This means that PBS_MOM had already registered memory usage above the limit and even reported it to the server, but did not react and kill the job.
>
> What could be wrong? Am I missing something in the config?
>
> Anton

