[torqueusers] job exceeds memory limit without being killed

Anton Starikov ant.starikov at gmail.com
Thu Mar 18 07:26:05 MDT 2010


The job was scheduled on a node with 64 GB of RAM, and the memory limit for the job is 60 GB. At some point the job exceeded its memory limit and crashed the node. That would be understandable if it had happened between two checks by PBS_MOM, but after the crash I checked what the server knows about the job, and I see:

    Resource_List.mem = 60gb
    resources_used.mem = 65557856kb

This means that PBS_MOM had already registered memory usage above the limit and had even updated this information on the server, but it did not react and kill the job.
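To make the comparison explicit, here is a rough sketch of the arithmetic in Python. It assumes binary units (1 kb = 1024 bytes, 1 gb = 1024^3 bytes); the helper name `to_kb` is made up for illustration and is not part of TORQUE.

```python
# Multipliers in bytes, ASSUMING binary units (kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def to_kb(size):
    """Convert a size string such as '60gb' to kilobytes (hypothetical helper)."""
    s = size.strip().lower()
    # Check two-letter suffixes first so 'kb' is not mistaken for 'b'.
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if s.endswith(suffix):
            return int(s[: -len(suffix)]) * UNITS[suffix] // 1024
    raise ValueError("unrecognized size: %r" % size)


limit_kb = to_kb("60gb")        # Resource_List.mem
used_kb = to_kb("65557856kb")   # resources_used.mem
over_kb = used_kb - limit_kb    # how far usage is above the limit

print("limit:", limit_kb, "kb")
print("used: ", used_kb, "kb")
print("over: ", over_kb, "kb")
```

Under that assumption the limit is 62914560 kb, so the reported usage is 2643296 kb (roughly 2.5 GB) above it.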

What could be wrong? Am I missing something in the config?

Anton

