[torqueusers] Torque not killing job exceeding memory requested
Laurence Dawson
larry.dawson at vanderbilt.edu
Thu Jan 18 12:15:06 MST 2007
It's running on x86 Linux with a 2.4 kernel.
Here is an example job:
[root@vmpsched root]# qstat -f 1392706 | grep mem
resources_used.mem = 2040216kb
resources_used.vmem = 2654428kb
Resource_List.mem = 1500mb
[root@vmpsched root]# diagnose -j 1392706
JobID State Proc WCLimit User Opsys Class Features
1392706 Running 1 2:07:00:00 yiy1 - all -
WARNING: job '1392706' utilizes more memory than dedicated (1992 > 1500)
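The two tools agree once the units are lined up: 2040216kb divided by 1024 is about 1992 MB, which is the figure in the warning, against the 1500 MB requested. A minimal Python sketch of that comparison follows. It assumes the kb/mb suffixes are powers of 1024, which is consistent with the numbers here. The helper names (to_bytes, parse_mem_fields, mem_overrun_mb) are hypothetical and not part of TORQUE or Moab.

```python
import re

# Size suffixes as they appear in the qstat output; assumed to be powers of 1024.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def to_bytes(size):
    """Convert a string such as '2040216kb' or '1500mb' to bytes."""
    m = re.fullmatch(r"(\d+)([kmgt]?b)", size.strip().lower())
    if m is None:
        raise ValueError(f"unrecognized size: {size!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def parse_mem_fields(text):
    """Collect 'key = value' pairs from 'qstat -f <jobid> | grep mem' output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def mem_overrun_mb(text):
    """Return used minus requested physical memory in whole MB (positive = over)."""
    fields = parse_mem_fields(text)
    used = to_bytes(fields["resources_used.mem"])
    requested = to_bytes(fields["Resource_List.mem"])
    return (used - requested) // 1024**2


# The three lines reported by qstat for job 1392706.
sample = """\
resources_used.mem = 2040216kb
resources_used.vmem = 2654428kb
Resource_List.mem = 1500mb
"""

print(to_bytes("2040216kb") // 1024**2)  # used memory in MB: 1992
print(mem_overrun_mb(sample))            # MB over the request: 492
```

This only reproduces the arithmetic behind the warning. It does not explain why the limit is not being enforced.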
As recommended by Seb, I enabled
RESOURCELIMITPOLICY MEM:ALWAYS:CANCEL
a couple of minutes ago, but so far the job is still running...
Troy Baer wrote:
> On Wed, 2007-01-17 at 11:04 -0600, Laurence Dawson wrote:
>
>> A user has two jobs running on a single (dual-dual processor box) node.
>> It is exceeding the memory he requested, but torque is not killing
>> it...why? Has anyone seen this on their configuration? We are running
>> moab-4.5.0p4 and torque-2.1.0p0.
>>
>
> What OS/architecture? And what does TORQUE report for memory usage vs.
> requested? (I.e. "qstat -f jobid | grep mem")
>
> --Troy
>