[torqueusers] Torque not killing job exceeding memory requested

Laurence Dawson larry.dawson at vanderbilt.edu
Thu Jan 18 12:15:06 MST 2007


It's running on x86 Linux with a 2.4 kernel.

This is an example job:

[root at vmpsched root]# qstat -f 1392706 | grep mem
resources_used.mem = 2040216kb
resources_used.vmem = 2654428kb
Resource_List.mem = 1500mb

[root at vmpsched root]# diagnose -j 1392706
JobID    State    Proc  WCLimit     User  Opsys  Class  Features

1392706  Running  1     2:07:00:00  yiy1  -      all    -
WARNING: job '1392706' utilizes more memory than dedicated (1992 > 1500)
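
The two reports above use different units (qstat shows kb for usage and mb for the request, while the Moab warning appears to be in MB). The following is a minimal Python sketch of that comparison, not part of TORQUE or Moab; the helper names `parse_mem_fields` and `mem_overrun_kb` are hypothetical, and it assumes 1 mb = 1024 kb and only the `kb`/`mb`/`gb` suffixes.

```python
import re

# Multipliers to kilobytes; assumes binary units (1 mb = 1024 kb).
_UNIT_TO_KB = {"kb": 1, "mb": 1024, "gb": 1024 * 1024}


def parse_mem_fields(qstat_text):
    """Return {field_name: size_in_kb} for lines such as
    'resources_used.mem = 2040216kb' (hypothetical helper)."""
    fields = {}
    for line in qstat_text.splitlines():
        m = re.match(r"\s*([\w.]+)\s*=\s*(\d+)(kb|mb|gb)\s*$", line, re.IGNORECASE)
        if m:
            name, value, unit = m.group(1), int(m.group(2)), m.group(3).lower()
            fields[name] = value * _UNIT_TO_KB[unit]
    return fields


def mem_overrun_kb(fields):
    """Kilobytes used beyond the request; positive means over the limit."""
    return fields["resources_used.mem"] - fields["Resource_List.mem"]


# The qstat lines quoted above, pasted in as sample input.
SAMPLE = """\
resources_used.mem = 2040216kb
resources_used.vmem = 2654428kb
Resource_List.mem = 1500mb
"""

fields = parse_mem_fields(SAMPLE)
used_mb = fields["resources_used.mem"] // 1024
requested_mb = fields["Resource_List.mem"] // 1024
print(used_mb, requested_mb, mem_overrun_kb(fields) > 0)  # prints: 1992 1500 True
```

With the values from this job, 2040216 kb comes to 1992 MB, which matches the "1992 > 1500" figure in the Moab warning, so both tools agree that the job is over its request.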

As recommended by Seb, a couple of minutes ago I enabled

RESOURCELIMITPOLICY MEM:ALWAYS:CANCEL

but so far the job is still running...





Troy Baer wrote:
> On Wed, 2007-01-17 at 11:04 -0600, Laurence Dawson wrote:
>> A user has two jobs running on a single (dual-dual processor box) node.
>> It is exceeding the memory he requested, but torque is not killing
>> it...why? Has anyone seen this on their configuration? We are running
>> moab-4.5.0p4 and torque-2.1.0p0.
>
> What OS/architecture?  And what does TORQUE report for memory usage vs.
> requested?  (I.e. "qstat -f jobid | grep mem")
>
> 	--Troy
>   


