[torqueusers] Torque not killing job exceeding memory requested

Åke Sandgren ake.sandgren at hpc2n.umu.se
Sat Jan 20 05:04:57 MST 2007


On Fri, 2007-01-19 at 15:47 -0600, Laurence Dawson wrote:
> Has this been a problem since a particular version? The email Gabe 
> Turner sent indicates it is not happening in 2.1.6 (at least for him). 
> The comment Troy quoted is not the same as the one in the version of 
> torque I am running (2.1.0p0), but there is a section of commented-out 
> code that looks like this:
> 
>       /* UMU vmem patch sets RLIMIT_AS rather than RLIMIT_DATA and 
> RLIMIT_STACK */
>  
>       /*
>       reslim.rlim_cur = reslim.rlim_max = mem_limit;
>  
>       if (setrlimit(RLIMIT_DATA,&reslim) < 0)
>         {
>         return(error("RLIMIT_DATA",PBSE_SYSTEM));
>         }
>  
>       if (setrlimit(RLIMIT_STACK,&reslim) < 0)
>         {
>         return(error("RLIMIT_STACK",PBSE_SYSTEM));
>         }
>       */

As far as I can remember, OpenPBS/Torque has never killed jobs that were
over their (p)mem limit. The code has always checked vsize (which
corresponds to vmem). What it tried to do before my patch above was to
set RLIMIT_DATA, which in Linux kernels from 2.4 onward is almost never
enforced. Hence the change to setting RLIMIT_AS (pvmem) instead, which
the kernel enforces perfectly.
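For reference, here is a minimal standalone sketch of the RLIMIT_AS
approach. This is not the actual mom code: set_vmem_limit and the
512 MB figure are hypothetical, and mem_limit is assumed to hold the
pvmem value in bytes, as in the snippet above.

      /* Sketch: enforce pvmem via RLIMIT_AS rather than RLIMIT_DATA.
         mem_limit is assumed to be the pvmem value in bytes. */
      #include <sys/resource.h>
      #include <errno.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      static int set_vmem_limit(rlim_t mem_limit)
        {
        struct rlimit reslim;

        reslim.rlim_cur = reslim.rlim_max = mem_limit;

        /* RLIMIT_AS caps the total address space, so the kernel itself
           refuses any mmap()/brk() that would exceed pvmem. */
        if (setrlimit(RLIMIT_AS,&reslim) < 0)
          {
          fprintf(stderr,"setrlimit(RLIMIT_AS): %s\n",strerror(errno));
          return(-1);
          }

        return(0);
        }

      int main(void)
        {
        /* hypothetical 512 MB pvmem limit */
        if (set_vmem_limit((rlim_t)512 * 1024 * 1024) != 0)
          return(EXIT_FAILURE);

        /* a 1 GB allocation now fails with ENOMEM instead of growing
           the job past its requested limit */
        void *p = malloc((size_t)1024 * 1024 * 1024);
        printf("1 GB malloc %s\n", p ? "succeeded" : "failed (ENOMEM)");
        free(p);
        return(EXIT_SUCCESS);
        }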

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se


