[torqueusers] Job exceeding memory limits

Steve Young chemadm at hamilton.edu
Thu Apr 12 11:52:14 MDT 2007


*bump*... Does anyone have any ideas on how torque checks how much memory an
application is using?  If gaussian is really going outside its
constraints of 7gb and trying to use more than double that amount, I need
to verify this from the command line. However, with the tools I know
how to use, I am not seeing this problem. So I'm a little bit stumped as
to how torque is "seeing" this exceeded limit. Thanks,

-Steve
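
One way to cross-check from the command line is to sum the per-process figures yourself and compare the total with the number in the PBS kill message. The following is only a minimal sketch: it assumes a `ps` that can print `pid`, `rss` and `vsz` columns with sizes in KiB (true of Linux procps; the Irix `ps` may differ), and it is not a statement of how pbs_mom actually accounts for memory. The function name `sum_memory`, the PIDs and the sample text are hypothetical.

```python
def sum_memory(ps_text, pids):
    """Sum RSS and VSZ (in bytes) over the given PIDs.

    ps_text is expected to hold three whitespace-separated columns per
    line: pid, rss, vsz, with sizes in KiB (an assumption; check the
    ps man page on your platform).
    """
    rss_total = 0
    vsz_total = 0
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not three integers.
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        pid, rss_kib, vsz_kib = (int(f) for f in fields)
        if pid in pids:
            rss_total += rss_kib * 1024
            vsz_total += vsz_kib * 1024
    return rss_total, vsz_total


# Hypothetical sample: two job processes with RES 523M and SIZE 7266M,
# plus one unrelated process that must be ignored.
sample = """\
  PID    RSS     VSZ
 1001 535552 7440384
 1002 535552 7440384
 9999   1024    2048
"""

rss_bytes, vsz_bytes = sum_memory(sample, {1001, 1002})
print("total RSS bytes:", rss_bytes)
print("total VSZ bytes:", vsz_bytes)
```

In practice the text would come from something like `ps -o pid,rss,vsz` restricted to the job's processes. Whichever total comes closest to the figure in the kill message hints at which quantity is being summed.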



 
On Fri, 2007-04-06 at 12:56 -0400, Steve Young wrote:
> Hello,
> 	I posted to the list about this a while back but still haven't figured
> out what is happening here. We are using torque-2.0.0p2. This example is
> happening on an Irix cluster but I have seen this on some of our other
> architectures too. 
> 	A gaussian job is submitted requesting 8 CPUs with 7gb of RAM. The
> input file also states 7gb. Currently, if we just leave the memory
> request out of the torque submission, the job runs as expected. However, if I
> do request the memory from torque, I end up getting the following error:
> 
> 
> =>> PBS: job killed: mem 15968747520 exceeded limit 7516192768
> Terminated
> 
> So now I tried increasing the request to 16gb. Again it terminates with:
> 
> =>> PBS: job killed: mem 17789206528 exceeded limit 16106127360
> Killed
> 
> So I again increase it to 17gb and now it appears to run. 
> 
> I realize that torque is doing what it is supposed to but I don't
> understand why it believes that the application is using that amount of
> memory. Looking at top on the machine I only see:
> 
>  SIZE   RES
> 7266M  523M
> 
> for memory of each of the 8 processes running. So how is torque
> "thinking" that it needs more than twice as much memory for this job? 
> 
> We really would like to be using memory requests to torque, but as of yet
> I am unable to get past this situation. Maybe Garrick could explain how
> torque finds out the amount of memory that the application is using?
> Then I could try it from the command line to verify it. Thanks in advance for
> any advice.
> 
> -Steve
> 
> 
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
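
The byte counts in the two kill messages above are easier to compare with the top output once everything is in the same unit. The sketch below is only arithmetic on the numbers quoted in this thread. It assumes that the "M" in top means MiB and that all 8 processes show the same SIZE and RES. The helper name `to_gib` is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# (mem reported by PBS, limit reported by PBS), copied from the kill messages.
kills = {
    "first kill": (15968747520, 7516192768),
    "second kill": (17789206528, 16106127360),
}
for label, (used, limit) in kills.items():
    print(f"{label}: used {to_gib(used):.2f} GiB, limit {to_gib(limit):.2f} GiB")

# Per-process figures from top, multiplied out over the 8 processes.
nprocs = 8
res_total = nprocs * 523 * MIB    # sum of RES
size_total = nprocs * 7266 * MIB  # sum of SIZE
print(f"8 x RES  = {to_gib(res_total):.2f} GiB")
print(f"8 x SIZE = {to_gib(size_total):.2f} GiB")
```

The two limits work out to exactly 7 GiB and 15 GiB. The reported usage (roughly 14.9 and 16.6 GiB) falls between the summed RES (about 4.1 GiB) and the summed SIZE (about 56.8 GiB), so neither simple sum of the top columns explains the number torque reports.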


