[torqueusers] Job exceeding memory limits

Steve Young chemadm at hamilton.edu
Fri Apr 6 10:56:44 MDT 2007


Hello,
	I posted to the list about this a while back but still haven't figured
out what is happening. We are using torque-2.0.0p2. This example is from
an IRIX cluster, but I have seen the same behavior on some of our other
architectures too.
	A Gaussian job is submitted requesting 8 CPUs and 7 GB of RAM, and the
Gaussian input file also specifies 7 GB. If we simply leave the memory
request out of the torque submission, the job runs as expected. If I do
request the memory from torque, however, the job is killed with the
following error:


=>> PBS: job killed: mem 15968747520 exceeded limit 7516192768
Terminated

So I tried increasing the request to 16 GB. Again it terminates with:

=>> PBS: job killed: mem 17789206528 exceeded limit 16106127360
Killed

So I increased it again, to 17 GB, and now the job appears to run.
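For reference, the resource request looks roughly like the following
(a paraphrase, not the actual script; the exact directive spelling,
e.g. ncpus vs. nodes/ppn, depends on how the cluster is configured):

    #PBS -l ncpus=8
    #PBS -l mem=7gb

with the corresponding Link 0 line in the Gaussian input file:

    %Mem=7GB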

I realize that torque is doing what it is supposed to, but I don't
understand why it believes the application is using that much memory.
Looking at top on the machine, I see only:

 SIZE   RES
7266M  523M

for each of the 8 running processes. Summing the resident sizes gives
only about 8 x 523 MB, roughly 4.1 GB, yet torque reports killing the
job at about 14.9 GB. So how does torque decide that this job is using
more than twice the memory it was granted?

We would really like to use memory requests with torque, but so far I
have been unable to get past this. Maybe Garrick could explain how
torque determines the amount of memory an application is using? Then I
could try the same thing from the command line to verify it. Thanks in
advance for any advice.
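
In case it helps, here is the sort of check I have in mind, sketched in
Python: sum the per-process memory of everything in the job's session,
which is my guess at roughly what pbs_mom does. The ps column names
(sid, rss, vsz) are assumptions taken from Linux procps and may well be
spelled differently on IRIX:

    #!/usr/bin/env python
    # Hypothetical check: total the memory of every process in one
    # session, counting each process separately, shared pages and all.
    # Assumption: ps supports sid/rss/vsz output columns; on IRIX the
    # option names may differ.
    import os
    import sys

    target_sid = sys.argv[1]   # session id of the job's top-level shell

    rss_kb = 0   # summed resident set sizes, in KB
    vsz_kb = 0   # summed virtual sizes, in KB
    for line in os.popen("ps -e -o sid= -o rss= -o vsz="):
        fields = line.split()
        if len(fields) == 3 and fields[0] == target_sid:
            rss_kb += int(fields[1])
            vsz_kb += int(fields[2])

    print("summed RSS: %d KB, summed VSZ: %d KB" % (rss_kb, vsz_kb))

Comparing those sums against the number in the kill message might at
least show which quantity torque is actually adding up.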

-Steve