[torqueusers] exceeding memory limits?
chemadm at hamilton.edu
Fri Jun 23 09:57:32 MDT 2006
Oh yeah, architecture: this was on a queue of Macintosh (iMac)
machines I have clustered for some small jobs. I could see that if this
job requested more memory than what is available on the machine it
shouldn't work, and I would expect to see the job queued forever =). But
the job did get shuffled off to one of the hosts.
I managed to try a couple of things, and it seems that if I don't
request memory for the job and just leave the 200mb request in the
Gaussian com file, the job appears to work. I'm going to try some
other jobs, as this problem has also happened on some of our higher-end
SMP machines.
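For what it's worth, here is a minimal sketch of the two places the
200mb figure can live; the script name, job name, binary name, and
input file are all hypothetical stand-ins. The mem value in the qsub
request is what Torque enforces, while Gaussian's %Mem line only
tells Gaussian how much to try to use on its own:

```shell
#!/bin/sh
# sketch.pbs -- hypothetical Torque submission script
#PBS -N g03-test
#PBS -l nodes=1:ppn=1
#PBS -l mem=200mb       # the limit pbs_mom enforces (209715200 bytes)

cd "$PBS_O_WORKDIR"
g03 test.com            # test.com's Link 0 section carries "%Mem=200MB",
                        # which only governs Gaussian's own allocation
```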
So are my previous assumptions true? Is that how I should interpret the
earlier error message (i.e., Torque thinks the job needs 1.2gb, which
exceeds the 200mb limit the user asked for)? Anyhow, thanks in
advance for the help. I'll let you know what I find out after I've done
some more testing. Thanks,
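As a sanity check on the numbers in the quoted error, both byte counts
convert exactly to mebibytes (a minimal sketch, nothing Torque-specific):

```python
# Convert the byte counts from the pbs_mom kill message to MiB.
MIB = 1024 * 1024
used = 1207959552    # "mem 1207959552" in the error
limit = 209715200    # "exceeded limit 209715200"
print(used / MIB)    # -> 1152.0, i.e. the ~1.2gb Torque measured
print(limit / MIB)   # -> 200.0, the mem=200mb the user requested
```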
On Fri, 2006-06-23 at 16:54 +0200, Ronny T. Lampert wrote:
> > =>> PBS: job killed: mem 1207959552 exceeded limit 209715200
> > This job had requested 200mb of memory. This is for a gaussian job. What
> > I am trying to understand is what this means. I suspect torque thinks
> > the job actually needs 1.2gb of memory which is exceeding the limit of
> > 200mb that this person requested? I'd like to find more information
> > about how torque allocates/manages memory on nodes. If anyone has more
> > information about this I would be greatly appreciative. Thanks,
> I'm no specialist, but first we also need your architecture+OS here.
> I also don't know (but others can comment on that) if torque also counts the
> shared libs that are mapped. Garrick was working on something IIRC.
> Try to run the job without limits and have a look at "ps auxw", columns VSZ
> (virtual size) and RSS (resident size) on that node and post the results so
> we can see.
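Following that suggestion, something like the sketch below would show
the two columns on the node; here the shell's own PID ($$) stands in
for the Gaussian process, whose PID you would look up with "ps auxw":

```shell
# Sketch: print VSZ (virtual size) and RSS (resident size), in KiB,
# for a single PID.  $$ (this shell) is a stand-in for the PID of
# the Gaussian process on the execution node.
ps -o vsz=,rss= -p $$
```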