[torqueusers] Is resources_used.mem reliable?

Ake Ake.Sandgren at hpc2n.umu.se
Tue Apr 12 09:11:08 MDT 2005

On Tue, Apr 12, 2005 at 01:21:38AM +0100, Martin Thompson wrote:
> Hi there
> I'm having some problems with the value of resources_used.mem as
> provided by qstat -f.  Should I be able to rely on the accuracy of
> this attribute?
> I find that, in comparison to the output from top, resources_used.mem
> is occasionally correct, but more often than not it reports a much
> smaller value.  This happens with very simple single-cpu jobs and 
> two-cpu OpenMP jobs.  Each of these jobs has very steady memory
> usage once they get up and running.
> Our cluster is running SLES 9, with a vanilla 2.6.10 kernel, and
> torque-1.1.0p4.  However, I have seen the same problem with 
> torque-1.2.0p2.
> Also, I extracted just enough code from torque-1.2.0p2 to call the 
> resi_proc(int pid) function and it seems to report the correct value
> every time.

This is a difficult area.

Note, the following is Linux specific.

What PBS reports as mem is the sum over all sessions (on all nodes) of
what the kernel reports as resident set size (rss) in /proc/$pid/stat.
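
For illustration only, here is a minimal Python sketch of pulling those
numbers out of /proc/$pid/stat. It is not the code pbs_mom uses; the
function names and the returned dictionary keys are made up for this
example. The field positions (session = 6, vsize = 23, rss = 24) are the
ones documented in proc(5).

```python
import os

def parse_proc_stat(text, page_size=None):
    """Parse the contents of /proc/<pid>/stat (field numbers per proc(5))."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')' characters, so split after the LAST ')'.
    rest = text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at index N - 3.
    rss_pages = int(rest[24 - 3])
    return {
        "session": int(rest[6 - 3]),       # session id of the process
        "vsize_bytes": int(rest[23 - 3]),  # virtual memory size, in bytes
        "rss_pages": rss_pages,            # resident set size, in pages
        "rss_bytes": rss_pages * page_size,
    }

def read_proc_stat(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_proc_stat(f.read())
```

Calling read_proc_stat(os.getpid()) a few times in a row on a busy node
shows how much the rss figure can move between samples.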

This value can't really be trusted for anything (it's just a snapshot of
what happens to be mapped in-core right then), and especially not for
enforcing limits, since the kernel doesn't enforce an rss limit.
I.e. ulimit -m will have no impact whatsoever; this goes for both 2.4 and
2.6 kernels.

What I would really want to use is the vmem value, since that includes
data, stack, swapped and anonymous mapped memory (aka malloc)
and can be enforced through ulimit (I am currently testing patches for this).
The current (p)(v)mem limit scheme in PBS is broken.
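
To show what a kernel-enforced vmem limit looks like, here is a rough
Python sketch that starts a command with its address space capped via
RLIMIT_AS (the limit that ulimit -v sets). This is only an illustration
of the mechanism, not what the patches do; run_with_vmem_limit is a
hypothetical helper name.

```python
import resource
import subprocess
import sys

def run_with_vmem_limit(cmd, limit_bytes):
    """Run cmd with its address space (RLIMIT_AS) capped at limit_bytes.

    Returns the exit status of the command.
    """
    def apply_limit():
        # Runs in the child between fork() and exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # The soft limit may not exceed an existing hard limit.
        if hard == resource.RLIM_INFINITY:
            soft = limit_bytes
        else:
            soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode
```

With a limit in place, an allocation that would push the process past it
fails in the process itself (malloc returns NULL, or the interpreter
raises MemoryError), which is the kind of enforcement rss cannot give.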

But then PBS doesn't report this correctly. It only gives the vmem from
the mother superior.

I am currently testing patches to get that value from the sisters too.

And finally, the value will only be counted for processes within the
sessionid that pbs_mom on the node knows about for the job.
There are lots of situations where the sessionid for part of a job will
get changed and PBS will lose track of it.
(Scali MPI, for example, starts MPI jobs through a separate daemon, which
leaves PBS totally clueless about the job.)
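
A toy Python sketch of that accounting rule, with invented numbers: only
processes whose session id is in the set known for the job contribute to
the sum. The record layout and the name job_rss_pages are assumptions
made for this example, not anything taken from the PBS source.

```python
def job_rss_pages(processes, known_sessions):
    """Sum rss (in pages) over processes whose session id is tracked."""
    return sum(p["rss"] for p in processes if p["session"] in known_sessions)

# Hypothetical process table on one node. The third process belongs to the
# same job but was started by a separate daemon, so it runs in a different
# session.
processes = [
    {"pid": 101, "session": 100, "rss": 2000},
    {"pid": 102, "session": 100, "rss": 3000},
    {"pid": 250, "session": 240, "rss": 50000},
]

tracked = job_rss_pages(processes, known_sessions={100})
actual = job_rss_pages(processes, known_sessions={100, 240})
```

Here tracked covers only the first two processes, so the reported figure
misses most of what the job really has resident.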

Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se	Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
