[torqueusers] 4GB resources_used.mem limit

Bernd Schubert bernd-schubert at gmx.de
Thu Jun 30 16:54:31 MDT 2005


Dear Garrick,

Many, many thanks for your help!

> I'm not able to test this.  But the first thing you need to do is figure
> out if pbs_mom is reporting the wrong info, or if pbs_server is breaking
> it.

This was my first thought, too. So I looked into the logfiles, but there was 
nothing about this at all. 

>
> You can query this info directly from pbs_mom using momctl or a small util
> I wrote awhile ago called dumpmom
> (http://www-rcf.usc.edu/~garrick/dumpmom.c)
>
> To use momctl, first get the session list, then get the memory usage of
> that session.  Here's an example with a node having 2 sessions, and 1 of
> them is using 100MB.
>
>    $ momctl -q sessions -h hpc0961
>      hpc0961:     sessions = 'sessions=30631 30651'
>    $ momctl -q 'mem[session=30631]' -h hpc0961
>      hpc0961: mem[session=30631] = 'mem[session=30631]=120856kb'
>
> dumpmom is easier for this particular purpose, just do 'dumpmom hpc0961'
> and it will print out lots of similar information.

Thanks, both momctl and your dumpmom are working fine.

>
> If you can verify that pbs_mom is sending the correct info, then we can
> look into pbs_server.

Well, the output of momctl and dumpmom shows that pbs_mom itself is already 
reporting the wrong value:

mem[session=24986]=2115728kb

This value should be 6GB.

Well, I think I just found the reason for the problem: pbs_mom reads the 
per-process memory usage in the mem_sum() function of mom_mach.c and uses the 
vsize member of the proc_stat_t structure there. This vsize member is declared 
as unsigned, and a quick test just showed me that unsigned is only a 32-bit 
type on x86_64. I will change it to 'unsigned long' (which is 64-bit there) 
tomorrow; I'm just too tired now.
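
To illustrate the wrap-around, here is a minimal C sketch. It is not the 
TORQUE code: wrap_to_32bit(), bytes_to_kb() and print_demo() are hypothetical 
helpers, uint32_t stands in for a plain 'unsigned' field on x86_64, and the 
"true" size of 6310032kb is an assumption (the reported 2115728kb plus exactly 
one 4GB wrap).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: uint32_t models a plain 'unsigned' field on x86_64;
 * uint64_t models 'unsigned long' on the same (LP64) platform. */

/* Hypothetical helper: value left after storing a byte count in a 32-bit field. */
static uint64_t wrap_to_32bit(uint64_t bytes)
{
    return (uint64_t)(uint32_t)bytes;   /* keeps only the low 32 bits */
}

/* Convert bytes to kb (1024 bytes), the unit used in the momctl output. */
static uint64_t bytes_to_kb(uint64_t bytes)
{
    return bytes / 1024;
}

/* Print the type sizes and show one 4GB wrap of an assumed ~6GB process. */
static void print_demo(void)
{
    /* Assumed real size: reported 2115728kb + 4194304kb (one 4GB wrap). */
    uint64_t true_bytes = 6310032ULL * 1024;
    uint64_t wrapped    = wrap_to_32bit(true_bytes);

    printf("sizeof(unsigned) = %zu, sizeof(unsigned long) = %zu\n",
           sizeof(unsigned), sizeof(unsigned long));
    printf("64-bit field: %llukb\n",
           (unsigned long long)bytes_to_kb(true_bytes));
    printf("32-bit field: %llukb\n",
           (unsigned long long)bytes_to_kb(wrapped));

    /* The wrapped value matches the figure reported for the session. */
    assert(bytes_to_kb(wrapped) == 2115728);
}
```

Calling print_demo() from a main() shows the two sizes next to each other. 
Under the stated assumption, the 32-bit result is the same 2115728kb that 
momctl reported for the session.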

Thanks again for your help,
	Bernd



-- 
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert at pci.uni-heidelberg.de

