[torqueusers] pbsnodes reporting incorrect "availmem"?

Riccardo Murri rmurri at cscs.ch
Tue Jun 2 04:20:59 MDT 2009


Hello,

We just noticed that our cluster is running only a fraction of the
jobs that it could run.  We traced this to MAUI being convinced that
the worker nodes have much less virtual memory than they actually
have.  This in turn stems from "pbsnodes" reporting a strange
"availmem" value:

  $ pbsnodes -a
  [...]
  wn03
       state = free
       np = 16
       properties = lcgpro
       ntype = cluster
       jobs = [...]
  e01.lcg.cscs.ch, 4/1699839.ce01.lcg.cscs.ch
       status = opsys=linux,uname=Linux wn03 2.6.9-78.0.22.ELhugemem #1 SMP Fri May 1 00:50:13 CDT 2009 i686,[...],nsessions=5,nusers=2,idletime=336,totmem=43746664kb,availmem=14258792kb,physmem=33264260kb,ncpus=16,loadave=5.01,netload=3445635636,state=free,jobs=[...],varattr=,rectime=1243937534
  [...]
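
To compare these figures with other tools, the memory fields can be
pulled out of a status line like the one above with a small parser.
This is only a minimal sketch and not part of TORQUE: the names
"parse_status" and "kb" are hypothetical, the sample string contains
only fields copied from the wn03 output (the elided parts are left
out), and the plain split on commas assumes that no value contains a
comma.

```python
# Hypothetical helpers, not part of TORQUE: split a pbsnodes "status"
# attribute into a dict of key/value strings.
# Assumption: no value contains a comma (true for the fields below).

def parse_status(status):
    fields = {}
    for item in status.split(","):
        # Split on the first "=" only; an empty value (e.g. "varattr=")
        # yields an empty string.
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def kb(value):
    """Convert a string such as '14258792kb' to an integer number of kB."""
    if not value.endswith("kb"):
        raise ValueError("expected a value ending in 'kb': %r" % value)
    return int(value[:-2])


# Fields copied from the wn03 status line; elided fields are omitted.
SAMPLE = ("opsys=linux,nsessions=5,nusers=2,idletime=336,"
          "totmem=43746664kb,availmem=14258792kb,physmem=33264260kb,"
          "ncpus=16,loadave=5.01,state=free,varattr=,rectime=1243937534")

if __name__ == "__main__":
    status = parse_status(SAMPLE)
    for name in ("totmem", "availmem", "physmem"):
        print("%-8s %10d kB" % (name, kb(status[name])))
```

A real status line also carries "uname" and "jobs" fields, so the
comma assumption should be checked before relying on this.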

The availmem=14258792kb value bears no apparent relation to what
system utilities like "free" display:

  $ ssh wn03 free -k
               total       used       free     shared    buffers     cached
  Mem:      33264260   32314932     949328          0      17892    3503884
  -/+ buffers/cache:   28793156    4471104
  Swap:     10482404     729668    9752736

Output from the "ps" and "pmap" utilities is consistent with what
"free" displays.
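
As a sanity check on the figures quoted in this message, the sketch
below spells out the arithmetic.  It uses only numbers copied from
the "pbsnodes" and "free -k" output; the variable names are made up
for illustration.  The three formulas are guesses and may have
nothing to do with how pbs_mom really derives "totmem" and
"availmem"; the two commands were also run at different moments, so
small differences are to be expected.

```python
# All values in kB, copied from the "pbsnodes" and "free -k" output.
PBS_TOTMEM = 43746664
PBS_AVAILMEM = 14258792
PBS_PHYSMEM = 33264260

MEM_TOTAL = 33264260
MEM_FREE = 949328
BUFFERS = 17892
CACHED = 3503884
SWAP_TOTAL = 10482404
SWAP_FREE = 9752736

# Guess 1 (assumption): totmem is physical memory plus total swap.
guess_totmem = MEM_TOTAL + SWAP_TOTAL

# Guess 2 (assumption): availmem is free memory plus free swap.
guess_avail_strict = MEM_FREE + SWAP_FREE

# Guess 3 (assumption): as guess 2, but with buffers and page cache
# counted as free memory.
guess_avail_cache = MEM_FREE + BUFFERS + CACHED + SWAP_FREE

if __name__ == "__main__":
    print("totmem   reported: %d" % PBS_TOTMEM)
    print("  guess 1:         %d" % guess_totmem)
    print("availmem reported: %d" % PBS_AVAILMEM)
    print("  guess 2:         %d" % guess_avail_strict)
    print("  guess 3:         %d" % guess_avail_cache)
    print("  reported - guess 3: %d" % (PBS_AVAILMEM - guess_avail_cache))
```

With these inputs, guess 1 reproduces the reported "totmem" exactly,
guess 2 gives 10702064 kB, and guess 3 gives 14223840 kB, which is
within 34952 kB of the reported "availmem".  That is suggestive at
best, and it does not by itself explain why the scheduler sees less
virtual memory than expected.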

How does pbs_mom compute the "availmem" value?  What could be wrong here?

We're using torque 2.3.0 from the gLite distribution on SL4 nodes:

  $ ssh wn01 rpm -qa | fgrep torque
  torque-2.3.0-snap.200801151629.2cri.slc4
  torque-client-2.3.0-snap.200801151629.2cri.slc4
  torque-mom-2.3.0-snap.200801151629.2cri.slc4

Thank you very much for any suggestion!

Best regards,
Riccardo

-- 
Riccardo Murri
CSCS - Swiss National Centre for Supercomputing
Galleria 2, via Cantonale
CH-6928 Manno (Switzerland)

tel.: +41 91 610 8234
Fax: +41 91 610 8282

