[torquedev] cpu use reporting

Brock Palen brockp at umich.edu
Mon Jun 12 08:53:08 MDT 2006


We are seeing strange problems with torque-2.1.0p0 backed by maui-3.2.6p14.
Basically, a single CPU is being allocated more than once.
For example:

eliza:/usr/local/maui brockp$ qstat -n1 | grep nem064
1809.nemesis.engin.u USER   cac_seri CsI_4P       6667     1  --  --  120:0 R 44:55   nem064/0
1871.nemesis.engin.u USER   cac_seri Brain       11318     1  --  --  120:0 R 31:03   nem064/0
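To catch this kind of double allocation across a whole cluster instead of grepping one node at a time, a small script can scan qstat -n1 output for host/slot pairs claimed by more than one job. This is a minimal sketch, assuming the exec_host string is the last whitespace-separated field of each job line (as in the output above) and that multi-node jobs join their slots with "+"; the function name find_shared_slots is hypothetical.

```python
from collections import defaultdict


def find_shared_slots(qstat_lines):
    """Return {host/slot: [job ids]} for every slot claimed by more than one job.

    Assumes each line is one job from `qstat -n1`, with the job id as the
    first field and the exec_host string as the last field.
    """
    slots = defaultdict(list)
    for line in qstat_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        job_id, exec_host = fields[0], fields[-1]
        # Multi-node jobs list their slots as "hostA/0+hostB/1".
        for slot in exec_host.split("+"):
            # Skip header lines and queued jobs ("--"): only "host/index" counts.
            if "/" not in slot:
                continue
            slots[slot].append(job_id)
    return {slot: jobs for slot, jobs in slots.items() if len(jobs) > 1}


sample = [
    "1809.nemesis.engin.u USER cac_seri CsI_4P 6667 1 -- -- 120:0 R 44:55 nem064/0",
    "1871.nemesis.engin.u USER cac_seri Brain 11318 1 -- -- 120:0 R 31:03 nem064/0",
]
print(find_shared_slots(sample))
# {'nem064/0': ['1809.nemesis.engin.u', '1871.nemesis.engin.u']}
```

In practice the lines would come from running qstat -n1 (for example through subprocess) rather than from the hard-coded sample.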

Maui is correctly putting only two single-CPU jobs on the node (these are
2-CPU nodes), since it only cares about the number of tasks. It's the
reporting that's bad: both jobs are listed on nem064/0. These machines
also appear as free in qmgr:

Qmgr: l n nem064
Node nem064
         state = free
         np = 2
         properties = myrinet
         ntype = cluster
         jobs = 0/1871.nemesis.engin.umich.edu, 0/1809.nemesis.engin.umich.edu
         status = opsys=darwin,
                  uname=Darwin nem064.engin.umich.edu 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC  Power Macintosh,
                  sessions=11318 6667 159,nsessions=2,nusers=2,idletime=700,
                  totmem=2097152kb,availmem=1059788kb,physmem=2097152kb,ncpus=2,
                  loadave=2.07,netload=2945899383,state=free,
                  jobs=1809.nemesis.engin.umich.edu 1871.nemesis.engin.umich.edu,
                  rectime=1150123892
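The same symptom is visible in the node record itself: np = 2, but both entries in the jobs attribute use slot 0, so slot 1 is never claimed. A minimal sketch of a consistency check over that attribute follows, assuming the comma-separated "slot/jobid" format shown above; the helper name check_node_slots is hypothetical. Treating the unclaimed slot 1 as the reason the node still reports state = free is a plausible but unconfirmed reading.

```python
def check_node_slots(np, jobs_attr):
    """Split a node's jobs attribute ("slot/jobid, slot/jobid, ...") into
    slots held by more than one job and slots in 0..np-1 that no job holds."""
    by_slot = {}
    for entry in jobs_attr.split(","):
        entry = entry.strip()
        if not entry:
            continue
        slot, job_id = entry.split("/", 1)
        by_slot.setdefault(int(slot), []).append(job_id)
    # Slots that more than one job claims (the double allocation).
    shared = {s: jobs for s, jobs in by_slot.items() if len(jobs) > 1}
    # Slots the node advertises (np) that no job occupies.
    unused = [s for s in range(np) if s not in by_slot]
    return shared, unused


shared, unused = check_node_slots(
    2, "0/1871.nemesis.engin.umich.edu, 0/1809.nemesis.engin.umich.edu"
)
print(shared)  # {0: ['1871.nemesis.engin.umich.edu', '1809.nemesis.engin.umich.edu']}
print(unused)  # [1]
```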



This caused a problem on another cluster: when it got full, we had many
more jobs on nodes than there should have been.

Brock

