[torquedev] cpu use reporting
Brock Palen
brockp at umich.edu
Mon Jun 12 08:53:08 MDT 2006
We are seeing strange problems with torque-2.1.0p0 backed by
maui-3.2.6p14.
Basically a single cpu is being allocated more than once.
Example:
eliza:/usr/local/maui brockp$ qstat -n1 | grep nem064
1809.nemesis.engin.u USER cac_seri CsI_4P 6667 1 -- -- 120:0 R 44:55 nem064/0
1871.nemesis.engin.u USER cac_seri Brain 11318 1 -- -- 120:0 R 31:03 nem064/0
Maui is correctly putting only two single-cpu jobs on the node (these
are 2-cpu nodes), as it just cares about the number of tasks. It's the
reporting that's bad: both jobs are shown on cpu 0 (nem064/0). These
machines also appear as free in qmgr:
Qmgr: l n nem064
Node nem064
state = free
np = 2
properties = myrinet
ntype = cluster
jobs = 0/1871.nemesis.engin.umich.edu,
0/1809.nemesis.engin.umich.edu
status = opsys=darwin,
uname=Darwin nem064.engin.umich.edu 7.9.0 Darwin
Kernel Version 7.9.0: Wed Mar 30 20:11:17 PST 2005; root:xnu/
xnu-517.12.7.obj~1/RELEASE_PPC Power Macintosh,
sessions=11318 6667
159,nsessions=2,nusers=2,idletime=700,
totmem=2097152kb,availmem=1059788kb,physmem=2097152kb,ncpus=2,
loadave=2.07,netload=2945899383,state=free,
jobs=1809.nemesis.engin.umich.edu
1871.nemesis.engin.umich.edu,
rectime=1150123892
This caused a problem on another cluster: when it got full, we had many
more jobs on nodes than there should have been.
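For anyone who wants to check their own nodes, here is a minimal sketch
of how the duplicate allocation could be spotted. It assumes the node's
"jobs" attribute is a comma-separated list of slot/jobid entries, as in
the qmgr output above. The function name find_shared_slots is made up
for illustration and is not part of TORQUE or Maui.

```python
from collections import defaultdict


def find_shared_slots(jobs_attr):
    """Return {slot: [job ids]} for every slot listed for more than one job.

    jobs_attr is assumed to look like the qmgr "jobs" attribute:
    "0/1871.host, 0/1809.host"
    """
    by_slot = defaultdict(list)
    for entry in jobs_attr.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry is "<slot index>/<job id>".
        slot, _, job_id = entry.partition("/")
        by_slot[int(slot)].append(job_id)
    # Keep only the slots that more than one job claims.
    return {slot: ids for slot, ids in by_slot.items() if len(ids) > 1}


# The jobs attribute reported for nem064 above.
jobs_attr = "0/1871.nemesis.engin.umich.edu, 0/1809.nemesis.engin.umich.edu"
print(find_shared_slots(jobs_attr))
```

On a healthy 2-cpu node the two jobs would be expected on slots 0 and 1,
and the function would return an empty dict.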
Brock