[torquedev] cpu use reporting
garrick at speculation.org
Mon Jun 12 17:22:14 MDT 2006
On Mon, Jun 12, 2006 at 10:53:08AM -0400, Brock Palen alleged:
> We are seeing strange problems with torque-2.1.0p0 backed by
> maui-3.2.6p14.
> Basically a single cpu is being allocated more than once.
> Example
>
> eliza:/usr/local/maui brockp$ qstat -n1 | grep nem064
> 1809.nemesis.engin.u USER cac_seri CsI_4P 6667 1 -- -- 120:0 R 44:55 nem064/0
> 1871.nemesis.engin.u USER cac_seri Brain 11318 1 -- -- 120:0 R 31:03 nem064/0
>
> Maui is correctly putting only two single-cpu jobs on the node (2-cpu
> nodes), as it just cares about the number of tasks. It's this
> reporting that's bad. These machines appear as free in qmgr,
Don't worry about "free". That only applies to the state reported by
pbs_mom with regard to the node's load average and the $ideal_load and
$max_load configs.
> Qmgr: l n nem064
> Node nem064
> state = free
> np = 2
> properties = myrinet
> ntype = cluster
> jobs = 0/1871.nemesis.engin.umich.edu,
> 0/1809.nemesis.engin.umich.edu
It's worse than just a reporting problem. pbs_server should not
allocate the same CPU to multiple jobs.
Can you send me full server, queue, and job configs? Are you using
job suspend preemption?
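
To check other nodes for the same double allocation, the "jobs" attribute
from the node listing quoted above can be scanned mechanically. The
following is only a minimal sketch: it assumes the attribute is a
comma-separated list of "slot/jobid" entries, as in the nem064 output, and
the function name find_shared_slots is hypothetical, not part of TORQUE.

```python
from collections import defaultdict


def find_shared_slots(jobs_attr):
    """Return {slot: [job ids]} for every CPU slot assigned to more than one job.

    Assumes jobs_attr looks like the "jobs = ..." value printed by qmgr,
    i.e. comma-separated "slot/jobid" entries (an assumption based on the
    listing in this thread, not on a documented format).
    """
    slots = defaultdict(list)
    for entry in jobs_attr.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split "0/1871.nemesis..." into the slot index and the job id.
        slot, _, job_id = entry.partition("/")
        slots[int(slot)].append(job_id)
    # Keep only the slots that hold more than one job.
    return {slot: ids for slot, ids in slots.items() if len(ids) > 1}


# Value copied from the nem064 listing in this thread.
jobs_attr = "0/1871.nemesis.engin.umich.edu, 0/1809.nemesis.engin.umich.edu"

for slot, ids in find_shared_slots(jobs_attr).items():
    print(f"slot {slot} is shared by {len(ids)} jobs: {', '.join(ids)}")
```

On a healthy 2-cpu node the two jobs would sit on slots 0 and 1 and the
function would return an empty dict.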