[torquedev] cpu use reporting

garrick at speculation.org
Mon Jun 12 17:22:14 MDT 2006


On Mon, Jun 12, 2006 at 10:53:08AM -0400, Brock Palen alleged:
> We are seeing strange problems with torque-2.1.0p0 backed by
> maui-3.2.6p14.
> Basically, a single CPU is being allocated more than once.
> Example:
> 
> eliza:/usr/local/maui brockp$ qstat -n1 | grep nem064
> 1809.nemesis.engin.u USER   cac_seri CsI_4P       6667     1  --     --  120:0 R 44:55   nem064/0
> 1871.nemesis.engin.u USER   cac_seri Brain       11318     1  --     --  120:0 R 31:03   nem064/0
> 
> Maui is correctly putting only two single-CPU jobs on the node (these
> are 2-CPU nodes), since it only cares about the number of tasks. It's
> this reporting that's bad. These machines appear as free in qmgr:

Don't worry about "free"; that only reflects the state reported by
pbs_mom based on the node's load average and the $ideal_load and
$max_load config settings.
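As a rough illustration of that load-based state, here is a minimal Python sketch. It assumes pbs_mom marks a node busy once its load average reaches $max_load and marks it free again only after the load drops below $ideal_load; the function name and the threshold values in the usage note are hypothetical, and the exact comparisons inside pbs_mom may differ.

```python
# Minimal sketch of a load-based node state, ASSUMING pbs_mom goes "busy"
# at or above $max_load and back to "free" only below $ideal_load.
# This state says nothing about how many CPUs pbs_server has allocated.

def next_state(current: str, loadave: float, ideal_load: float, max_load: float) -> str:
    """Return the next load-based state ("free" or "busy") for a node."""
    # ASSUMPTION: >= vs > (and < vs <=) may not match pbs_mom exactly.
    if current == "free" and loadave >= max_load:
        return "busy"
    if current == "busy" and loadave < ideal_load:
        return "free"
    # Between the two thresholds the previous state is kept (hysteresis).
    return current
```

With hypothetical settings of $ideal_load 1.5 and $max_load 2.5, a free node goes busy at a load of 2.6, stays busy at 2.0, and returns to free at 1.0. None of this depends on the node's `jobs` list, which is why "free" can coexist with a doubly allocated CPU.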

 
> Qmgr: l n nem064
> Node nem064
>         state = free
>         np = 2
>         properties = myrinet
>         ntype = cluster
>         jobs = 0/1871.nemesis.engin.umich.edu, 0/1809.nemesis.engin.umich.edu

It's worse than just a reporting problem. pbs_server should not allocate
the same CPU to multiple jobs.
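The double allocation is visible directly in the node's `jobs` attribute, where each entry has the form `<cpu>/<jobid>`. A minimal Python sketch for spotting it is below. It assumes the attribute value is copied as printed by `qmgr` (`list node`), i.e. comma-separated `<cpu>/<jobid>` entries; the function name is hypothetical.

```python
from collections import defaultdict


def find_shared_cpus(jobs_attr: str) -> dict[int, list[str]]:
    """Return {cpu_index: [jobid, ...]} for CPUs held by more than one job.

    ASSUMPTION: jobs_attr is the value of a node's "jobs" attribute,
    formatted as comma-separated "<cpu>/<jobid>" entries.
    """
    by_cpu: dict[int, list[str]] = defaultdict(list)
    for entry in jobs_attr.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty values or trailing commas
        cpu, _, jobid = entry.partition("/")
        by_cpu[int(cpu)].append(jobid)
    # Keep only CPU slots that were handed out more than once.
    return {cpu: ids for cpu, ids in by_cpu.items() if len(ids) > 1}
```

Run against the nem064 value above, it reports CPU 0 shared by jobs 1871 and 1809, while CPU 1 of this np = 2 node holds no job at all.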

Can you send me the full server, queue, and job configurations? Are you
using job-suspend preemption?


