[torquedev] cpu use reporting
Brock Palen
brockp at umich.edu
Tue Jun 13 08:35:27 MDT 2006
Attached is the output from qmgr -c 'print server'. I assume this is
what you mean by full server and queue config?
We do not use suspend or preemption. We plan to in the future,
but not now.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pbs.config
Type: application/octet-stream
Size: 4924 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torquedev/attachments/20060613/1c68eeed/pbs.obj
-------------- next part --------------
Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
On Jun 12, 2006, at 7:22 PM, garrick at speculation.org wrote:
> On Mon, Jun 12, 2006 at 10:53:08AM -0400, Brock Palen alleged:
>> We are seeing strange problems with torque-2.1.0p0 backed by
>> maui-3.2.6p14.
>> Basically a single cpu is being allocated more than once.
>> Example
>>
>> eliza:/usr/local/maui brockp$ qstat -n1 | grep nem064
>> 1809.nemesis.engin.u USER cac_seri CsI_4P 6667 1 -- -- 120:0 R 44:55 nem064/0
>> 1871.nemesis.engin.u USER cac_seri Brain 11318 1 -- -- 120:0 R 31:03 nem064/0
>>
>> Maui is correctly putting only two single-cpu jobs on the node (these
>> are 2-cpu nodes), as it just cares about the number of tasks. It's this
>> reporting that's bad. These machines appear as free in qmgr,
>
> Don't worry about "free", that only applies to the state reported from
> pbs_mom with regards to the node's load average and $ideal_load and
> $max_load configs.
>
>
>> Qmgr: l n nem064
>> Node nem064
>> state = free
>> np = 2
>> properties = myrinet
>> ntype = cluster
>> jobs = 0/1871.nemesis.engin.umich.edu,
>> 0/1809.nemesis.engin.umich.edu
>
> It's more bad than just reporting. pbs_server should not allocate the
> same CPU to multiple jobs.
>
> Can you send me full server, queue, and job configs? Are you using
> job suspend preemption?
>
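The double allocation in the quoted qmgr output (both jobs listed on slot 0 of nem064) can be checked for mechanically. The following is only a minimal sketch, assuming the node's jobs attribute has been copied out as a single comma-separated string of slot/jobid entries, as it appears above. The function name find_shared_slots is hypothetical and is not part of TORQUE or Maui.

```python
from collections import defaultdict


def find_shared_slots(jobs_attr):
    """Return {slot: [job ids]} for every slot that holds more than one job.

    jobs_attr is assumed to look like the qmgr "jobs" value quoted above:
    comma-separated "slot/jobid" entries.
    """
    by_slot = defaultdict(list)
    for entry in jobs_attr.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas or an empty attribute
        slot, _, job_id = entry.partition("/")
        by_slot[int(slot)].append(job_id)
    # Keep only the slots that were handed to more than one job.
    return {slot: ids for slot, ids in by_slot.items() if len(ids) > 1}


# The jobs attribute for nem064 from the quoted output, joined onto one line.
jobs = "0/1871.nemesis.engin.umich.edu, 0/1809.nemesis.engin.umich.edu"
print(find_shared_slots(jobs))
```

With np = 2 one would expect the two jobs on slots 0 and 1, in which case the function returns an empty dict. For the nem064 value above it reports slot 0 with both job ids.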