> Yes, our test is a collection of codes. The system we are evaluating
> has 16 usable cores. For example, if I run an 8-CPU job without placement
> (no dplace) vs. with dplace, I see no runtime change over the few
> hour long run. I will try and force the system to push processes

That will change once the machine is full of jobs from different users.
Imagine a parallel job ending up on all the even-numbered CPUs while
other jobs occupy the odd-numbered ones. Since two CPUs usually share
a common memory bus, performance becomes unpredictable, at least for
memory-bound problems (what else would you run on an Altix?). PBSPro
has a nice "shared cpuset" feature that tackles this problem by
assigning only full nodes to a job. That way, a parallel job always
has all the CPUs of its locality domains to itself, and performance
stays predictable.
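
To make the even/odd scenario concrete, here is a toy model in
Python. It is only a sketch under assumptions of mine: two CPUs per
node, with CPUs 2k and 2k+1 sitting on node k. The real numbering on
your machine may differ, and the function names are made up for
illustration.

```python
def node_of(cpu, cpus_per_node=2):
    # Assumed layout: consecutive CPU numbers share a node (and its memory bus).
    return cpu // cpus_per_node


def shared_nodes(job_cpus, other_cpus, cpus_per_node=2):
    # Nodes on which our job has to share the memory bus with other jobs.
    mine = {node_of(c, cpus_per_node) for c in job_cpus}
    theirs = {node_of(c, cpus_per_node) for c in other_cpus}
    return sorted(mine & theirs)


# 16-CPU machine, 8-CPU job, the other 8 CPUs busy with foreign jobs.
scattered = shared_nodes(range(0, 16, 2), range(1, 16, 2))  # even vs. odd CPUs
packed = shared_nodes(range(0, 8), range(8, 16))            # full nodes only

print(len(scattered), "shared nodes when scattered")
print(len(packed), "shared nodes when packed onto full nodes")
```

In the scattered case every node the job touches is shared with
somebody else; with full-node assignment none is.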

Another point is that you effectively can't use dplace without
cpusets, because there are no logical CPU numbers without a cpuset.
Inside a cpuset that is exclusive to your job, dplace works fine
with logical CPU numbers starting from zero.
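
The idea of logical CPU numbers can be sketched as a simple mapping.
This is a simplified Python model, not dplace's actual code: I assume
logical CPU i is the i-th physical CPU of the cpuset in ascending
order, and the physical CPU list in the example is invented.

```python
def logical_to_physical(cpuset_cpus):
    # Assumption: logical numbers count the cpuset's CPUs in ascending
    # physical order, starting from zero.
    return dict(enumerate(sorted(cpuset_cpus)))


# Hypothetical cpuset handed to the job by the batch system.
mapping = logical_to_physical([8, 9, 10, 11])

for logical, physical in mapping.items():
    print("logical CPU", logical, "-> physical CPU", physical)
```

The job only ever needs to ask for logical CPUs 0 to 3, no matter
which physical CPUs the cpuset happens to contain.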


