[torqueusers] NUMA general use
David.Singleton at anu.edu.au
Mon Apr 26 18:01:10 MDT 2010
On 04/27/2010 03:44 AM, David Beer wrote:
> Hi all,
> We are currently developing a temporary branch (NUMA) to support NUMA (Non-Uniform Memory Access) systems. We intend this branch to be temporary and plan to merge it back into the main tree for release in 3.0.
> In preparation for this, we want to make sure that it is of general use for people running on NUMA systems. The site that is currently running the branch doesn't run MPI jobs on their NUMA system, and they are of the opinion that running an MPI job on this kind of system defeats the purpose of the system. This makes sense to me, and is consistent with the research I have done, but we want to find out how people are using these systems to make sure it is a useful part of TORQUE.
> Right now, as this site isn't interested in MPI jobs, a job that requests 5 nodeboards runs from the first nodeboard and has a cpuset that includes all of the processors and memory of the nodeboards assigned to it. Do we have TORQUE users out there that are using NUMA systems differently?
> If my questions don't make sense, please feel free to just describe how you are using your system in as much detail (or as little) as you like. We want to make sure this is a feature that is generally useful, not just for a specific case or two.
In reality, Altix and UV systems are very good MPI systems. Our
now-retired Altix cluster (30 64-way nodes) was primarily an MPI
workhorse. It was competitive with current QDR IB on network-dominated
MPI apps (3D FFTs etc.).
My understanding is that UV is a little "more NUMA" than Altix, making it
less effective/scalable for NUMA-oblivious shared-memory applications.
The added hardware accentuates their suitability for high performance