[torquedev] 3.0-alpha branch added to TORQUE subversion tree
knielson at adaptivecomputing.com
Mon Apr 26 11:28:15 MDT 2010
Ken Nielson wrote:
> Christopher Samuel wrote:
>> On 22/04/10 11:17, Ken Nielson wrote:
>>> Currently the two main new features are multi-mom which
>>> allows more than one copy of pbs_mom to run from the same
>>> node and in the same cluster.
>> Interesting, what's the idea behind this ?
>> Looking at the NUMA branch the two appear to be related, is
>> it so that you can partition the NUMA nodes on a large SMP
>> system between the different MOMs ?
> The original purpose of the Multi-MOM was for testing. This was a way to
> make a cluster look larger than the available hardware allowed. So if I
> have 10 machines I can still have a 100-node cluster (or more). However,
> I believe that other uses of the Multi-MOM will come to light as people
> start to use it. One example is the NUMA branch, where we have partitioned
> the node boards of the SGI 4700 into individual MOMs. In this case a single
> machine with 38 node boards looks like 38 nodes. Each MOM can allocate
> cpusets on its node board and lock the memory of the node board as
> well. This is still a work in progress. We are finding that different
> sites have different ways of using their resources. Any input from users
> on this is more than welcome.
The last part of my previous response was not as clear as I wanted.
We definitely want feedback from users about the Multi-MOM, but what I
would really like input on is how people are using their NUMA
systems. How do they lock down nodes, memory, etc.?
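
To make the per-board partitioning described above concrete, here is a rough sketch of how 38 node boards might map to 38 MOMs, each with its own CPU range and memory node. This is only an illustration and not the code in the NUMA branch: the function name, the mom0..mom37 naming, and the figure of 4 CPUs per board are all assumptions chosen for the example.

```python
def partition_boards(num_boards, cpus_per_board):
    """Return one entry per node board: a MOM name, a CPU range, and a memory node.

    Hypothetical layout: board N owns a contiguous block of CPUs and memory node N.
    """
    layout = []
    for board in range(num_boards):
        first = board * cpus_per_board
        last = first + cpus_per_board - 1
        layout.append({
            "mom": f"mom{board}",        # hypothetical per-board MOM name
            "cpus": f"{first}-{last}",   # CPU list in cpuset range notation
            "mems": str(board),          # memory node assumed local to the board
        })
    return layout


# 38 boards as in the example above; 4 CPUs per board is an assumed value.
layout = partition_boards(num_boards=38, cpus_per_board=4)
for entry in layout[:3]:
    print(entry["mom"], "cpus", entry["cpus"], "mems", entry["mems"])
```

How sites actually choose these boundaries (one board per MOM, several boards per MOM, memory shared or locked) is exactly the kind of input I am asking for.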