[torquedev] 3.0-alpha branch added to TORQUE subversion tree

Ken Nielson knielson at adaptivecomputing.com
Mon Apr 26 11:28:15 MDT 2010


Ken Nielson wrote:
> Christopher Samuel wrote:
>   
>> On 22/04/10 11:17, Ken Nielson wrote:
>>
>>     
>>> Currently the two main new features are multi-mom which
>>> allows more than one copy of pbs_mom to run from the same
>>> node and in the same cluster.
>>>       
>> Interesting, what's the idea behind this?
>>
>> Looking at the NUMA branch, the two appear to be related. Is
>> it so that you can partition the NUMA nodes on a large SMP
>> system between the different MOMs?
>>
>>
>>     
> The original purpose of the Multi-MOM was testing. It was a way to
> make a cluster look larger than the available hardware allowed, so with
> 10 machines I can still have a 100-node cluster (or more). However,
> I believe that other uses of the Multi-MOM will come to light as people
> start to use it. One example is the NUMA branch, where we have partitioned
> the node boards of the SGI 4700 into individual MOMs. In this case a single
> machine with 38 node boards looks like 38 nodes. Each MOM can allocate
> cpusets on its node board and lock the memory of that node board as
> well. This is still a work in progress. We are finding that different
> sites have different ways of using their resources. Any input from users
> on this is more than welcome.
>
>
>   
The last part of my last response was not as clear as I wanted.

We definitely want user feedback on the Multi-MOM, but what I
would really like input on is how people are using their NUMA
systems: how do they lock down nodes, memory, and so on?

Ken Nielson


