[torquedev] 3.0-alpha branch added to TORQUE subversion tree
knielson at adaptivecomputing.com
Mon Apr 26 11:20:45 MDT 2010
Christopher Samuel wrote:
> On 22/04/10 11:17, Ken Nielson wrote:
> > Currently the two main new features are multi-mom which
> > allows more than one copy of pbs_mom to run from the same
> > node and in the same cluster.
> Interesting, what's the idea behind this ?
> Looking at the NUMA branch the two appear to be related, is
> it so that you can partition the NUMA nodes on a large SMP
> system between the different MOMs ?
The original purpose of Multi-MOM was testing: it is a way to make a
cluster look larger than the available hardware allows. For example,
with 10 machines I can still run a 100-node cluster (or larger).
However, I expect other uses of Multi-MOM to emerge as people start to
use it. The NUMA branch is one example: we have partitioned the node
boards of the SGI 4700 into individual MOMs, so a single machine with
38 node boards appears as 38 nodes. Each MOM can allocate cpusets on
its own node board and lock that board's memory as well. This is still
a work in progress, and we are finding that different sites use their
resources in different ways. Any input from users on this is more than
welcome.