[torqueusers] separating MPI traffic onto fast net
Jeffrey B. Layton
laytonjb at charter.net
Mon Mar 13 13:53:37 MST 2006
I'm not sure how LAM works specifically (you can ask on the
LAM mailing list), but in general the network your MPI
code uses for computation depends on how the nodes are
named in /etc/hosts and which of those node names you
list in the machine file for the specific MPI.
What does /etc/hosts look like?
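As a rough illustration of the naming point, here is a minimal Python sketch that reads /etc/hosts-style text and reports which network each machine-file name would land on. The host names (node4, node4-fast), the sample addresses, and the /24 netmasks are assumptions made up for the example; only the 192.0.0.x and 198.0.0.x prefixes come from the question below. It says nothing about how LAM itself picks an interface.

```python
import ipaddress

# Assumed networks: prefixes taken from the question, /24 masks are a guess.
NETWORKS = {
    "slow": ipaddress.ip_network("192.0.0.0/24"),
    "fast": ipaddress.ip_network("198.0.0.0/24"),
}


def parse_hosts(text):
    """Map each hostname or alias in /etc/hosts-style text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, addr)  # keep the first match for a name
    return table


def classify(name, table):
    """Return the label of the network that `name` maps to, or None."""
    addr = table.get(name)
    if addr is None:
        return None
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return None
    for label, net in NETWORKS.items():
        if ip.version == net.version and ip in net:
            return label
    return None


# Hypothetical hosts file with one name per network for the same node.
SAMPLE_HOSTS = """
127.0.0.1   localhost
192.0.0.4   node4        # slow network
198.0.0.4   node4-fast   # fast network (hypothetical alias)
"""

table = parse_hosts(SAMPLE_HOSTS)
for name in ["node4", "node4-fast"]:
    print(name, "->", classify(name, table))
```

With a layout like this, listing node4-fast rather than node4 in the machine file is what would steer the MPI traffic toward the fast net. To check a real cluster, pass the contents of the actual /etc/hosts to parse_hosts and feed it the names from the machine file.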
> I've got two networks between the nodes of my cluster and I'd like to
> have the MPI traffic on a net to itself. I've created my
> lam-hostmap.txt file as indicated in the Install Guide but I'm not sure
> how I can test to see if it is actually working.
> I am concerned that it is not working because when running a job, the
> lamd is using the slow 192.0.0.x network instead of the fast 198.0.0.x
> net:
> [siervje@node4 ~]$ ps -ef | grep lam
> siervje 0 16:38 ? 00:00:00 /usr/local/lam/bin/lamd -H 18.104.22.168 -P 32797 -n 3 -o 0
> Does this indicate that my mapping for the MPI traffic is not working,
> or is the lamd considered an "out-of-band" process that should be
> running on the designated slow network? Is there any way to test to
> make sure things are working?