[torqueusers] separating MPI traffic onto fast net
velayups at email.uc.edu
Mon Mar 13 14:11:27 MST 2006
Jeffrey B. Layton wrote:
> I'm not sure how LAM works specifically (you can ask on the
> LAM mailing list), but in general the network your MPI
> code uses for computation depends on how the nodes are
> named (/etc/hosts) and how you specify the node names in
> the machine file for your particular MPI.
> What does /etc/hosts look like?
>> I've got two networks between the nodes of my cluster and I'd like to
>> have the MPI traffic on a net to itself. I've created my
>> lam-hostmap.txt file as indicated in the Install Guide, but I'm not
>> sure how I can test whether it is actually working.
>> I am concerned that it is not working because, when running a job,
>> the lamd is using the slow 192.0.0.x network instead of the fast
>> 198.0.0.x net:
>> [siervje at node4 ~]$ ps -ef | grep lam
>> siervje 0 16:38 ? 00:00:00 /usr/local/lam/bin/lamd -H 126.96.36.199 -P 32797 -n 3 -o 0
>> Does this indicate that my mapping for the MPI traffic is not working,
>> or is the lamd considered an "out-of-band" process that should be
>> running on the designated slow network? Is there any way to test to
>> make sure things are working?
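One way to start checking, sketched here in Python, is to pull the address passed to `lamd` with `-H` in the `ps` output and see which of the two subnets it falls in. The /24 prefixes for the 192.0.0.x and 198.0.0.x networks and the helper names are assumptions for illustration. This only tells you which address `lamd` was given, which may well be the out-of-band channel the question asks about, so it does not by itself show where the MPI messages travel.

```python
import ipaddress
import shlex

# Subnets named in the question; the /24 prefix length is an assumption.
NETWORKS = {
    "slow": ipaddress.ip_network("192.0.0.0/24"),
    "fast": ipaddress.ip_network("198.0.0.0/24"),
}

def lamd_host_arg(ps_line):
    """Return the value following -H on a lamd command line, or None."""
    args = shlex.split(ps_line)
    for i, arg in enumerate(args[:-1]):
        if arg == "-H":
            return args[i + 1]
    return None

def classify(addr):
    """Name the network a numeric address belongs to."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return "not a numeric address"  # e.g. a hostname was passed
    for name, net in NETWORKS.items():
        if ip in net:
            return name
    return "other"

if __name__ == "__main__":
    # Hypothetical ps line; in practice paste the line from `ps -ef | grep lamd`.
    sample = ("siervje 0 16:38 ? 00:00:00 /usr/local/lam/bin/lamd "
              "-H 192.0.0.1 -P 32797 -n 3 -o 0")
    host = lamd_host_arg(sample)
    print(f"{host}: {classify(host)} network")  # prints "192.0.0.1: slow network"
```

To see where the MPI data itself goes, a complementary check is to watch the per-interface packet counters on a node while a communication-heavy job runs.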
Also, if you use OSC's (Pete's) mpiexec wrapper for LAM, you can use its
--transform-hostname command-line option to force the MPI traffic onto
the network you want. I think this should work for your case.
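Whether the nodes are named through /etc/hosts and a machine file, as suggested earlier in the thread, or through a hostname transform in mpiexec, the underlying step is the same: turn each node name Torque hands out into a name that resolves to the fast network. The Python sketch below does that by hand, assuming fast-network aliases named like `node4-fast` (a hypothetical convention, not from this thread) and the 198.0.0.x subnet from the question. It also checks each alias against /etc/hosts, so a name that resolves to the slow network (or not at all) is caught before the job starts. It is not how mpiexec implements `--transform-hostname`; see the mpiexec man page for that option's exact syntax.

```python
import ipaddress
import os

FAST_NET = ipaddress.ip_network("198.0.0.0/24")  # fast net from the question (assumed /24)

def parse_hosts(hosts_text):
    """Map each hostname in /etc/hosts-style text to its IP address."""
    table = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line
        for name in fields[1:]:
            table.setdefault(name, addr)  # first entry wins, like the resolver
    return table

def fast_machinefile(node_names, hosts_text, suffix="-fast"):
    """Rewrite Torque node names to fast-net aliases and verify them.

    Raises ValueError if a rewritten name is missing from the hosts
    text or does not resolve to an address on FAST_NET.
    """
    table = parse_hosts(hosts_text)
    result = []
    for node in node_names:
        node = node.strip()
        if not node:
            continue
        alias = node.split(".", 1)[0] + suffix  # hypothetical naming scheme
        addr = table.get(alias)
        if addr is None or addr not in FAST_NET:
            raise ValueError(f"{alias} is not on the fast network")
        result.append(alias)
    return result

if __name__ == "__main__":
    # Inside a Torque job, $PBS_NODEFILE lists the allocated nodes, one per line.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as nodes, open("/etc/hosts") as hosts:
            print("\n".join(fast_machinefile(nodes, hosts.read())))
```

Redirecting the output to a file gives a machine file that lists only fast-network names; the node file repeats a host once per allocated processor, and the rewrite keeps those repeats.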