[torqueusers] separating MPI traffic onto fast net

Prakash Velayutham velayups at email.uc.edu
Mon Mar 13 14:11:27 MST 2006


Jeffrey B. Layton wrote:
> I'm not sure how LAM works specifically (you can ask on the
> LAM mailing list), but in general the network your MPI
> code uses for computation depends on how you have the
> nodes named (/etc/hosts) and how you specify the node
> names in the machine file for the specific MPI.
>
> What does /etc/hosts look like?
>
> Jeff
>
>> Hi,
>>     I've got two networks between the nodes of my cluster and I'd like
>> to have the MPI traffic on a net to itself. I've created my
>> lam-hostmap.txt file as indicated in the Install Guide, but I'm not
>> sure how I can test to see if it is actually working. I am concerned
>> that it is not working because when running a job, the lamd is using
>> the slow 192.0.0.x network instead of the fast 198.0.0.x net:
>>
>> [siervje@node4 ~]$ ps -ef | grep lam
>> siervje  0 16:38 ?  00:00:00 /usr/local/lam/bin/lamd -H 192.0.0.17 -P 32797 -n 3 -o 0
>>
>> Does this indicate that my mapping for the MPI traffic is not working,
>> or is the lamd considered an "out-of-band" process that should be
>> running on the designated slow network? Is there any way to test to
>> make sure things are working?
>>
>> Thanks!
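
For the "is it working" part, one quick check is to pull the -H address
out of the lamd command line and see which subnet it falls in. Below is a
minimal sketch, assuming both networks are /24 (the prefix length is a
guess; only 192.0.0.x and 198.0.0.x are given above). The function names
are made up for illustration. Note that this only tells you which address
lamd was started with; whether the MPI traffic itself follows the same
network is a separate question.

```python
import ipaddress
import re

# Assumed subnets, based on the addresses quoted above; the /24 prefix is a guess.
SLOW_NET = ipaddress.ip_network("192.0.0.0/24")
FAST_NET = ipaddress.ip_network("198.0.0.0/24")


def lamd_host_address(ps_line):
    """Return the IPv4 address following -H in a lamd command line, or None."""
    match = re.search(r"lamd\b.*?-H\s+(\d+\.\d+\.\d+\.\d+)", ps_line)
    return ipaddress.ip_address(match.group(1)) if match else None


def classify(address):
    """Label an address as 'fast', 'slow', or 'unknown'."""
    if address is None:
        return "unknown"
    if address in FAST_NET:
        return "fast"
    if address in SLOW_NET:
        return "slow"
    return "unknown"


# The ps output line from the original question.
ps_line = ("siervje  0 16:38 ?  00:00:00 /usr/local/lam/bin/lamd "
           "-H 192.0.0.17 -P 32797 -n 3 -o 0")
addr = lamd_host_address(ps_line)
print(addr, classify(addr))
```
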
Also, if you use OSC's (Pete's) mpiexec wrapper for LAM, you can use the
command-line option --transform-hostname to force the MPI traffic onto
the network you want. I think this should work for your case.
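
The idea behind a hostname transform is just a rewrite rule applied to
each node name before the MPI processes connect. Here is a rough sketch
of that idea in Python, not of mpiexec itself. It assumes the fast
interfaces are named by appending "-fast" to the node name (for example
node4-fast in /etc/hosts); that naming scheme and the function name are
hypothetical, so check your own /etc/hosts and the mpiexec man page for
the exact option syntax.

```python
import re

# Hypothetical naming scheme: node4 (slow net) -> node4-fast (fast net).
PATTERN = r"^(node\d+)$"
REPLACEMENT = r"\1-fast"


def to_fast_name(hostname):
    """Rewrite a node name to its assumed fast-network alias.

    Names that do not match PATTERN are returned unchanged.
    """
    return re.sub(PATTERN, REPLACEMENT, hostname)


# Node names as a batch system might hand them out.
allocated = ["node3", "node4"]
print([to_fast_name(h) for h in allocated])
```

Each rewritten name still has to resolve to an address on the fast
network for this to have any effect.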

Prakash

