On Tue, 4 Sep 2012, Lloyd Brown wrote:

> A lot will depend on what MPI implementation you're using.  Even if
> you're able to transform your nodefiles in some fashion, that would
> imply that you're using the IPoIB components for your main job
> communication.  This means you will have very little latency advantage
> over GigE, and significantly reduced IB bandwidth as well (although
> probably still better than GigE; just not as good as IB can do).  IPoIB
> is a good fallback when you have no other option, but it definitely
> does not deliver the performance you can get out of native IB.
> A much better solution, in my opinion, is to use an MPI implementation
> that can speak native IB verbs directly.  My personal preference is
> OpenMPI, which, if compiled correctly, will find the fastest
> communication medium available (IB before GigE).  And then, despite
> what's in the $PBS_NODEFILE, the job communication will generally go
> over that fastest network.  Only minimal job setup and status
> information is communicated over GigE.

We have OpenMPI and MVAPICH2 installed on our cluster. It's good to 
know that OpenMPI is most likely already doing the right thing. I'll pass 
this information along to my users.
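
For anyone who still wants to try the nodefile transformation mentioned
above (for example with an MPI that can only use IPoIB), here is a
minimal Python sketch. It assumes, purely for illustration, that each
node's IPoIB interface resolves under the GigE hostname plus a "-ib"
suffix; that naming convention and the function name to_ib_hostnames
are hypothetical, so adjust them to match your site.

```python
import os


def to_ib_hostnames(lines, suffix="-ib"):
    """Return nodefile entries with `suffix` appended to the short hostname.

    The "-ib" suffix is an assumed site naming convention, not a standard.
    Duplicate entries are kept, since a PBS nodefile typically lists a
    host once per allocated processor slot.
    """
    result = []
    for line in lines:
        host = line.strip()
        if not host:
            continue  # skip blank lines
        # Keep any domain part intact: node02.example -> node02-ib.example
        short, dot, domain = host.partition(".")
        result.append(short + suffix + dot + domain)
    return result


if __name__ == "__main__":
    # $PBS_NODEFILE is set by the batch system inside a running job.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as f:
            for name in to_ib_hostnames(f):
                print(name)
```

Inside a job script the output could be redirected to a new file and
handed to the MPI launcher in place of $PBS_NODEFILE. As noted in the
quoted message, this only gets you IPoIB, so an MPI that speaks native
verbs remains the better choice.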

