[torqueusers] Transforming node names in $PBS_NODEFILE and $PBS_GPUFILE

Lloyd Brown lloyd_brown at byu.edu
Tue Sep 4 14:02:55 MDT 2012


I can't speak to MVAPICH2, but I have a vague recollection that MVAPICH
wouldn't work unless it was on IB anyway.  I could be misremembering,
though.

A good way to tell is to run a bandwidth test (e.g. "osu_bw" from
http://mvapich.cse.ohio-state.edu/benchmarks/), and see what you get.
Generally speaking, the bandwidth of GigE and IB differs enough to
make it pretty obvious which network the job is actually using.
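
If you'd rather not eyeball the numbers, here is a rough Python sketch that
pulls the peak figure out of saved osu_bw output and compares it with the
theoretical GigE ceiling (1 Gbit/s, i.e. 125 MB/s). It assumes the usual
two-column "size, MB/s" layout with '#' header lines; the function names
are my own, and the sample numbers are made up for illustration, not
measurements from a real cluster.

```python
# Sketch only: assumes osu_bw prints "<message size> <bandwidth in MB/s>"
# data lines, with header lines starting with '#'.
GIGE_MAX_MB_S = 125.0  # 1 Gbit/s divided by 8 bits per byte


def peak_bandwidth(osu_bw_output):
    """Return the highest bandwidth (MB/s) found in the data lines."""
    best = 0.0
    for line in osu_bw_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not two columns.
        if len(fields) != 2 or line.lstrip().startswith("#"):
            continue
        try:
            best = max(best, float(fields[1]))
        except ValueError:
            continue
    return best


def exceeds_gige(osu_bw_output):
    """True if the peak is more than a single GigE link could carry."""
    return peak_bandwidth(osu_bw_output) > GIGE_MAX_MB_S


if __name__ == "__main__":
    # Illustrative, invented output; substitute your own captured run.
    sample = """# OSU MPI Bandwidth Test
# Size      Bandwidth (MB/s)
1                 2.50
4194304        3100.00
"""
    print(peak_bandwidth(sample), exceeds_gige(sample))
```

A peak above 125 MB/s only tells you the traffic is not on GigE. IPoIB can
clear that bar too, so how far above it you land is the more useful hint
about whether you're getting native IB.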

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 09/04/2012 01:55 PM, Dave Ulrick wrote:
> On Tue, 4 Sep 2012, Lloyd Brown wrote:
> 
>> A lot will depend on what MPI implementation you're using.  Even if
>> you're able to transform your nodefiles in some fashion, that would
>> imply that you're using the IPoIB components for your main job
>> communication.  This means you will have very little latency advantage
>> over GigE, and significantly reduced IB bandwidth as well (although
>> probably still better than GigE, just not as good as native IB).  IPoIB
>> is a fine fallback when you have nothing else, but it definitely won't
>> perform as well as native IB.
>>
>> A much better solution, in my opinion, is to use an MPI implementation
>> that can speak native IB verbs directly.  My personal preference is
>> OpenMPI, which, if compiled correctly, will find the fastest
>> communication medium available (IB before GigE).  And then, despite
>> what's in the $PBS_NODEFILE, the job communication will generally go
>> over that fastest network.  Only minimal job setup and status
>> information is communicated over GigE.
> 
> We have OpenMPI and MVAPICH2 installed on our cluster. It's good to 
> know that OpenMPI is most likely already doing the right thing. I'll pass 
> this information along to my users.
> 
> Thanks,
> Dave
> 

