[torqueusers] [OFFTOPIC] List of discussion or documentation on infiniband
Greenseid, Joseph M (IS)
Joseph.Greenseid at ngc.com
Wed Jul 8 07:48:28 MDT 2009
I agree with Jason's response. I would also suggest checking whether you got your IB stack from your vendor. Some vendors distribute a specialized software/driver set that they tune specifically for their gear. In my experience it's usually based on the OFED stack from the OpenFabrics Alliance, but if they've made changes, then you should hit them up for support.
From: torqueusers-bounces at supercluster.org on behalf of Jason Williams
Sent: Wed 7/8/2009 9:43 AM
To: ChrisJob.fr at gmail.com
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] [OFFTOPIC] List of discussion or documentation on infiniband
One of the major players out there in the InfiniBand world is the
OpenFabrics Alliance (http://www.openfabrics.org). There should be some
docs and mailing lists on the site that you could check out.
Also, you might want to figure out what MPI libraries you are using and
check the website for them.
One last suggestion is to find out who your IB Card and Switch provider
is and maybe get them in on a service call.
To me, it sounds like you are having a problem with your IB Fabric
Subnet Manager. I know some switches out there have this sort of
problem, but I don't want to get too deep into it because this is
technically off topic for this list.
> We have an InfiniBand HPC cluster. Sometimes we have problems with MPI
> programs and we must restart InfiniBand. After that, everything is OK for
> 2 weeks.
> Do you know where I can find a discussion list about InfiniBand? Or
> documentation on the subject?
> Thank you for your help
> torqueusers mailing list
> torqueusers at supercluster.org