[torqueusers] OpenMPI and version changed to Torque
knielson at adaptivecomputing.com
Fri Jun 29 09:50:00 MDT 2012
On Fri, Jun 29, 2012 at 9:09 AM, Peter A Ruprecht <
peter.ruprecht at colorado.edu> wrote:
> Hi everyone,
> Currently we're using torque 2.5.11 and would like to migrate to 4.x
> pretty soon. However, some testing with 4.0.2 has shown that programs
> linked against a version of OpenMPI (1.4.x) that was compiled with torque
> 2.5 won't run across more than one node. My guess is that the task
> manager API has changed between 2.5 and 4.0.
The API did not change, but we did start requiring a newer version of
autotools to configure Torque. I wonder if that could be affecting things.
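
One way to see how a given OpenMPI build depends on Torque is to check whether it was built with the task manager ("tm") components, which OpenMPI uses to launch processes through pbs_mom. The sketch below is only an illustration, not a diagnosis of the problem above. It assumes that ompi_info prints one "MCA <framework>: <component> (...)" line per component, and the helper names find_tm_components and installed_tm_components are hypothetical.

```python
import re
import subprocess

# Assumed ompi_info line shape, e.g.:
#   "MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.5)"
_MCA_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+(\w+)\s+\(")


def find_tm_components(ompi_info_text):
    """Return the MCA frameworks (e.g. 'ras', 'plm') that list a 'tm' component."""
    frameworks = []
    for line in ompi_info_text.splitlines():
        match = _MCA_LINE.match(line)
        if match and match.group(2) == "tm":
            frameworks.append(match.group(1))
    return frameworks


def installed_tm_components():
    """Run the ompi_info found on PATH and report which frameworks have tm support."""
    result = subprocess.run(
        ["ompi_info"], capture_output=True, text=True, check=True
    )
    return find_tm_components(result.stdout)
```

An empty list from installed_tm_components() would suggest that this OpenMPI build does not use the task manager at all and launches remote processes some other way (for example rsh/ssh). A non-empty list means the build was compiled against a Torque library, so it is worth checking which Torque library those components resolve to at run time.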
> Certainly, best practices would suggest recompiling all libraries that
> depend on torque when the torque version changes. However, a significant
> number of our users would be very unhappy having to re-test and possibly
> recompile their codes with a recompiled OpenMPI. I think that in some
> cases they are even required to use identical libraries across a whole
> suite of runs to guarantee consistency. This makes it a little tough to
> ever change the resource manager.
> So, getting around to my questions, is it likely that I am understanding
> the dependency between torque, the task manager, and OpenMPI correctly?
> And if so, is it really going to be necessary to recompile OpenMPI? What
> do you all do in this situation? Is it a bad idea to run torque (on a big
> cluster, ~1400 nodes and >10000 jobs/day) without using the task manager?
> Any commentary or pointers to relevant documentation appreciated!
> Pete Ruprecht