[torqueusers] TM improvements
Jeff Squyres
jsquyres at open-mpi.org
Tue Nov 22 12:32:22 MST 2005
A little birdie tipped me off to Garrick Staples' post from yesterday
about fixing the long-standing only-1-TM-connection-at-a-time problem:
http://www.supercluster.org/pipermail/torqueusers/2005-November/002528.html
(I was not a subscriber until a few minutes ago, so I can't reply to
that message)
First off, many thanks! This has been a *huge* problem for us for a
long time. We had to do some really wonky things to avoid this issue.
As you can probably guess by my e-mail address, I am an MPI
implementor. We support the TM interface in both LAM/MPI and Open MPI.
We actually have a few more issues with the TM interface that I have
passed on to Altair that would significantly help us support TM-based
systems better; is there any interest in seeing our list posted here?
(Altair has done some improvements to the TM interface in PBS Pro,
which is why I passed our list to them)
But before even discussing that list, there's a support issue: LAM and
Open MPI are now in the unfortunate position of having to support [at
least] 2 diverging TM implementations that each have different
characteristics in different versions. This can be difficult for us to
support. Specifically, when someone passes "--with-tm=/opt/wherever"
to our configure script, how can we know if you support multiple
simultaneous TM connections or not?
I see the TM_MULTIPLE_CONNS #define in the patch from yesterday; I
assume that this is exactly for this purpose (so that my configure
script can figure out that a given version of Torque supports the
multiple TM connection behavior). That's actually quite perfect,
except for cross-compiling situations (which I don't see as a problem
-- I'm not aware of anywhere that we cross-compile for TM support).
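For illustration, a compile-time check of this kind might look like the sketch below. It assumes only that TM_MULTIPLE_CONNS is defined by tm.h when the feature is present, as the patch suggests; HAVE_TM_H and tm_supports_multiple_conns() are hypothetical names made up for this example and are not part of Torque, LAM/MPI, or Open MPI.

```c
/* HAVE_TM_H is a hypothetical macro that a configure script would define
 * once it has found tm.h under the prefix given to --with-tm. */
#ifdef HAVE_TM_H
#include <tm.h>
#endif

/* Hypothetical helper: returns 1 if the TM headers seen at compile time
 * advertise support for multiple simultaneous TM connections, else 0.
 * The answer is fixed when this file is compiled, not probed at run time. */
static int tm_supports_multiple_conns(void)
{
#ifdef TM_MULTIPLE_CONNS
    return 1;
#else
    return 0;
#endif
}
```

A configure script could compile a small test program around the same #ifdef and record the result. Because the test only needs to compile, not run, a check of this shape should in principle also work when cross-compiling.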
Is there going to be a comprehensive list of these #defines that we can
check for? Is this list being coordinated with Altair / PBS Pro? I
know that they have done some improvements to TM already; do they have
their own #defines for the new features? Also, I heard a rumor at some
point that others were implementing the TM interface (perhaps SGE? I
honestly don't remember...). Is this list of #defines being
coordinated with them?
As a consumer of the TM interface, it would be *really great* if there
were only *one* set of these things to check against. If we have to
splinter our configure script to check for different vendors and
different variants, it will be a complete and total nightmare (well,
more than the nightmare that our configure script already is! ;-) ).
Thanks!
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/