[torqueusers] TM improvements

Jeff Squyres jsquyres at open-mpi.org
Tue Nov 22 12:32:22 MST 2005


A little birdie tipped me off to Garrick Staples' post from yesterday
about fixing the long-standing only-1-TM-connection-at-a-time problem:

	http://www.supercluster.org/pipermail/torqueusers/2005-November/002528.html

(I was not a subscriber until a few minutes ago, so I can't reply to  
that message)

First off, many thanks!  This has been a *huge* problem for us for a  
long time.  We had to do some really wonky things to avoid this issue.

As you can probably guess by my e-mail address, I am an MPI  
implementor.  We support the TM interface in both LAM/MPI and Open MPI.

We actually have a few more issues with the TM interface that I have
passed on to Altair; fixing them would significantly help us support
TM-based systems better.  Is there any interest in seeing our list
posted here?  (Altair has made some improvements to the TM interface in
PBS Pro, which is why I passed our list to them.)

But before even discussing that list, there's a support issue: LAM and  
Open MPI are now in the unfortunate position of having to support [at  
least] 2 diverging TM implementations that each have different  
characteristics in different versions.  This can be difficult for us to  
support.  Specifically, when someone passes "--with-tm=/opt/wherever"  
to our configure script, how can we know if you support multiple  
simultaneous TM connections or not?

I see the TM_MULTIPLE_CONNS #define in the patch from yesterday; I
assume that this is exactly for this purpose (so that my configure  
script can figure out that a given version of Torque supports the  
multiple TM connection behavior).  That's actually quite perfect,  
except for cross-compiling situations (which I don't see as a problem  
-- I'm not aware of anywhere that we cross-compile for TM support).
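
To make the configure-time check concrete, here is a minimal sketch of the kind of probe I have in mind.  It assumes the #define is visible from tm.h; HAVE_TM_H and both function names are hypothetical, made up for this illustration, and not part of Torque or of our configure script:

```c
/* Sketch of a configure-time probe for TM features.
 * HAVE_TM_H is a hypothetical macro that the build system would define
 * once it has located tm.h under the --with-tm prefix. */
#include <stdio.h>

#ifdef HAVE_TM_H
#include <tm.h>
#endif

/* Returns 1 if the TM header advertises multiple simultaneous
 * connections via TM_MULTIPLE_CONNS, 0 otherwise (including the
 * case where tm.h was not found at all). */
int tm_supports_multiple_conns(void)
{
#ifdef TM_MULTIPLE_CONNS
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary, as a configure test program might. */
void report_tm_features(FILE *out)
{
    fprintf(out, "multiple TM connections: %s\n",
            tm_supports_multiple_conns() ? "yes" : "no");
}
```

Because the answer comes entirely from the preprocessor, a probe like this only has to compile; a configure script would not strictly need to run it, which is why a #define is friendlier than a runtime test.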

Is there going to be a comprehensive list of these #defines that we can  
check for?  Is this list being coordinated with Altair / PBS Pro?  I  
know that they have made some improvements to TM already; do they have
their own #defines for the new features?  Also, I heard a rumor at some  
point that others were implementing the TM interface (perhaps SGE?  I  
honestly don't remember...).  Is this list of #defines being  
coordinated with them?

As a consumer of the TM interface, it would be *really great* if there  
was only *one* set of these things to check against.  If we have to  
splinter our configure script to check for different vendors and  
different variants, it will be a complete and total nightmare (well,  
more than the nightmare that our configure script already is! ;-) ).

Thanks!

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/


