[torqueusers] TM improvements

Garrick Staples garrick at usc.edu
Tue Nov 22 17:29:47 MST 2005


On Tue, Nov 22, 2005 at 02:32:22PM -0500, Jeff Squyres alleged:
> A little birdie tipped me on to Garrick Staples' post from yesterday  
> about fixing the long-standing only-1-TM-connection-at-a-time problem:
> 
> 	http://www.supercluster.org/pipermail/torqueusers/2005-November/002528.html
> 
> (I was not a subscriber until a few minutes ago, so I can't reply to  
> that message)
> 
> First off, many thanks!  This has been a *huge* problem for us for a  
> long time.  We had to do some really wonky things to avoid this issue.
> 
> As you can probably guess by my e-mail address, I am an MPI  
> implementor.  We support the TM interface in both LAM/MPI and Open MPI.

Excellent!  I'm glad you were picked up for this issue.  *Your* feedback
is exactly what I'm after.

Did you test the patch yet? :)

 
> We actually have a few more issues with the TM interface that I have  
> passed on to Altair that would significantly help us support TM-based  
> systems better; is there any interest here to see our list posted here?  
>  (Altair has done some improvements to the TM interface in PBS Pro,  
> which is why I passed our list to them)

Absolutely.  We definitely want to help the TM users out there.

 
> But before even discussing that list, there's a support issue: LAM and  
> Open MPI are now in the unfortunate position of having to support [at  
> least] 2 diverging TM implementations that each have different  
> characteristics in different versions.  This can be difficult for us to  
> support.  Specifically, when someone passes "--with-tm=/opt/wherever"  
> to our configure script, how can we know if you support multiple  
> simultaneous TM connections or not?
> 
> I see the TM_MULTIPLE_CONNS #define in the patch from yesterday; I  
> assume that this is exactly for this purpose (so that my configure  
> script can figure out that a given version of Torque supports the  
> multiple TM connection behavior).  That's actually quite perfect,  

The idea was certainly for compile-time feature inspection.  I can't say
I gave that aspect of the patch enough thought, but I figured it was
enough to let people get started with testing the patch.
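
For anyone who wants to start testing, here is a rough sketch of how a TM
consumer might key off the define at compile time.  Treat it as an
illustration only: it assumes TM_MULTIPLE_CONNS becomes visible once tm.h
from a patched tree is included, and the HAVE_TM_MULTIPLE_CONNS macro and
both helper functions are hypothetical names, not part of TORQUE.

```c
/*
 * Sketch: compile-time detection of multiple-TM-connection support.
 *
 * In a real build this file would first do
 *     #include <tm.h>
 * from the TORQUE tree given to --with-tm.  The include is left out so
 * the sketch compiles standalone; without it TM_MULTIPLE_CONNS is
 * undefined and the single-connection branch is taken.
 */

#ifdef TM_MULTIPLE_CONNS
#define HAVE_TM_MULTIPLE_CONNS 1   /* hypothetical consumer-side macro */
#else
#define HAVE_TM_MULTIPLE_CONNS 0
#endif

/* Hypothetical helper: 1 if the TM library we were built against
 * advertises multiple simultaneous connections, 0 otherwise. */
int tm_supports_multiple_conns(void)
{
    return HAVE_TM_MULTIPLE_CONNS;
}

/* Hypothetical helper: how many TM connections the caller should try
 * to hold open at once.  Falls back to 1 on an unpatched library. */
int tm_max_useful_conns(int wanted)
{
    if (wanted < 1)
        return 1;
    return tm_supports_multiple_conns() ? wanted : 1;
}
```

A configure script could make the same decision by test-compiling a
fragment that only builds when the define is present.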


> except for cross-compiling situations (which I don't see as a problem  
> -- I'm not aware of anywhere that we cross-compile for TM support).

*shrug*  I'm open to solutions.

 
> Is there going to be a comprehensive list of these #defines that we can  
> check for?  Is this list being coordinated with Altair / PBS Pro?  I  
> know that they have done some improvements to TM already; do they have  
> their own #defines for the new features?  Also, I heard a rumor at some  
> point that others were implementing the TM interface (perhaps SGE?  I  
> honestly don't remember...).  Is this list of #defines being  
> coordinated with them?
> 
> As a consumer of the TM interface, it would be *really great* if there  
> was only *one* set of these things to check against.  If we have to  
> splinter our configure script to check for different vendors and  
> different variants, it will be a complete and total nightmare (well,  
> more than the nightmare that our configure script already is! ;-) ).

I really can't comment on what other PBS implementations are doing.  I
don't have access to their commercial software, nor would I want to
cause any misunderstandings.  To be honest, I have no idea what kind of
feature parity we have with PBS Pro, SGE, etc.  I'm really just focusing on
TORQUE at this time.

But I'm certainly open to maintaining compatibility if someone
contributes the knowledge or patches.

TM has a POSIX specification that I don't want to _break_, but I don't
mind extending.


-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California