[torqueusers] TM interface errors
velayups at email.uc.edu
Fri Apr 7 08:21:29 MDT 2006
On Sat, Apr 01, 2006 at 02:34:50PM -0500, Prakash Velayutham alleged:
> David Golden wrote:
> >On 2006-03-31 13:01:47 -0500, Prakash Velayutham wrote:
> >>I have a minimal MPI program to test the TM interface and
> >>strangely I seem to get errors during tm_init call.
> >>Could someone explain what could be wrong? Here is the MPI code:
The purpose of calling TM from within an MPI program is unclear to me.
It seems to me you either have an MPI program or a TM program. And of
course, an MPI launcher that uses TM is quite handy.
> >Haven't even looked at the code; but:
> >AFAIK there's a 1-client limit in current TM API
> >(see torque planned changes list on wiki).
> >openmpi's run time environment abstraction layer
> >(openrte) would be that client, your user code probably
> >can't be that client.
> Also, when I run that code with TM connections from Mother Superior, it
> works fine. (ie. instead of rank == 1, substitute with rank == 0).
> Wouldn't that also have failed in the above reasoning?
The planned changes are in CVS now. So if you have a recent snapshot,
you can have more than 1 client connected to TM. But I haven't tested
connecting from sister MOMs.
Garrick Staples, Linux/HPCC Administrator
I am working on MPI-2 dynamic process management support in Open MPI and on implementing the same level of dynamic behavior in Torque. That is, after an MPI job has started on n CPUs, it should be able to ask for more later on.
This requires that any MOM in the MPI group be able to talk to the MS (Mother Superior) through the TM interface and have the MS request more nodes/CPUs from the PBS server. I have several of these pieces implemented (though coarsely), but I did not expect that TM, as it stands, would not support this.
Could anyone here help me extend the TM interface to support this?
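
To make the intended request path concrete, here is a rough toy model of it: a sister MOM relays a request for more CPUs to the MS, and only the MS talks to the server. This is only a sketch. Every class and method name below (Server, MotherSuperior, SisterMom, request_more, allocate) is hypothetical and made up for illustration; none of them exist in TORQUE or in the TM API, and the model ignores networking, authentication and scheduling entirely.

```python
# Toy model only: none of these classes or methods exist in TORQUE or the TM API.
from dataclasses import dataclass


@dataclass
class Server:
    """Stands in for the PBS server: owns the pool of free CPUs."""
    free_cpus: int

    def allocate(self, count: int) -> int:
        # Grant as many CPUs as are available, up to the requested count.
        granted = min(count, self.free_cpus)
        self.free_cpus -= granted
        return granted


@dataclass
class MotherSuperior:
    """Stands in for the MS: the only MOM that talks to the server."""
    server: Server
    job_cpus: int

    def request_more(self, count: int) -> int:
        # Ask the server for more CPUs and add whatever was granted to the job.
        granted = self.server.allocate(count)
        self.job_cpus += granted
        return granted


@dataclass
class SisterMom:
    """Stands in for a sister MOM: forwards requests to the MS."""
    ms: MotherSuperior

    def request_more(self, count: int) -> int:
        # A sister never contacts the server directly; it relays through the MS.
        return self.ms.request_more(count)


server = Server(free_cpus=3)
ms = MotherSuperior(server=server, job_cpus=4)
sister = SisterMom(ms=ms)

first = sister.request_more(2)   # fully satisfied
second = sister.request_more(2)  # only one CPU is left in the pool
print(first, second, ms.job_cpus, server.free_cpus)
```

In the real system, the sister-to-MS hop is the part that would have to go through TM, which is where the client limit discussed above comes in.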