[torqueusers] Torque with OpenMPI

Craig West cwest at astro.umass.edu
Tue Feb 19 12:06:13 MST 2008


You will need the Torque "clients" and "devel" packages installed on
the node(s) where you build Open MPI. I would expect that the nodes
that only run jobs need just the Torque clients and mom packages, plus
an Open MPI installation that comes from a tm-enabled build. If you are
building and installing Open MPI manually on each node, install the
Torque devel package on each node as well.
So the answer is yes: you need a tm-enabled Open MPI build on each node.

> I managed to get Torque to work (with the Maui scheduler). However, I'm
> experiencing some problems when trying to get Torque to work with openmpi.
> On my server (torque server, maui) I installed openmpi with the --with-tm
> option. Everything went smoothly.
> My question is whether I need to compile openmpi on my nodes with this
> option. I tried it, but got errors saying that no tm was found (or
> something like that). Of course, the installations on server and nodes
> differ, as I only installed mom on the nodes.
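
One way to check a node's Open MPI installation for tm support is to
look for tm components in the ompi_info output. A minimal sketch
follows; the helper name has_tm_support is made up here, and the pattern
assumes ompi_info prints component lines of the form "MCA ras: tm (...)",
which may differ between Open MPI versions.

```shell
# Hypothetical helper: reads ompi_info output on stdin and succeeds
# if a tm component is listed for the ras, pls, or plm framework.
has_tm_support() {
    grep -Eq 'MCA (ras|pls|plm): tm'
}

# Typical use on a node (needs Open MPI installed, so left commented out):
#   ompi_info | has_tm_support && echo "tm support found"
```

If nothing matches, that build was most likely configured without
--with-tm, or configure could not find the Torque devel files.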

It sounds like mpirun is unable to locate your my_app program. You
should give it the full path to my_app, or change into the directory
where my_app is located as part of your Torque script and use:
mpirun ./my_app
If you are not using NFS (or another form of remote mounting) to mount
your home directory, then you will need to copy the my_app program (and
any associated runtime files) to each of the nodes.

> When I then try to run an mpi job, it runs only locally. When I do this:
> mpirun -np 2 -hostlist my_list my_app
> I get an error that there is no my_app on the other machines. It looks
> like it is not copied over to the other machines. I don't have NFS, but I
> use password-less ssh.
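
Without a shared filesystem, the copy step could be scripted roughly as
below. This is only a sketch: stage_to_nodes and the COPY_CMD variable
are names invented for this example, the node file is assumed to list
one hostname per line (as Torque's $PBS_NODEFILE does), and the
destination directory is assumed to exist already on every node.

```shell
# Hypothetical helper: copy one file to the same directory on every
# distinct host listed in a node file.
# COPY_CMD defaults to scp; set it to echo for a dry run.
stage_to_nodes() {
    nodefile=$1
    file=$2
    destdir=$3
    for host in $(sort -u "$nodefile"); do
        ${COPY_CMD:-scp} "$file" "$host:$destdir/" || return 1
    done
}

# Inside a Torque job this might be called as (commented out here):
#   stage_to_nodes "$PBS_NODEFILE" ./my_app /tmp
#   mpirun /tmp/my_app
```

Since you already have password-less ssh set up, scp should run here
without prompting.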

Also, as Garrick said, you should strip the -np and -hostlist options 
from the mpirun command. A tm-enabled mpirun gets the node list and the 
number of processes to start from Torque itself.
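
Putting these suggestions together, a job script might look like the
sketch below. The file name run_my_app.pbs, the job name, and the
nodes=2:ppn=1 request are placeholders, and it assumes my_app sits in
the directory you submit from and is reachable at that path on every
node.

```shell
# Write a minimal Torque job script to a (hypothetical) file.
cat > run_my_app.pbs <<'EOF'
#!/bin/sh
#PBS -N my_app
#PBS -l nodes=2:ppn=1

# Torque starts the job in the home directory; go to the submit directory.
cd "$PBS_O_WORKDIR" || exit 1

# No -np or -hostlist: a tm-enabled mpirun asks Torque for the nodes.
mpirun ./my_app
EOF

# Submit it with: qsub run_my_app.pbs
```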

