[torqueusers] Torque with OpenMPI
cwest at astro.umass.edu
Tue Feb 19 12:06:13 MST 2008
You will need to have the torque "clients" and "devel" packages
installed on the node(s) you build open-mpi on. I would expect the
nodes that run the jobs to need only the torque clients (and mom)
packages, plus open-mpi installed from a tm-enabled build. If you are
building and installing open-mpi manually on each node, install the
torque devel package on each node as well.
So the answer is yes: you need a tm-enabled open-mpi build on each node.
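To confirm that the open-mpi install on a given node really is tm-enabled (built with `--with-tm`), one option is to look for `tm` components in the output of `ompi_info` on that node. This is a minimal sketch that assumes `ompi_info` prints component lines of the form `MCA ras: tm (...)`. The exact set of lines varies between Open MPI versions, and `has_tm_support` is a hypothetical helper name.

```shell
#!/bin/sh
# has_tm_support (hypothetical helper): reads `ompi_info` output on stdin and
# succeeds (exit 0) if any MCA component line names the "tm" component,
# e.g. "MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)".
has_tm_support() {
  grep -Eq 'MCA [a-z]+: tm( |$)'
}

# Usage on a node (assumes that node's ompi_info is first on PATH):
#   ompi_info | has_tm_support && echo "tm-enabled" || echo "no tm support"
```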
> I managed to get Torque to work (with the Maui scheduler). However, I'm
> having problems getting Torque to work with openmpi.
> On my server (torque server, maui) I installed openmpi with the --with-tm
> option. Everything went smoothly.
> My question is whether I need to compile openmpi on my nodes with this
> option. I tried it, but got errors saying that tm was not found (or
> something similar). Of course, the installations on the server and
> the nodes differ, as I only installed mom on the nodes.
It sounds like mpirun is unable to locate your my_app program. You
should give it the full path to my_app, or, as part of your torque
script, change into the directory my_app is located in and use:
mpirun ./my_app
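A minimal Torque job script along these lines, assuming my_app sits in the directory you run qsub from; the job name and node count are placeholders. Torque starts batch jobs in your home directory, so the script changes into `$PBS_O_WORKDIR` (the directory qsub was invoked from) before calling mpirun. The launch is wrapped in a function and guarded by `$PBS_JOBID` only so that the file does nothing when run outside a Torque job.

```shell
#!/bin/bash
#PBS -N my_app
#PBS -l nodes=2
# Placeholder job name and node count; adjust for your cluster.

run_my_app() {
  # Move to the directory qsub was run from (assumed to contain my_app).
  cd "$PBS_O_WORKDIR" || return 1
  # A tm-enabled mpirun reads the node list and slot count from Torque,
  # so no -np or host list is passed here.
  mpirun ./my_app
}

# Only launch when actually running inside a Torque job.
if [ -n "$PBS_JOBID" ]; then
  run_my_app
fi
```

Submit it with `qsub`, passing the script's file name.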
If you are not using NFS (or another form of remote mounting) to mount
your home directory then you will need to copy the my_app program (and
associated runtime files) to each of the nodes.
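Without NFS, one way to do the copying is from inside the job script, before mpirun, using the host list Torque writes to `$PBS_NODEFILE` (one line per allocated slot, so a host can appear more than once). This sketch assumes password-less ssh/scp between nodes and that my_app should land at the same absolute path on every node. `stage_to_nodes` is a hypothetical name, and it copies only the one file you give it, not any other runtime files my_app needs.

```shell
#!/bin/sh
# stage_to_nodes FILE NODEFILE (hypothetical helper): copy FILE to the same
# absolute directory on every distinct host listed in NODEFILE, skipping the
# local host. Assumes password-less ssh/scp and that NODEFILE uses the same
# host name form as `uname -n`.
stage_to_nodes() {
  local file=$1 nodefile=$2 dir self
  # Absolute directory of FILE, so the remote path matches the local one.
  dir=$(cd "$(dirname "$file")" && pwd) || return 1
  self=$(uname -n)
  # The node file repeats a host once per slot; sort -u de-duplicates.
  sort -u "$nodefile" | while read -r host; do
    [ -n "$host" ] || continue
    # The file is already here; copying it onto itself could corrupt it.
    [ "$host" = "$self" ] && continue
    # </dev/null stops ssh/scp from swallowing the rest of the host list.
    ssh "$host" mkdir -p "$dir" </dev/null &&
      scp "$file" "$host:$dir/" </dev/null ||
      echo "stage_to_nodes: copy to $host failed" >&2
  done
}

# Usage inside a job script, before calling mpirun:
#   stage_to_nodes "$PBS_O_WORKDIR/my_app" "$PBS_NODEFILE"
```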
> When I then try to run an mpi job, it runs only locally. When I do this:
> mpirun -np 2 -hostlist my_list my_app
> I get an error saying that my_app does not exist on the other machines.
> It looks like it is not copied over to them. I don't have NFS, but I
> use password-less ssh.
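Also, as Garrick said, you should strip the -np and -hostlist options
from the mpirun command: a tm-enabled mpirun gets the list of nodes
and the number of processes from the Torque job allocation.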