[torqueusers] Torque and OpenMPI
gijsbert.wiesenekker at gmail.com
Sat Jan 17 12:58:34 MST 2009
Brett Lee wrote:
> Gijsbert, I believe Glen's suggestion answers your question. Am
> still learning myself, so at the risk of being wrong I'll direct you
> to some MPI/OpenMP examples I've pieced together:
> Glen Beane wrote:
>> what does your torque script look like? You should be specifying the
>> number of nodes.
>> e.g. something like this:
>> #PBS -l nodes=2:ppn=4
>> cd $PBS_O_WORKDIR
>> mpiexec -n 8 ./a.out
>> On Tue, Jan 13, 2009 at 6:30 AM, Gijsbert Wiesenekker
>> <gijsbert.wiesenekker at gmail.com> wrote:
>>> I have built a two-node Linux cluster with a Quad Core CPU each,
>>> Fedora Core 10, Torque and OpenMPI
>>> I have Torque and OpenMPI working on one node such that when I start
>>> a job
>>> mpiexec -n 4 a.out
>>> It runs 4 copies of a.out on one node.
>>> (BTW, I got the error:
>>> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: File open failure in file
>>> ras_tm_module.c at line 173
>>> [hostname:01936] pls:tm: failed to poll for a spawned proc, return
>>> status =
>>> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c
>>> at line
>>> [hostname:01936] mpiexec: spawn failed with errno=-11
>>> After trying all kinds of combinations of server, queue and qsub
>>> parameters, it turned out that I had to add this line to my queue
>>> definition: set queue long resources_default.nodes = 1)
>>> My question is how I can configure Torque such that when I start my
>>> mpiexec -n 8 a.out
>>> It starts the job on each node running 4 copies of a.out each.
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
I have followed your suggestions, but I am now stuck with another error.
First a couple of other questions:
It looks like I have to configure my batch queue with
resources_default.nodes = 2, otherwise my jobs stay in the Q state.
Is that correct?
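For reference, this is how I set that default from qmgr (the queue name
`long` is from my own setup, as in my earlier message; run on the
server host as root):

```shell
# Give the queue a default node count so jobs that omit -l nodes=...
# still get scheduled (queue name "long" is specific to my setup).
qmgr -c "set queue long resources_default.nodes = 2"

# Show the resulting queue defaults to verify:
qmgr -c "list queue long resources_default"
```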
mpiexec -n 4 -host first-node a.out : -n 4 -host second-node a.out
works fine after disabling iptables. Which ports does mpiexec require?
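In the meantime I worked around the firewall issue by pinning OpenMPI
to a fixed TCP port range instead of disabling iptables entirely (the
MCA parameter names below are my assumption from the OpenMPI docs, and
15001-15003 are the default Torque ports on my install):

```shell
# Pin OpenMPI's TCP communication to a known port range (assumes the
# btl_tcp_port_min_v4/btl_tcp_port_range_v4 MCA parameters exist in
# this OpenMPI version):
mpiexec --mca btl_tcp_port_min_v4 10000 \
        --mca btl_tcp_port_range_v4 100 \
        -n 4 -host first-node a.out : -n 4 -host second-node a.out

# Then open only that range (plus the default Torque ports) on each node:
iptables -A INPUT -p tcp --dport 10000:10099 -j ACCEPT
iptables -A INPUT -p tcp --dport 15001:15003 -j ACCEPT
```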
When I submit the following job:
#PBS -l nodes=2:ppn=4
mpiexec -n 8 a.out
nothing happens, and after terminating the batch job the error file
contains:
PBS: exec of shell '/usr/sbin/pbs_demux' failed.
A Google search suggested using -nostdin and -nostdout, but the Fedora
Core mpiexec does not seem to support those options.
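One thing I will check next is whether pbs_demux is actually present
and executable on both nodes, since the error message names that path
directly. A minimal check (the check_exec helper is just my own sketch):

```shell
# check_exec PATH: report whether PATH exists and is executable.
check_exec() {
    if [ -x "$1" ]; then
        echo "$1 OK"
    else
        echo "$1 missing or not executable"
    fi
}

# Path taken verbatim from the error message in the job's stderr;
# run this on each node (e.g. via ssh):
check_exec /usr/sbin/pbs_demux
```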