[torqueusers] Torque and OpenMPI

Glen Beane glen.beane at gmail.com
Tue Jan 13 09:00:37 MST 2009


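What does your Torque script look like? You should be specifying the
number of nodes (and processors per node) in the job's resource request.
With resources_default.nodes = 1 set on the queue, a job that does not
ask for nodes explicitly is only given one node.

For example, something like this: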

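#!/bin/bash

# Request 2 nodes with 4 processors per node (8 slots in total)
#PBS -l nodes=2:ppn=4

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR" || exit 1
# Launch 8 MPI processes across the allocated slots
mpiexec -n 8 ./a.out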
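
If you would rather not hard-code the process count, a variation along
these lines should work. It is only a sketch: it relies on Torque writing
one hostname per allocated processor to $PBS_NODEFILE (so nodes=2:ppn=4
gives 8 lines), and count_slots is a helper name I made up for
illustration, not something Torque or OpenMPI provides.

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4

# Hypothetical helper: count the non-empty lines in a node file.
# Torque lists one hostname per allocated processor, so this is the
# total number of slots available to the job.
count_slots() {
    grep -c . "$1"
}

# Only launch when running inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    NP=$(count_slots "$PBS_NODEFILE")
    # One MPI process per allocated slot
    mpiexec -n "$NP" ./a.out
fi
```

That way the script keeps working if you later change the nodes/ppn
request. If your OpenMPI was built with Torque (tm) support, mpiexec
should also pick up the allocation by itself when you leave out -n.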

On Tue, Jan 13, 2009 at 6:30 AM, Gijsbert Wiesenekker
<gijsbert.wiesenekker at gmail.com> wrote:
> I have built a two-node Linux cluster, each node with a quad-core CPU,
> running Fedora Core 10, Torque and OpenMPI.
> I have Torque and OpenMPI working on one node such that when I start a job
> with
> mpiexec -n 4 a.out
> it runs 4 copies of a.out on one node.
> (BTW, I got the error:
> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: File open failure in file
> ras_tm_module.c at line 173
> [hostname:01936] pls:tm: failed to poll for a spawned proc, return status =
> 17002
> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line
> 462
> [hostname:01936] mpiexec: spawn failed with errno=-11
> After trying all kinds of combinations of server parameters, queue
> parameters and qsub parameters it turned out that I had to add the following
> line to my queue definition: set queue long resources_default.nodes = 1)
>
> My question is how I can configure Torque such that when I start my program
> with
> mpiexec -n 8 a.out
> it starts the job on both nodes, running 4 copies of a.out on each.
>
> Regards,
> Gijsbert

