[torqueusers] Torque and MPI jobs

Albino Aveleda bino at coc.ufrj.br
Fri Nov 10 16:20:19 MST 2006


Hi Anna,

I am using this script and it works fine.

--- script to run with 4 nodes ---
#!/bin/bash
#PBS -l nodes=4
#PBS -l walltime=06:00:00
#PBS -j oe
#PBS -N mpi

# change to the directory the job was submitted from
cd "${PBS_O_WORKDIR}"
# count the entries in the node file (one per allocated processor;
# with nodes=4 this is one line per node)
NUM_NODES=$(wc -l < "${PBS_NODEFILE}")
# start MPD on allocated nodes
mpdboot -n ${NUM_NODES} -f "${PBS_NODEFILE}" -r rsh
# run on all nodes
mpiexec -n ${NUM_NODES} ./prog
# stop MPD on allocated nodes
mpdallexit

--- end script ---
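The script above counts the lines of $PBS_NODEFILE. With nodes=4 each node contributes one line, so that count is the number of nodes. With nodes=4:ppn=4, as in your case, every host appears four times, so the same number is the count of processor slots rather than hosts. A minimal sketch that keeps the two counts apart is below. The function names and the mpd.hosts file name are hypothetical, and the idea that mpdboot should get one entry per distinct host while mpiexec gets one rank per slot is an assumption I have not tested on your setup.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Torque or MPICH2) that read a
# Torque node file, which lists each host once per allocated processor.

# Total number of processor slots: one per line of the node file.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Number of distinct hosts in the node file.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Write each distinct host once, keeping the order of first appearance.
write_unique_hosts() {
    awk '!seen[$1]++ { print $1 }' "$1" > "$2"
}

# Intended use inside a job submitted with "#PBS -l nodes=4:ppn=4"
# (assumption: one mpd per distinct host, one MPI rank per slot):
#
#   write_unique_hosts "${PBS_NODEFILE}" mpd.hosts
#   mpdboot -n "$(count_hosts "${PBS_NODEFILE}")" -f mpd.hosts -r rsh
#   mpiexec -n "$(count_slots "${PBS_NODEFILE}")" ./prog
#   mpdallexit
```

For a 4x4 allocation this would start 4 mpds and 16 MPI processes, instead of passing 16 to both commands.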

Cheers,
Bino

Quoting Anna Jonna Armannsdottir <annaj at hi.is>:

> Hi,
> this question is about Torque and MPICH2.
>
> I am writing a PBS job description that
> starts MPI jobs on a number of machines.
>
> The PBS script specifies 4 nodes with
> 4 processors each and generates a
> PBS_NODEFILE that looks like this:
>
> n001
> n001
> n001
> n001
> n002
> n002
> n002
> n002
> n003
> n003
> n003
> n003
> n004
> n004
> n004
> n004
>
> However, mpiexec needs a file like this:
> n001:4
> n002:4
> n003:4
> n004:4
>
> So I wrote a little script that does this
> conversion. So far, so good. But when mpdboot
> starts, it ignores one of the nodes, uses the
> master node instead, and refuses to start
> more than 13 processes.
>
> There must be someone that has solved this. :)
>
> --
> Kindest Regards, Anna Jonna Ármannsdóttir,
> Unix System Administration, Computing Services,
> University of Iceland.
>
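The conversion Anna describes, from the per-processor node file to host:count lines, fits in a single awk pass. This is a minimal sketch; the function name is hypothetical, and it assumes the node file holds one hostname per line, as in the listing quoted above.

```shell
#!/bin/bash
# Hypothetical helper: convert a Torque node file (one line per
# allocated processor) into "host:count" lines, keeping hosts in the
# order they first appear.
nodefile_to_hostcounts() {
    awk '
        NF == 0 { next }                  # skip blank lines
        !($1 in count) { order[++n] = $1 } # remember first appearance
        { count[$1]++ }                   # one slot per line
        END { for (i = 1; i <= n; i++) print order[i] ":" count[order[i]] }
    ' "$1"
}
```

Inside a job it could be called as nodefile_to_hostcounts "${PBS_NODEFILE}" > hosts.txt (hosts.txt is a made-up name). Whether a given MPICH2 version's mpdboot or mpiexec accepts that layout should be checked against its documentation.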

__________________________________________________
Albino A. Aveleda                 bino at coc.ufrj.br
Network Manager                   +55 21 2562-8080
PEC-COPPE/UFRJ                    +55 21 2562-8465
Federal University of Rio de Janeiro (UFRJ)

