[torqueusers] why a mpirun job only runs on a single node
xuehai zhang
hai at cs.uchicago.edu
Mon Feb 28 17:45:25 MST 2005
Hi all,
I am new to Torque/PBS. I apologize if this question has already been asked on the list or is
ill-posed.
I have a 3-node cluster: one head node running pbs_server, pbs_sched, and pbs_mom (so it can also run
jobs), and two worker nodes running pbs_mom only. Each node runs Debian Sarge 3.1 and has the
mpich package installed. All three nodes are in the head node's node list (I can see their
information with "pbsnodes -a"). I wrote the PBS job submission script below. My intention is to
run a sample MPI job (I copied the code from
http://www.iu.edu/%7Erac/hpc/mpi_tutorial/s1_helloworlds.html) on all three nodes. However, it only
runs on the last node added to the node list. Could you please tell me why the job does not run
on the other two nodes? I am not very familiar with mpirun; did I use it incorrectly? Should I switch
to mpiexec instead?
This problem has really puzzled me. I would greatly appreciate any help.
Thanks.
Xuehai
-----------------begin of the PBS script--------------------------------------
#PBS -l nodes=3:ppn=1
#PBS -l walltime=48:00:00
#PBS -q qsar
#PBS -j oe
#PBS -N myjob2
cd /usr/local/exports
echo " "
echo " "
echo "Job started on `hostname` at `date`"
sleep 2
/usr/bin/mpirun -np 3 /var/tmp/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
#/usr/bin/mpirun -machinefile ${PBS_NODEFILE} -np 3 /home/globus/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
#/usr/bin/mpirun /home/globus/MPI_Tutorial/HelloWorld/helloWorld
echo " "
echo "Job Ended at `date`"
echo " "
-----------------end of the PBS script--------------------------------------
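One way to check which hosts Torque actually allocated to the job, and what an mpirun command line using that allocation might look like, is sketched below. This is a minimal sketch, not a confirmed fix: it assumes that mpirun accepts -machinefile (as in the commented-out line of the script above) and that Torque sets PBS_NODEFILE inside the job. The helper name build_mpirun_cmd is hypothetical and is not part of Torque or MPICH. The sketch only prints the command, it does not run it.

```shell
# build_mpirun_cmd is a hypothetical helper (not part of Torque or MPICH).
# It prints, but does not execute, an mpirun command line that points
# mpirun at the hosts listed in a node file.
build_mpirun_cmd() {
    nodefile=$1
    program=$2
    # Count the non-empty lines in the node file; Torque writes one line
    # per allocated processor slot, so nodes=3:ppn=1 should give 3 lines.
    np=$(grep -c . "$nodefile")
    echo "/usr/bin/mpirun -machinefile $nodefile -np $np $program"
}

# PBS_NODEFILE is only set inside a running Torque job; skip otherwise.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "Hosts allocated to this job (slots per host):"
    sort "$PBS_NODEFILE" | uniq -c
    build_mpirun_cmd "$PBS_NODEFILE" /var/tmp/MPI_Tutorial/HelloWorld/helloWorld
fi
```

If the host listing shows all three nodes but the MPI processes still start on only one of them, that would suggest mpirun is not reading the Torque allocation and is falling back to its own default machine list.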