[torqueusers] why an mpirun job only runs on a single node

xuehai zhang hai at cs.uchicago.edu
Mon Feb 28 17:45:25 MST 2005


Hi all,

I am a newbie to Torque/PBS, so I apologize if this question has already been asked on the list
or if the problem is something obvious.

I have a 3-node cluster: one head node running pbs_server, pbs_sched, and pbs_mom (so it can also
run jobs), and two worker nodes running pbs_mom only. Each node runs Debian Sarge 3.1 and has the
mpich package installed. All three nodes are listed in the head node's node list (I can see them
with "pbsnodes -a"). I wrote the PBS submission script below, intending to run a sample MPI job
(I copied the code from http://www.iu.edu/%7Erac/hpc/mpi_tutorial/s1_helloworlds.html) on all
three nodes. However, the job only runs on the last node added to the node list. Could you please
tell me why it does not run on the other two nodes? I am not very familiar with mpirun; did I use
it incorrectly? Should I switch to mpiexec instead?
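
In case it helps with diagnosis, here is a minimal check I plan to add to the job script to see
which hosts Torque actually assigns (assuming Torque exports the standard ${PBS_NODEFILE}
variable pointing to the list of allocated nodes):

echo "Nodefile is ${PBS_NODEFILE}:"
cat ${PBS_NODEFILE}                        # Torque writes one line per allocated slot
echo "Slot count: `wc -l < ${PBS_NODEFILE}`"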

This problem has really puzzled me, and I would appreciate any help.

Thanks.

Xuehai

-----------------begin of the PBS script--------------------------------------

#PBS -l nodes=3:ppn=1
#PBS -l walltime=48:00:00
#PBS -q qsar
#PBS -j oe
#PBS -N myjob2

cd /usr/local/exports

echo " "
echo " "
echo "Job started on `hostname` at `date`"
sleep 2
/usr/bin/mpirun -np 3  /var/tmp/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
#/usr/bin/mpirun -machinefile ${PBS_NODEFILE} -np 3 /home/globus/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
#/usr/bin/mpirun /home/globus/MPI_Tutorial/HelloWorld/helloWorld
echo " "
echo "Job Ended at `date`"
echo " "

-----------------end of the PBS script--------------------------------------
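
In case it clarifies what I was trying, here is the launch line I believe should spread the ranks
across the allocated nodes, based on the commented-out attempt in the script above (I am assuming
MPICH's mpirun honours -machinefile here, and the NP computation is my own addition rather than
something from the tutorial):

# derive the process count from the nodefile instead of hard-coding -np 3
NP=`wc -l < ${PBS_NODEFILE}`
/usr/bin/mpirun -machinefile ${PBS_NODEFILE} -np ${NP} \
    /var/tmp/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out

Without -machinefile, my understanding is that mpirun only knows about the local machine, which
may be why everything lands on one node.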

