[torqueusers] why a mpirun job only runs on a single node

Tim Miller btmiller at helix.nih.gov
Tue Mar 1 09:50:37 MST 2005


On Mon, 28 Feb 2005, xuehai zhang wrote:

> Hi all,
>
> I am a newbie to Torque/PBS. I apologize if this question has already been asked on the list
> or is poorly posed.

(snip)

> -----------------begin of the PBS script--------------------------------------
>
> #PBS -l nodes=3:ppn=1
> #PBS -l walltime=48:00:00
> #PBS -q qsar
> #PBS -j oe
> #PBS -N myjob2
>
> cd /usr/local/exports
>
> echo " "
> echo " "
> echo "Job started on `hostname` at `date`"
> sleep 2
> /usr/bin/mpirun -np 3  /var/tmp/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
> #/usr/bin/mpirun -machinefile ${PBS_NODEFILE} -np 3 /home/globus/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
> #/usr/bin/mpirun /home/globus/MPI_Tutorial/HelloWorld/helloWorld
> echo " "
> echo "Job Ended at `date`"
> echo " "

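The one mpirun command that isn't commented out doesn't specify a
machinefile, so mpirun has no way of knowing which nodes Torque allocated
to the job and will most likely start all three processes on the node
where the script runs. Pass it -machinefile $PBS_NODEFILE, as in the
commented-out mpirun. Also, I'm not sure why you're redirecting output to
myjob2_$HOSTNAME.out ... AFAIK you won't get one output file per host: the
shell on the node running the script expands $HOSTNAME once, so there is
only a single file. You may want to use the -o and -e PBS options in your
script to control where output goes (since you already have -j oe, which
merges stderr into stdout, -o alone should cover both).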
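
To check what Torque actually handed the job, something like the following
could go in the script just before the mpirun line. It is only a sketch: the
helper names count_slots and mpirun_cmd are made up for illustration, the
node01..node03 hosts are a stand-in so it can be tried outside a job, and it
prints the mpirun command rather than running it.

```shell
#!/bin/sh
# Dry-run sketch: inspect the node file and print the mpirun command.
# count_slots and mpirun_cmd are hypothetical helpers, not part of Torque.

count_slots() {
    # The node file has one line per allocated processor slot;
    # count the non-empty lines.
    grep -c . "$1"
}

mpirun_cmd() {
    # $1 = node file, $2 = program to launch.
    # -machinefile and -np are the same options used in the original script.
    echo "/usr/bin/mpirun -machinefile $1 -np $(count_slots "$1") $2"
}

# Inside a job Torque sets PBS_NODEFILE. The fallback below is an
# assumption for trying this outside a job (made-up host names).
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node01\nnode02\nnode03\n' > "$PBS_NODEFILE"
fi

echo "Allocated hosts:"
sort -u "$PBS_NODEFILE"
echo "Would run:"
mpirun_cmd "$PBS_NODEFILE" /var/tmp/MPI_Tutorial/HelloWorld/helloWorld
```

If the host list shows three different nodes but the job still runs on one,
the machinefile is not reaching mpirun. Once the printed command looks
right, run it directly in place of the echo.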

Best,
Tim

-- 
Tim Miller
System Administrator -- Laboratory of Computational Biology
National Institutes of Health   --   Bldg. 50 Rm. 3309    --     301-402-0618

