[torqueusers] why a mpirun job only runs on a single node
Tim Miller
btmiller at helix.nih.gov
Tue Mar 1 09:50:37 MST 2005
On Mon, 28 Feb 2005, xuehai zhang wrote:
> Hi all,
>
> I am a newbie to Torque/PBS. I am sorry if my question is posted in the list earlier or is
> problematic itself.
(snip)
> -----------------begin of the PBS script--------------------------------------
>
> #PBS -l nodes=3:ppn=1
> #PBS -l walltime=48:00:00
> #PBS -q qsar
> #PBS -j oe
> #PBS -N myjob2
>
> cd /usr/local/exports
>
> echo " "
> echo " "
> echo "Job started on `hostname` at `date`"
> sleep 2
> /usr/bin/mpirun -np 3 /var/tmp/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
> #/usr/bin/mpirun -machinefile ${PBS_NODEFILE} -np 3 /home/globus/MPI_Tutorial/HelloWorld/helloWorld > myjob2_$HOSTNAME.out
> #/usr/bin/mpirun /home/globus/MPI_Tutorial/HelloWorld/helloWorld
> echo " "
> echo "Job Ended at `date`"
> echo " "
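It looks like the one mpirun command that isn't commented out doesn't
specify a machinefile. It should be given -machinefile $PBS_NODEFILE, as
in the commented-out mpirun; without it, mpirun most likely never learns
which nodes Torque allocated and starts every process on the node that
runs the script. Also, I'm not sure why you're redirecting output to
myjob2_$HOSTNAME.out ... AFAIK you won't get one output file per host,
since $HOSTNAME is expanded once, by the shell running the job script.
You may want to use the -o and -e PBS options in your script to control
where output goes.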
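A minimal sketch of that fix, assuming an MPICH-style mpirun that accepts
-machinefile (the flag from the commented-out line). The fallback host
names and the myjob2.out file name are made up for illustration; outside a
job the script only prints the command it would run.

```shell
#!/bin/sh
#PBS -l nodes=3:ppn=1
#PBS -j oe
#PBS -N myjob2

# Torque lists the allocated hosts, one line per processor slot, in the
# file named by $PBS_NODEFILE. Outside a job the variable is unset, so
# this sketch falls back to a made-up three-node list for a dry run.
if [ -n "${PBS_NODEFILE:-}" ]; then
    nodefile=$PBS_NODEFILE
    dry_run=no
else
    nodefile=$(mktemp)
    printf 'node1\nnode2\nnode3\n' > "$nodefile"   # hypothetical host names
    dry_run=yes
fi

# One MPI process per slot in the node file, instead of a hard-coded -np 3.
np=$(wc -l < "$nodefile" | tr -d ' ')

# Build the launch command with the node file passed as the machinefile.
cmd="/usr/bin/mpirun -machinefile $nodefile -np $np /var/tmp/MPI_Tutorial/HelloWorld/helloWorld"
echo "$cmd"

# Only launch inside a real job; a single output file (illustrative name)
# replaces the per-host name, which the shell would expand only once.
if [ "$dry_run" = no ]; then
    $cmd > myjob2.out
fi
```

With nodes=3:ppn=1 the node file has three lines, so np comes out as 3 and
matches the original -np 3 without being hard-coded.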
Best,
Tim
--
Tim Miller
System Administrator -- Laboratory of Computational Biology
National Institutes of Health -- Bldg. 50 Rm. 3309 -- 301-402-0618