[torqueusers] job only runs on 1 cpu

James A. Peltier jpeltier at cs.sfu.ca
Mon Jul 28 18:13:36 MDT 2008

On Mon, 28 Jul 2008, Jan Dettmer wrote:

> Thanks for the tip.
> I just recompiled with --with-tm.
> Still the same problem.
> #PBS -l nodes=1:ppn=8 will run fine (without -np option in mpiexec command) 
> on 8 cpus on one node.
> #PBS -l nodes=2:ppn=8 will only start on one CPU on one node.
> Cheers, Jan

Another very simple test is to print the contents of $PBS_NODEFILE from 
inside the job to see what the PBS job thinks it has been assigned.  With 
nodes=2:ppn=8 you should see 8 entries for each of the two nodes, 16 lines 
in all; if so, things should be working OK.  If not, something isn't being 
passed correctly.
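
A quick way to eyeball that is to count the entries per host rather than 
reading the raw file.  This is only a rough sketch: count_slots is a helper 
name I made up, not anything Torque provides, and it assumes the node file 
has one hostname per line per assigned slot.

```shell
# count_slots is a hypothetical helper (not part of Torque).
# It prints one "host: count" line per distinct host in the given file.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 }'
}

# PBS_NODEFILE is only set inside a running job; skip quietly otherwise.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_slots "$PBS_NODEFILE"
    awk 'END { print "total slots: " NR }' "$PBS_NODEFILE"
fi
```

Dropped into the submission script, this should print two hosts with 8 
slots each for nodes=2:ppn=8.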

I would also try running the application outside of Torque, using the 
Open-MPI build that you compiled with --with-tm, to make sure it works on 
its own.  Perhaps add some debugging to the PBS submission script too, 
something like:

#PBS -l nodes=2:ppn=8

echo "Running commands on `hostname`"
echo "Check MPI Information"
echo "MPIEXEC = `which mpiexec`"
echo "MPIRUN = `which mpirun`"
echo "MPICC = `which mpicc`"
echo "MPIC++ = `which mpic++`"
echo "MPIF77 = `which mpif77`"
echo "MPIF90 = `which mpif90`"

echo "My job has been assigned these hosts"
cat "$PBS_NODEFILE"

This will also give you some output to post to the list, so we can confirm 
that at least the basics are working.
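
If the mpiexec path that gets printed is the build you expect, it may also 
be worth checking that the build really picked up Torque support.  A rough 
sketch, assuming Open MPI's ompi_info is on the same PATH as mpiexec; has_tm 
is just a helper name I made up, and the exact component names listed vary 
between Open MPI versions:

```shell
# has_tm is a hypothetical helper: it reads ompi_info output on stdin and
# succeeds if any MCA component named "tm" is listed.
has_tm() {
    grep -Eq 'MCA [a-z]+: tm( |$)'
}

# Only run the check when ompi_info is actually available.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm; then
        echo "Open MPI reports tm (Torque) components"
    else
        echo "No tm components listed; this build probably lacks Torque support"
    fi
fi
```

If no tm components show up, the mpiexec being found is most likely not 
the one you configured with --with-tm.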

James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
Mobile  : 778-840-6434
E-Mail  : jpeltier at sfu.ca
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
MSN     : subatomic_spam at hotmail.com
