[torqueusers] mpiexec not running on requested # of processors

Mary Ellen Fitzpatrick mfitzpat at bu.edu
Wed Oct 8 13:53:42 MDT 2008

I am having trouble getting mpich2 to use all of the processors on the 
nodes I request.  I am running torque-2.3.2 and mpich2-1.0.7 on nodes 
with two dual-core CPUs (4 cores per node).  My nodes file is defined 
as node1001 np=4, node1002 np=4, etc.  I have started mpd on all of the 
nodes from the head node.

In my PBS script, I want my code (a simple pi script) to run on 6 nodes 
and use all 4 processors on each node (two dual-core CPUs).
Snippet of my PBS script:
#PBS -l nodes=6:ppn=4
# How many procs do I have?
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
echo Number of processors is $NP
#Run on nodes
mpiexec -np 6 /fs/userB1/mfitzpat/mpi_test
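
For comparison, here is a minimal sketch of taking the process count from 
the node file instead of hard-coding it. It assumes $PBS_NODEFILE holds one 
line per allocated slot, so nodes=6:ppn=4 would give 24 lines. The temporary 
node file is a stand-in built for illustration only, and the -machinefile 
option is an assumption that should be checked against the installed 
mpiexec. The launch command is echoed rather than executed.

```shell
#!/bin/sh
# Illustrative stand-in for $PBS_NODEFILE: one line per allocated slot
# (6 nodes x 4 slots). In a real job, use NODEFILE=$PBS_NODEFILE instead.
NODEFILE=$(mktemp)
for n in node1043 node1044 node1045 node1046 node1047 node1048; do
    for slot in 1 2 3 4; do
        echo "$n"
    done
done > "$NODEFILE"

# Total slots = number of lines in the node file.
NP=$(wc -l < "$NODEFILE" | tr -d ' ')
# Distinct hosts = number of unique lines.
NNODES=$(sort -u "$NODEFILE" | wc -l | tr -d ' ')

echo "slots=$NP hosts=$NNODES"    # prints: slots=24 hosts=6

# Assumption: this mpiexec accepts -machinefile. Echoed, not run.
CMD="mpiexec -machinefile $NODEFILE -np $NP /fs/userB1/mfitzpat/mpi_test"
echo "$CMD"
```

With the count derived this way, the -np value follows the allocation 
rather than staying fixed at 6.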

Begin PBS Prologue Wed Oct  8 15:35:40 EDT 2008 1223494540
Job ID:         90.nona-man
Username:       mfitzpat
Group:          umass
Nodes:          node1043 node1044 node1045 node1046 node1047 node1048

Number of processors is 24
Process 0 on node1048
Process 1 on node1047
Process 2 on node1001
Process 3 on node1009
Process 4 on node1026
Process 5 on node1029
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.004853

The prologue above says the nodes assigned were node1043-node1048, but 
the job appears to have run on nodes 1001, 1009, 1026, 1029, 1047 and 
1048.  It also looks like it only ran 6 processes instead of 24.
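
The placement mismatch described above can be checked mechanically. The 
sketch below simply copies the host names from the prologue and from the 
"Process N on ..." lines and reports which ranks landed outside the 
allocation. The variable names are made up for illustration.

```shell
#!/bin/sh
# Hosts Torque allocated, copied from the prologue "Nodes:" line.
ALLOCATED="node1043 node1044 node1045 node1046 node1047 node1048"
# Hosts the MPI ranks reported, copied from the "Process N on ..." lines.
REPORTED="node1048 node1047 node1001 node1009 node1026 node1029"

OUTSIDE=""
for h in $REPORTED; do
    case " $ALLOCATED " in
        *" $h "*) ;;                     # rank ran on an allocated node
        *) OUTSIDE="$OUTSIDE $h" ;;      # rank ran somewhere else
    esac
done
OUTSIDE=${OUTSIDE# }                     # drop the leading space

echo "ranks outside the allocation: $OUTSIDE"
# prints: ranks outside the allocation: node1001 node1009 node1026 node1029
```

Four of the six ranks ran on hosts Torque did not assign. That would be 
consistent with mpiexec picking hosts from the mpd ring rather than from 
the Torque allocation, though nothing in the output confirms this.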

If I specify 24 instead of 6 in my command (mpiexec -np 24 rather than 
mpiexec -np 6), the job hangs.

Any ideas where I am making the mistake?

Mary Ellen

