[torqueusers] mpiexec not running on requested # of processors
Mary Ellen Fitzpatrick
mfitzpat at bu.edu
Wed Oct 8 13:53:42 MDT 2008
Hi,
I am having trouble getting mpich2 to use all of the processors on the
nodes I request. I am running torque-2.3.2 and mpich2-1.0.7 on dual
dual-core nodes (4 cores per node). My nodes file is defined as
node1001 np=4, node1002 np=4, etc. I have started mpd on all of the
nodes from the head node.
In my pbs script, I want my code (a simple pi script) to run on 6 nodes
and use all 4 processors on each node (dual dual-core CPUs).
Snippet of my pbs script:
#PBS -l nodes=6:ppn=4
# How many procs do I have?
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
echo Number of processors is $NP
#Run on nodes
mpiexec -np 6 /fs/userB1/mfitzpat/mpi_test
output:
Begin PBS Prologue Wed Oct 8 15:35:40 EDT 2008 1223494540
Job ID: 90.nona-man
Username: mfitzpat
Group: umass
Nodes: node1043 node1044 node1045 node1046 node1047 node1048
Number of processors is 24
Process 0 on node1048
Process 1 on node1047
Process 2 on node1001
Process 3 on node1009
Process 4 on node1026
Process 5 on node1029
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.004853
The prologue above says the assigned nodes were node1043-node1048, but
the job appears to have run on nodes 1001, 1009, 1026, 1029, 1047 and
1048. It also looks like it ran only 6 processes instead of 24.
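One way to cross-check the allocation against where the processes
actually landed is sketched below. This is only a rough sketch: it
assumes the node file has one hostname per allocated slot (as
$PBS_NODEFILE does) and that the job output contains lines of the form
"Process N on HOST", as mine does. hosts_outside_allocation is a
hypothetical helper name, not part of Torque or mpich2.

```shell
# Sketch: list hosts that ran MPI processes but are not in the allocation.
# hosts_outside_allocation is a hypothetical helper, not a Torque command.
#   $1 = node file, one hostname per allocated slot (e.g. $PBS_NODEFILE)
#   $2 = job output containing lines like "Process 0 on node1048"
hosts_outside_allocation() {
    # Pull the hostname out of each "Process N on HOST" line, de-duplicated.
    sed -n 's/^Process [0-9][0-9]* on \([^ ]*\).*$/\1/p' "$2" | sort -u |
    while read -r host; do
        # -x matches the whole line, -F treats the hostname literally.
        grep -qxF "$host" "$1" || echo "$host"
    done
}
```

Inside a job this could be called as
hosts_outside_allocation "$PBS_NODEFILE" job.out, where job.out is
whatever file the job's output was saved to (a placeholder name here).
Any hostname it prints ran a process without being part of the
allocation.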
If I specify 24 instead of 6 in my command (that is, mpiexec -np 24
/fs/userB1/mfitzpat/mpi_test), then the job hangs.
Any ideas where I am making the mistake?
--
Thanks
Mary Ellen