[Mauiusers] mpi job on multi-core nodes, fails to run on multiple nodes
Bogdan Costescu
Bogdan.Costescu at iwr.uni-heidelberg.de
Mon Nov 3 03:56:26 MST 2008
On Fri, 31 Oct 2008, Mary Ellen Fitzpatrick wrote:
> -l nodes=4:ppn=4, my epilogue/prologue output file says the job ran
> on 4 nodes and requests 16 processors.
You can check that you get what you asked for by copying or dumping the
contents of $PBS_NODEFILE. If the result is what you expect, then it's
not a problem with Torque or Maui.
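A minimal Python sketch of that check follows. It assumes the usual Torque node-file layout of one line per allocated processor, so each host requested with ppn=4 should appear four times. The helper names summarize_nodefile and check_allocation are hypothetical, not part of Torque or Maui; run it from inside the job script.

```python
import os
from collections import Counter


def summarize_nodefile(path):
    """Count how many slots (lines) each host has in a PBS/Torque node file.

    Assumes one line per allocated processor, so a node requested
    with ppn=4 is expected to appear four times.
    """
    with open(path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    return Counter(hosts)


def check_allocation(counts, nodes, ppn):
    """Return True if exactly `nodes` distinct hosts each have `ppn` slots."""
    return len(counts) == nodes and all(c == ppn for c in counts.values())


if __name__ == "__main__":
    # Torque sets PBS_NODEFILE only inside a running job.
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
    else:
        counts = summarize_nodefile(path)
        for host, slots in sorted(counts.items()):
            print(f"{host}: {slots} slot(s)")
        # Expected shape for -l nodes=4:ppn=4; adjust to your own request.
        print("matches nodes=4:ppn=4:", check_allocation(counts, 4, 4))
```

If the per-host counts and the total (16 here) match the request, the allocation side is fine and the problem lies elsewhere.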
> rank 15 in job 29 node1047_40014 caused collective abort of all ranks
> exit status of rank 15: killed by signal 9
That's most likely an MPI problem; check the connectivity between the nodes.
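One rough first pass at a connectivity check is to confirm that every host in the node file resolves and accepts TCP connections. This Python sketch rests on stated assumptions: it probes port 22 (SSH, assumed here to be what the MPI launcher uses to start remote processes) and does not test the ports the MPI library itself opens for rank-to-rank traffic. The helper names unique_hosts and check_host are hypothetical. Run it from each node, since a failure can be one-directional.

```python
import os
import socket


def unique_hosts(path):
    """Return the distinct host names from a PBS node file, in first-seen order."""
    seen = []
    with open(path) as f:
        for line in f:
            host = line.strip()
            if host and host not in seen:
                seen.append(host)
    return seen


def check_host(host, port=22, timeout=3.0):
    """Resolve `host` and try a TCP connection to `port` (22 is an assumption).

    Returns "ok", "unresolved" (name lookup failed), or "unreachable"
    (lookup worked but the connection was refused or timed out).
    """
    try:
        addr = socket.gethostbyname(host)
    except OSError:
        return "unresolved"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return "ok"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
    else:
        for host in unique_hosts(path):
            print(f"{host}: {check_host(host)}")
```

An "unresolved" or "unreachable" result for any host is worth fixing before looking further into the MPI configuration.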
--
Bogdan Costescu
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de