[Mauiusers] mpi job on multi-core nodes, fails to run on multiple nodes

Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Mon Nov 3 03:56:26 MST 2008


On Fri, 31 Oct 2008, Mary Ellen Fitzpatrick wrote:

> -l nodes=4:ppn=4, my epilogue/prologue output file says the job ran 
> on 4 nodes and requests 16 processors.

You can check that you get what you asked for by copying or dumping 
the contents of $PBS_NODEFILE from within the job. If the result is 
what you expect, then it's not a problem with Torque or Maui.
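
As a rough sketch of that check, something like the following could go 
near the top of the job script. The function name summarize_nodefile is 
made up for illustration; the only thing taken from Torque is the 
$PBS_NODEFILE variable, which points to a file with one hostname per 
line, repeated once per processor allocated on that host.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or Maui): summarize a node file
# that lists one hostname per line, repeated once per allocated processor.
summarize_nodefile() {
    nodefile="$1"
    if [ ! -r "$nodefile" ]; then
        echo "node file not readable: $nodefile" >&2
        return 1
    fi
    # Print "hostname count" for each distinct host.
    sort "$nodefile" | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
    # Totals: number of distinct hosts and number of processor slots.
    hosts=$(sort -u "$nodefile" | grep -c .)
    slots=$(grep -c . "$nodefile")
    echo "total: $hosts nodes, $slots processors"
}

# PBS_NODEFILE is only set inside a running job; do nothing otherwise.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

For a request of -l nodes=4:ppn=4 you would expect four distinct 
hostnames, each with a count of 4, and a total of 16 processors.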

> rank 15 in job 29  node1047_40014   caused collective abort of all ranks
> exit status of rank 15: killed by signal 9

That's most likely an MPI problem; check the nodes' connectivity.
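
One possible first pass at a connectivity check, run from inside the 
job, is sketched below. The function name check_hosts and the choice of 
ping as the probe are assumptions for illustration. A successful ping 
only shows basic reachability; it does not prove that the ports or 
remote-shell access your MPI implementation needs are working, so you 
may want to substitute a probe that matches your setup.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command once per distinct host listed
# in a node file and report which hosts fail. The probe is given as the
# remaining arguments and receives the hostname as its last argument.
check_hosts() {
    nodefile="$1"
    shift
    failed=0
    for host in $(sort -u "$nodefile"); do
        if "$@" "$host" >/dev/null 2>&1; then
            echo "ok $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Return non-zero if any host failed the probe.
    return "$failed"
}

# Example probe (an assumption): a single ICMP echo request per host.
# PBS_NODEFILE is only set inside a running job; do nothing otherwise.
if [ -n "${PBS_NODEFILE:-}" ]; then
    check_hosts "$PBS_NODEFILE" ping -c 1
fi
```

Running this from the first node of the job lists every allocated host 
with "ok" or "FAIL", which should narrow down where to look.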

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de

