[torqueusers] OpenMPI mpirun problem with TORQUE

이정현 bugslayer at gmail.com
Mon Jan 25 02:07:54 MST 2010


Yes, I did.

 

TORQUE

(server) ./configure --prefix=/usr/local/ --with-rcp=scp

(clients) installed using the shell-script packages generated by "make packages" on the server

 

OpenMPI

./configure --prefix=/usr/local/openmpi --with-tm=/usr/local/
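One way to confirm that a --with-tm build actually picked up TORQUE support is to look for the "tm" components in the ompi_info output on a compute node. The sketch below wraps that check in a hypothetical helper, has_tm_support, which reads ompi_info output on stdin; the "MCA plm: tm" and "MCA ras: tm" line format is an assumption based on Open MPI 1.4-era output.

```shell
#!/bin/sh
# Hypothetical helper: reads ompi_info output on stdin and succeeds only if
# both the TM launcher (plm) and the TM allocator (ras) components are listed.
# The "MCA plm: tm" / "MCA ras: tm" line format is assumed, not guaranteed.
has_tm_support() {
    info=$(cat)
    printf '%s\n' "$info" | grep -q 'MCA plm: tm' &&
        printf '%s\n' "$info" | grep -q 'MCA ras: tm'
}

# Usage on a compute node (assumes the ompi_info from this build is on PATH):
#   ompi_info | has_tm_support && echo "TM support present" || echo "no TM support"
```

If only the rsh/ssh launcher shows up, mpirun cannot use TORQUE's TM interface to start processes on the other nodes of the job.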


From: Si Hammond [mailto:simon.hammond at gmail.com] 
Sent: Monday, January 25, 2010 1:25 AM
To: 이정현
Cc: Si Hammond; torqueusers at supercluster.org
Subject: Re: [torqueusers] OpenMPI mpirun problem with TORQUE

 

Out of interest, when you specified --with-tm, did you give configure a directory in which to find the PBS installation?
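In the usual setup, --with-tm=DIR names the TORQUE installation prefix, so DIR should contain TORQUE's tm.h header under include/ and its libraries under lib/. A small pre-check before running configure could look like the following sketch; check_tm_prefix is a hypothetical name, and the libtorque* file name assumes a default TORQUE 2.x install with the standard lib/ directory.

```shell
#!/bin/sh
# Hypothetical pre-check: does the directory passed to --with-tm look like a
# TORQUE install prefix (tm.h in include/, libtorque* in lib/)?
check_tm_prefix() {
    prefix=$1
    if [ ! -f "$prefix/include/tm.h" ]; then
        echo "missing $prefix/include/tm.h" >&2
        return 1
    fi
    # If the glob matches nothing, ls receives the literal pattern and fails.
    if ! ls "$prefix"/lib/libtorque* >/dev/null 2>&1; then
        echo "no libtorque* found in $prefix/lib" >&2
        return 1
    fi
    echo "$prefix looks like a TORQUE prefix"
}

# Usage with the TORQUE prefix used in this thread:
#   check_tm_prefix /usr/local
```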

 
On 24 Jan 2010, at 11:18, 이정현 wrote:

Sure.

As I mentioned, mpirun works correctly on its own. The problem occurs only when it is run through TORQUE.

 

From: Si Hammond [mailto:simon.hammond at gmail.com] 
Sent: Sunday, January 24, 2010 8:12 PM
To: 이정현
Cc: Si Hammond; torqueusers at supercluster.org
Subject: Re: [torqueusers] OpenMPI mpirun problem with TORQUE

 

Can you SSH from one node to the next without passwords, etc.?
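To answer this non-interactively for every node in an allocation, one option is to run a trivial command on each host with ssh's BatchMode=yes option, which makes ssh fail instead of prompting for a password. check_ssh_hosts is a hypothetical helper, and the SSH variable exists only so the ssh command can be swapped for a stub when testing. Note that when Open MPI is built with TM support and run inside a TORQUE job, mpirun normally starts its remote daemons through TORQUE's TM interface rather than ssh, so a passing check does not by itself rule out a TM problem.

```shell
#!/bin/sh
# Hypothetical helper: try a non-interactive ssh to each unique host listed in
# a node file (one hostname per line, as in $PBS_NODEFILE).
# BatchMode=yes makes ssh fail rather than prompt for a password.
SSH=${SSH:-ssh}

check_ssh_hosts() {
    nodefile=$1
    status=0
    for host in $(sort -u "$nodefile"); do
        if $SSH -o BatchMode=yes "$host" true >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            status=1
        fi
    done
    return $status
}

# Usage inside a job script:
#   check_ssh_hosts "$PBS_NODEFILE"
```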

On 23 Jan 2010, at 23:03, 이정현 wrote:

Hi all.

 

I have a small (but serious) problem when submitting a job that uses mpirun.

 

There is no problem with a single node (multiple processors), as below.

 

(job script)

 

#!/bin/sh
#PBS -l nodes=1:ppn=2
#PBS -j oe

echo "HOSTNAME : $HOSTNAME"
echo "PBS_NODEFILE = $PBS_NODEFILE"
cat $PBS_NODEFILE
mpirun /home/jhlee/test_program
echo "finish : $(date)"
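When moving to nodes=2:ppn=2, it can help to print the allocation per host from inside the job, so it can be compared with where mpirun actually starts processes. A minimal sketch, assuming $PBS_NODEFILE lists one hostname per allocated slot (as in the single-node output, where simulation01 appears twice for ppn=2); count_slots is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "<host> <slots>" for each host in a node file,
# assuming the file lists one hostname per allocated slot.
count_slots() {
    sort "$1" | uniq -c | while read -r n host; do
        echo "$host $n"
    done
}

# Usage inside the job script, before mpirun:
#   count_slots "$PBS_NODEFILE"
```

For nodes=2:ppn=2 one would expect two hosts with 2 slots each; if that matches but no test_program process appears on the second host, the problem lies in the launch step rather than in the allocation.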

(result): test_program just prints a message indicating whether it was launched by mpirun.

 

start  : Sun Jan 24 07:46:27 KST 2010
HOSTNAME : simulation01
PBS_NODEFILE = /var/spool/torque/aux//31.simulation00
simulation01
simulation01
Detected OpenMPI Runtime Environment
Detected OpenMPI Runtime Environment
finish : Sun Jan 24 07:46:29 KST 2010

 

But with multiple nodes, as below, mpirun fails to start test_program.

 

#PBS -l nodes=2:ppn=2   (everything else is the same)

 

I can't find any test_program process; there is only mpirun. Please see the ps output below.

 

21680 ?        S      0:00 mpirun /home/jhlee/test_program
21684 ?        Ss     0:00 bash -c ps ax | grep test
21712 ?        R      0:00 grep test

 

1. mpirun (not via TORQUE) works correctly.
2. OpenMPI was built with the --with-tm option.
3. iptables and SELinux have already been disabled, and no password is required to connect to the other nodes via ssh.
4. OpenMPI 1.4.1, TORQUE 2.4.4

 

What can I check to solve this?

 

Thanks.

 

-------------------------------------------------------------------------------------------

 

Jeong-hyun Lee

 

Visual Simulation Laboratory

Department of Computer Science and Engineering

Dongguk University, Seoul, Korea

 

_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

---------------------------------------------------------------------------------------

Si Hammond

 

Research & Knowledge Transfer Associate

Performance Modelling, Analysis and Optimisation Team

High Performance Systems Group

Department of Computer Science

University of Warwick, CV4 7AL, UK

http://go.warwick.ac.uk/hpsg

----------------------------------------------------------------------------------------
