[torqueusers] OpenMPI mpirun problem with TORQUE

이정현 bugslayer at gmail.com
Sun Jan 24 04:18:15 MST 2010


As I mentioned, mpirun works correctly when run directly. The problem occurs only when the job goes through TORQUE.


From: Si Hammond [mailto:simon.hammond at gmail.com] 
Sent: Sunday, January 24, 2010 8:12 PM
To: 이정현
Cc: Si Hammond; torqueusers at supercluster.org
Subject: Re: [torqueusers] OpenMPI mpirun problem with TORQUE


Can you SSH from one node to the next without passwords etc?
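
One way to check this on every allocated node at once is sketched below. It is only an illustration: the function name `check_ssh_nodes` and the `SSH_CMD` override are made up for this example, and it assumes it runs inside a job where `PBS_NODEFILE` is set.

```shell
#!/bin/bash
# Hypothetical helper: check passwordless SSH to every host in a node file.
# SSH_CMD is an illustrative override so the loop can be dry-run without a cluster.
SSH_CMD="${SSH_CMD:-ssh}"

check_ssh_nodes() {
    local nodefile="$1" host failed=0
    for host in $(sort -u "$nodefile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Inside a TORQUE job, PBS_NODEFILE lists the allocated hosts (one line per slot).
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    check_ssh_nodes "$PBS_NODEFILE"
fi
```

Any line starting with FAIL points at a host that still asks for a password or cannot be reached.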




On 23 Jan 2010, at 23:03, 이정현 wrote:

Hi all.


I have a small (but serious) problem when submitting a job that uses mpirun.


There’s no problem with a single node (and several processors), as below.


(job script)



#PBS -l nodes=1:ppn=2

#PBS -j oe





mpirun /home/jhlee/test_program

echo "finish : $(date)"



(result) - test_program just prints a message saying whether or not it was launched by mpirun.


start  : Sun Jan 24 07:46:27 KST 2010

HOSTNAME : simulation01

PBS_NODEFILE = /var/spool/torque/aux//31.simulation00



Detected OpenMPI Runtime Environment

Detected OpenMPI Runtime Environment

finish : Sun Jan 24 07:46:29 KST 2010


But with more than one node, as below, mpirun never starts test_program.


#PBS -l nodes=2:ppn=2 (everything else is the same)


I can’t find any test_program process. Only mpirun is running, as the ‘ps’ result below shows.


21680 ?        S      0:00 mpirun /home/jhlee/test_program

21684 ?        Ss     0:00 bash -c ps ax | grep test

21712 ?        R      0:00 grep test
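
A quick sanity check that could go in the job script before mpirun is to summarize what TORQUE actually allocated. The sketch below assumes the usual one-line-per-slot layout of `PBS_NODEFILE`, so `nodes=2:ppn=2` should show two hosts with two slots each. The function name `summarize_nodefile` is only illustrative.

```shell
#!/bin/bash
# Print each allocated host with its slot count (one nodefile line per slot).
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Only runs inside a job, where TORQUE sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

If only one host is listed, the problem is in the allocation rather than in mpirun.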


1.     mpirun (not via TORQUE) works correctly.

2.     OpenMPI was built with the --with-tm option.

3.     iptables and SELinux have already been disabled, and no password is required to connect to the other nodes using ssh.

4.     OpenMPI 1.4.1, TORQUE 2.4.4
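
Point 2 can be double-checked from the `ompi_info` output. The sketch below assumes that a build with TORQUE support lists a `tm` component under both the `ras` and `plm` frameworks. The function name `has_tm_components` is made up for this example.

```shell
#!/bin/bash
# Reads `ompi_info` output on stdin and reports whether the TORQUE (tm)
# components are present. The function name is illustrative.
has_tm_components() {
    local info
    info=$(cat)
    # A --with-tm build is assumed to list tm under both the ras (allocation)
    # and plm (process launch) frameworks.
    echo "$info" | grep -Eq 'MCA ras: tm( |$)' &&
    echo "$info" | grep -Eq 'MCA plm: tm( |$)'
}

# Only runs where ompi_info is on PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_components; then
        echo "tm support: found"
    else
        echo "tm support: missing"
    fi
fi
```

It may be worth running this on a compute node as well as on the submit host, in case the mpirun picked up inside the job is a different build. Running `pbsdsh hostname` inside the same two-node job is another possible check, since it exercises TORQUE's own task spawning without involving OpenMPI.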


What can I check to solve this?






Jeong-hyun Lee


Visual Simulation Laboratory

Department of Computer Science and Engineering

Dongguk University, Seoul, Korea


torqueusers mailing list
torqueusers at supercluster.org



Si Hammond


Research & Knowledge Transfer Associate

Performance Modelling, Analysis and Optimisation Team

High Performance Systems Group

Department of Computer Science

University of Warwick, CV4 7AL, UK






