[torqueusers] OpenMPI mpirun problem with TORQUE

이정현 bugslayer at gmail.com
Sat Jan 23 16:03:15 MST 2010


Hi all.

I have a small (but serious) problem when submitting a job that uses mpirun.

There is no problem with a single node (with multiple processors), as below.

(job script)

#!/bin/sh
#PBS -l nodes=1:ppn=2
#PBS -j oe

echo "start  : $(date)"
echo "HOSTNAME : $HOSTNAME"
echo "PBS_NODEFILE = $PBS_NODEFILE"
cat "$PBS_NODEFILE"
mpirun /home/jhlee/test_program
echo "finish : $(date)"

(result) test_program simply prints a message saying whether or not it was launched by mpirun.

start  : Sun Jan 24 07:46:27 KST 2010
HOSTNAME : simulation01
PBS_NODEFILE = /var/spool/torque/aux//31.simulation00
simulation01
simulation01
Detected OpenMPI Runtime Environment
Detected OpenMPI Runtime Environment
finish : Sun Jan 24 07:46:29 KST 2010
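As the output shows, TORQUE writes each allocated host into $PBS_NODEFILE once per processor slot (simulation01 appears twice for ppn=2). With nodes=2:ppn=2 the file should therefore list two different hosts, twice each, and that is easy to confirm from inside the failing job. Below is a minimal sketch of such a check; the function name summarize_nodefile is hypothetical and not part of TORQUE.

```shell
# Hypothetical helper: print each host in a TORQUE node file together with
# the number of slots (lines) it was given, as "host count".
summarize_nodefile() {
    # sort groups identical host names, uniq -c counts each group,
    # awk swaps the columns so the host name comes first.
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Only runs inside a TORQUE job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

For the two-node job, two lines each ending in "2" would suggest the allocation itself is fine and that the problem is in starting processes on the second node.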

 

But with more than one node, mpirun never starts test_program. The only change to the script is:

#PBS -l nodes=2:ppn=2

I can't find any test_program process; only mpirun is running. Here is the 'ps' output:

21680 ?        S      0:00 mpirun /home/jhlee/test_program
21684 ?        Ss     0:00 bash -c ps ax | grep test
21712 ?        R      0:00 grep test
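The ps listing above covers only the node where mpirun runs. With a TM-enabled build, mpirun asks TORQUE to start Open MPI's orted daemon on each remote node, and orted then starts the local copies of test_program, so it helps to look for both orted and test_program on every node of the hung job. A rough sketch, assuming passwordless ssh between nodes (item 3) and that it is run on the first node of the job; check_remote_procs is a hypothetical helper name and JOBID in the example is a placeholder.

```shell
# Hypothetical helper: for every distinct host in a node file, list the
# processes whose command line matches an extended regex (checked over ssh).
check_remote_procs() {
    nodefile=$1
    pattern=$2
    for host in $(sort -u "$nodefile"); do
        echo "== $host"
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        # grep exits non-zero when nothing matches, which prints the note.
        ssh -o BatchMode=yes "$host" "ps ax | grep -E '$pattern'" \
            || echo "   (no match, or ssh failed)"
    done
}

# Example, run while mpirun is hung (JOBID is a placeholder for the job id):
#   check_remote_procs /var/spool/torque/aux/JOBID '[o]rted|[t]est_program'
# Writing [o]rted instead of orted keeps grep from matching its own command
# line, which is why "grep test" shows up in a plain "ps ax | grep test".
```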

 

1.     mpirun works correctly when run outside TORQUE.

2.     OpenMPI was built with the --with-tm option.

3.     iptables and SELinux have already been disabled, and ssh to the other nodes works without a password.

4.     Versions: OpenMPI 1.4.1, TORQUE 2.4.4
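Items 2 and 3 can also be re-checked from inside a job, since the mpirun and ompi_info found on a compute node's PATH are not necessarily the ones used when running outside TORQUE. This minimal sketch assumes that a TM-enabled Open MPI lists a "plm: tm" component in ompi_info output (something like "MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.1)"); check_tm_support and check_ssh are hypothetical names.

```shell
# Hypothetical helper for item 2: does the ompi_info on PATH report the
# TM (TORQUE) launcher component?
check_tm_support() {
    if ompi_info | grep -q 'plm: tm'; then
        echo "tm launcher: present"
    else
        echo "tm launcher: not found"
        return 1
    fi
}

# Hypothetical helper for item 3: can each distinct host in a node file be
# reached over ssh without any interactive prompt?
check_ssh() {
    for host in $(sort -u "$1"); do
        # BatchMode=yes makes ssh fail instead of prompting (e.g. for a password).
        if ssh -o BatchMode=yes "$host" true; then
            echo "$host: ssh ok"
        else
            echo "$host: ssh FAILED"
        fi
    done
}

# Only runs inside a TORQUE job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    command -v mpirun || echo "mpirun not on PATH"
    command -v ompi_info || echo "ompi_info not on PATH"
    check_tm_support
    check_ssh "$PBS_NODEFILE"
fi
```

Submitting this with the same #PBS lines as the failing two-node script runs the checks with the job's own environment.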

 

What should I check to solve this?

Thanks.

-------------------------------------------------------------------------------------------

Jeong-hyun Lee
Visual Simulation Laboratory
Department of Computer Science and Engineering
Dongguk University, Seoul, Korea
