[torqueusers] OpenMPI mpirun problem with TORQUE
이정현
bugslayer at gmail.com
Sat Jan 23 16:03:15 MST 2010
Hi all.
I have a small (but serious) problem when submitting a job that uses mpirun.
There is no problem with just one node (multiple processors), as shown below.
(job script)
#!/bin/sh
#PBS -l nodes=1:ppn=2
#PBS -j oe
echo "start : $(date)"
echo "HOSTNAME : $HOSTNAME"
echo "PBS_NODEFILE = $PBS_NODEFILE"
cat $PBS_NODEFILE
mpirun /home/jhlee/test_program
echo "finish : $(date)"
(result) - test_program just prints a message saying whether or not it was
launched by mpirun.
start : Sun Jan 24 07:46:27 KST 2010
HOSTNAME : simulation01
PBS_NODEFILE = /var/spool/torque/aux//31.simulation00
simulation01
simulation01
Detected OpenMPI Runtime Environment
Detected OpenMPI Runtime Environment
finish : Sun Jan 24 07:46:29 KST 2010
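The node file printed above has one line per allocated slot, so the two "simulation01" lines correspond to ppn=2 on a single node. As a rough sketch (the helper name count_slots is hypothetical, not part of TORQUE), the slots per host could be summarized inside the job script like this; for nodes=2:ppn=2 one would expect two hosts with two slots each:

```shell
#!/bin/sh
# Hypothetical helper: print "<host> <slots>" for each host in a TORQUE
# node file. The file holds one line per allocated slot, so repeated
# host names are counted.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a job script this might be called as:
#   count_slots "$PBS_NODEFILE"
```

If the multi-node job shows only one host here, the allocation itself would be worth checking before looking at mpirun.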
But with multiple nodes, as below, mpirun fails to start test_program.
#PBS -l nodes=2:ppn=2 (everything else is the same)
I can’t find any ‘test_program’ process. Only mpirun is running.
Please see the ‘ps’ output below.
21680 ? S 0:00 mpirun /home/jhlee/test_program
21684 ? Ss 0:00 bash -c ps ax | grep test
21712 ? R 0:00 grep test
1. mpirun (run directly, not via TORQUE) works correctly.
2. OpenMPI was built with the --with-tm option.
3. iptables and SELinux have already been disabled, and no password is
required to connect to the other nodes using ssh.
4. OpenMPI 1.4.1, TORQUE 2.4.4
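One way to double-check the TM build option from item 2 is to look for "tm" components in the ompi_info output on each compute node. The sketch below assumes ompi_info prints lines of the form "MCA ras: tm (...)" and "MCA plm: tm (...)"; the helper name has_tm_component is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if ompi_info-style text on stdin lists a
# "tm" component for the ras or plm framework.
# Assumption: component lines look like "MCA ras: tm (...)".
has_tm_component() {
    grep -Eq 'MCA (ras|plm): tm'
}

# Typical use on a compute node (requires ompi_info on PATH):
#   ompi_info | has_tm_component && echo "tm support found"
```

Running this on every node of the allocation, not only the submit host, would show whether each node picks up the same OpenMPI installation.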
What can I check to solve this?
Thanks.
-------------------------------------------------------------------------------------------
Jeong-hyun Lee
Visual Simulation Laboratory
Department of Computer Science and Engineering
Dongguk University, Seoul, Korea