[torqueusers] OpenMPI mpirun problem with TORQUE
이정현
bugslayer at gmail.com
Sun Jan 24 04:18:15 MST 2010
Sure.
As I mentioned, mpirun works correctly when run directly. The problem occurs only when the job is submitted through TORQUE.
From: Si Hammond [mailto:simon.hammond at gmail.com]
Sent: Sunday, January 24, 2010 8:12 PM
To: 이정현
Cc: Si Hammond; torqueusers at supercluster.org
Subject: Re: [torqueusers] OpenMPI mpirun problem with TORQUE
Can you SSH from one node to the next without passwords etc?
On 23 Jan 2010, at 23:03, 이정현 wrote:
Hi all.
I have a small (but serious) problem when submitting a job that uses mpirun.
There’s no problem with a single node (multiple processors), as shown below.
(job script)
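#!/bin/sh
#PBS -l nodes=1:ppn=2
#PBS -j oe
# Print the start time, the execution host, and the nodes TORQUE allocated.
echo "start : $(date)"
echo "HOSTNAME : $HOSTNAME"
echo "PBS_NODEFILE = $PBS_NODEFILE"
cat "$PBS_NODEFILE"
# Launch the test program under mpirun (no host list is given here).
mpirun /home/jhlee/test_program
echo "finish : $(date)"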
(result) test_program just prints a message saying whether or not it was launched by mpirun.
start : Sun Jan 24 07:46:27 KST 2010
HOSTNAME : simulation01
PBS_NODEFILE = /var/spool/torque/aux//31.simulation00
simulation01
simulation01
Detected OpenMPI Runtime Environment
Detected OpenMPI Runtime Environment
finish : Sun Jan 24 07:46:29 KST 2010
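The node file printed above has one line per allocated slot, so the same host name repeats once for each processor. A minimal sketch for summarising it as slots per host follows; `summarize_nodefile` is a hypothetical helper name, not something TORQUE provides.

```shell
#!/bin/sh
# Sketch: summarise a TORQUE node file as "<host> <slot count>" lines.
# summarize_nodefile is a hypothetical helper name, not part of TORQUE.
summarize_nodefile() {
    # Count how often each host name appears, then sort for stable output.
    awk 'NF { slots[$1]++ } END { for (h in slots) print h, slots[h] }' "$1" | sort
}

# Inside a job, PBS_NODEFILE is set by TORQUE; do nothing elsewhere.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

For the nodes=1:ppn=2 run above this would print `simulation01 2`. For a multi-node request, it gives a quick check that every requested host appears with the expected slot count.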
But with multiple nodes, as below, mpirun fails to start test_program.
#PBS -l nodes=2:ppn=2 (everything else is the same)
No test_program process ever appears; only mpirun is running. Please check the ‘ps’ output below.
21680 ? S 0:00 mpirun /home/jhlee/test_program
21684 ? Ss 0:00 bash -c ps ax | grep test
21712 ? R 0:00 grep test
1. mpirun (not via TORQUE) works correctly.
2. OpenMPI was built with the --with-tm option.
3. iptables and SELinux have already been disabled, and no password is required to connect to the other nodes over ssh.
4. OpenMPI 1.4.1, TORQUE 2.4.4
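Point 2 can be double-checked against the installed Open MPI itself, since `ompi_info` lists the components that were actually built. The sketch below assumes the component lines look like `MCA plm: tm (...)` and `MCA ras: tm (...)`, which is an assumption about the output format of this Open MPI version; `has_tm_components` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check ompi_info output (read from stdin) for the TORQUE "tm" components.
# has_tm_components is a hypothetical helper name, not an Open MPI command.
has_tm_components() {
    info=$(cat)
    # plm = process launcher, ras = resource allocator; both need a tm entry.
    printf '%s\n' "$info" | grep -Eq 'MCA plm: tm( |$)' &&
        printf '%s\n' "$info" | grep -Eq 'MCA ras: tm( |$)'
}

# Only run the real check where ompi_info is installed.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_components; then
        echo "tm launcher and allocator components found"
    else
        echo "tm components missing from this Open MPI build"
    fi
fi
```

Running this on every compute node, not only the head node, may be worthwhile, because the mpirun found first in PATH inside a job is not necessarily the build that was configured with --with-tm.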
What else can I check to solve this?
Thanks.
-------------------------------------------------------------------------------------------
Jeong-hyun Lee
Visual Simulation Laboratory
Department of Computer Science and Engineering
Dongguk University, Seoul, Korea
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
---------------------------------------------------------------------------------------
Si Hammond
Research & Knowledge Transfer Associate
Performance Modelling, Analysis and Optimisation Team
High Performance Systems Group
Department of Computer Science
University of Warwick, CV4 7AL, UK
http://go.warwick.ac.uk/hpsg
----------------------------------------------------------------------------------------