[torqueusers] OpenMPI mpirun problem with TORQUE
Si Hammond
simon.hammond at gmail.com
Sun Jan 24 04:12:16 MST 2010
Can you SSH from one node to the next without passwords, etc.?
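If it helps, here is a minimal sketch for checking that from inside a job. It assumes a POSIX shell and OpenSSH on the nodes; `BatchMode=yes` makes ssh fail instead of prompting for a password. The function name `check_ssh_nodes` and the `SSH_CMD` override are hypothetical, not part of TORQUE or Open MPI:

```shell
#!/bin/sh
# Sketch: check non-interactive (password-less) SSH to each distinct host
# in a TORQUE nodefile. BatchMode=yes makes ssh fail rather than prompt.
# SSH_CMD is a hypothetical override (default: ssh) used for dry runs.
check_ssh_nodes() {
    nodefile=$1
    ssh_cmd=${SSH_CMD:-ssh}
    failed=0
    # The nodefile lists a host once per slot, so test each host only once.
    for host in $(sort -u "$nodefile"); do
        # Run a no-op remotely; stdin from /dev/null so ssh cannot consume it.
        if $ssh_cmd -o BatchMode=yes "$host" true </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Called as `check_ssh_nodes "$PBS_NODEFILE"` from the job script, any FAIL line points at a node whose keys or host entries need attention.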
On 23 Jan 2010, at 23:03, 이정현 (Jeong-hyun Lee) wrote:
> Hi all.
>
> I have a small (but serious) problem when submitting a job that uses mpirun.
>
> There’s no problem with just one node (multiple processors), as below.
>
> (job script)
>
> #!/bin/sh
> #PBS -l nodes=1:ppn=2
> #PBS -j oe
>
> echo "HOSTNAME : $HOSTNAME"
> echo "PBS_NODEFILE = $PBS_NODEFILE"
> cat $PBS_NODEFILE
> mpirun /home/jhlee/test_program
> echo "finish : $(date)"
>
>
> (result) – test_program just prints a message saying whether or not it was launched by mpirun.
>
> start : Sun Jan 24 07:46:27 KST 2010
> HOSTNAME : simulation01
> PBS_NODEFILE = /var/spool/torque/aux//31.simulation00
> simulation01
> simulation01
> Detected OpenMPI Runtime Environment
> Detected OpenMPI Runtime Environment
> finish : Sun Jan 24 07:46:29 KST 2010
>
> But with multiple nodes, as below, mpirun fails to start test_program.
>
> #PBS -l nodes=2:ppn=2 (everything else is the same)
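With nodes=2:ppn=2, TORQUE should write four lines to $PBS_NODEFILE covering two distinct hosts (your one-node run shows one line per slot). A small sketch to confirm that from the job script, assuming standard sort, uniq, awk and wc; the helper name `summarize_nodefile` is hypothetical:

```shell
#!/bin/sh
# Sketch: summarize a TORQUE nodefile, which holds one line per allocated
# slot (e.g. nodes=2:ppn=2 -> four lines, two distinct hosts).
summarize_nodefile() {
    nodefile=$1
    # Slots per host, printed as "<host> <count>".
    sort "$nodefile" | uniq -c | awk '{ print $2, $1 }'
    # Totals; tr strips the padding some wc implementations add.
    hosts=$(sort -u "$nodefile" | wc -l | tr -d ' ')
    slots=$(wc -l < "$nodefile" | tr -d ' ')
    echo "hosts=$hosts slots=$slots"
}
```

For this job you would expect two `<host> 2` lines followed by `hosts=2 slots=4`; anything else means the allocation differs from what the script requested.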
>
> I can’t find any test_program process; there’s only mpirun. Please check the ‘ps’ output below.
>
> 21680 ? S 0:00 mpirun /home/jhlee/test_program
> 21684 ? Ss 0:00 bash -c ps ax | grep test
> 21712 ? R 0:00 grep test
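That ps output appears to come from the node running mpirun; ranks on the other node would be started there by Open MPI's per-node daemon, orted. A sketch for checking each node, assuming `ps ax` output in the usual PID TTY STAT TIME COMMAND layout (`check_mpi_procs` is a hypothetical helper). It compares the basename of the command column, so the mpirun line, which only mentions test_program as an argument, is not counted, and neither is the grep itself:

```shell
#!/bin/sh
# Sketch: read `ps ax` output on stdin and count processes whose command
# basename is the given program, plus Open MPI daemons (orted).
check_mpi_procs() {
    awk -v prog="$1" '
        {
            # Column 5 is the command; compare only its basename so that
            # "mpirun /path/test_program" (an argument) is not counted.
            n = split($5, parts, "/")
            if (parts[n] == prog) p++
            if (parts[n] == "orted") o++
        }
        END { printf "%s=%d orted=%d\n", prog, p, o }
    '
}
```

For example, `ssh <node> ps ax | check_mpi_procs test_program` on each allocated node; `orted=0` on the second node would suggest the remote daemon never started.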
>
> 1. mpirun (not via TORQUE) works correctly.
> 2. OpenMPI was built with the --with-tm option.
> 3. iptables and SELinux have already been disabled, and no password is required to connect to the other nodes via ssh.
> 4. OpenMPI 1.4.1, TORQUE 2.4.4
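Regarding point 2: as I understand it, a TM-enabled Open MPI started inside a TORQUE job launches its remote daemons through pbs_mom via the TM API rather than ssh, so it is worth confirming the tm components were actually built. `ompi_info` lists the installed components; the sketch below (hypothetical helper `check_tm_support`) looks for the tm components of the ras and plm frameworks, and the exact ompi_info line format it matches is an assumption:

```shell
#!/bin/sh
# Sketch: read `ompi_info` output on stdin and report whether the TORQUE
# (tm) components for resource allocation (ras) and launch (plm) exist.
check_tm_support() {
    out=$(cat)
    missing=0
    for fw in ras plm; do
        if printf '%s\n' "$out" | grep -q "MCA $fw: tm"; then
            echo "$fw: tm found"
        else
            echo "$fw: tm MISSING"
            missing=1
        fi
    done
    return "$missing"
}
```

Run as `ompi_info | check_tm_support`, ideally on each compute node.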
>
> What can I check to solve this?
>
> Thanks.
>
> -------------------------------------------------------------------------------------------
>
> Jeong-hyun Lee
>
> Visual Simulation Laboratory
> Department of Computer Science and Engineering
> Dongguk University, Seoul, Korea
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
---------------------------------------------------------------------------------------
Si Hammond
Research & Knowledge Transfer Associate
Performance Modelling, Analysis and Optimisation Team
High Performance Systems Group
Department of Computer Science
University of Warwick, CV4 7AL, UK
http://go.warwick.ac.uk/hpsg
----------------------------------------------------------------------------------------