[torqueusers] OpenMPI mpirun problem with TORQUE

Si Hammond simon.hammond at gmail.com
Sun Jan 24 04:12:16 MST 2010


Can you SSH from one node to the next without passwords etc?
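
One way to check this from inside a job is sketched below. It assumes the job runs under TORQUE (so PBS_NODEFILE is set) and that an ssh client is installed on the compute nodes. The function names unique_hosts and check_ssh are made up for illustration. BatchMode=yes makes ssh fail instead of prompting, so a node that still wants a password shows up as FAILED.

```shell
#!/bin/sh
# Sketch only: unique_hosts and check_ssh are illustrative names, not part
# of TORQUE or Open MPI.

# The node file has one line per allocated slot, so collapse duplicates.
unique_hosts() {
    sort -u "$1"
}

# Try a non-interactive login to every host in the node file.
# BatchMode=yes makes ssh fail rather than prompt for a password.
check_ssh() {
    for host in $(unique_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Only run the check inside a job, where TORQUE sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    check_ssh "$PBS_NODEFILE"
fi
```

Adding this to the top of the nodes=2:ppn=2 job script should print one line per node in the job output.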



On 23 Jan 2010, at 23:03, 이정현 wrote:

> Hi all.
>  
> I have a small (but serious) problem when submitting a job that uses mpirun.
>  
> There’s no problem with just one node (multiple processors), as below.
>  
> (job script)
>  
> #!/bin/sh
> #PBS -l nodes=1:ppn=2
> #PBS -j oe
>  
> echo "start  : $(date)"
> echo "HOSTNAME : $HOSTNAME"
> echo "PBS_NODEFILE = $PBS_NODEFILE"
> cat "$PBS_NODEFILE"
> mpirun /home/jhlee/test_program
> echo "finish : $(date)"
>  
>  
> (result) test_program just prints a message saying whether or not it was launched by mpirun.
>  
> start  : Sun Jan 24 07:46:27 KST 2010
> HOSTNAME : simulation01
> PBS_NODEFILE = /var/spool/torque/aux//31.simulation00
> simulation01
> simulation01
> Detected OpenMPI Runtime Environment
> Detected OpenMPI Runtime Environment
> finish : Sun Jan 24 07:46:29 KST 2010
>  
> But with multiple nodes, as below, mpirun fails to start test_program.
>  
> #PBS -l nodes=2:ppn=2 (everything else is the same)
>  
> I can’t find any test_program process. There’s only mpirun, no ‘test_program’. Please check the ‘ps’ output below.
>  
> 21680 ?        S      0:00 mpirun /home/jhlee/test_program
> 21684 ?        Ss     0:00 bash -c ps ax | grep test
> 21712 ?        R      0:00 grep test
>  
> 1.     mpirun (not via TORQUE) works correctly.
> 2.     OpenMPI was built with the --with-tm option.
> 3.     iptables and SELinux have already been turned off, and no password is required to connect to other nodes using ssh.
> 4.     OpenMPI 1.4.1, TORQUE 2.4.4
>  
> What can I check to solve this?
>  
> Thanks.
>  
> -------------------------------------------------------------------------------------------
>  
> Jeong-hyun Lee
>  
> Visual Simulation Laboratory
> Department of Computer Science and Engineering
> Dongguk University, Seoul, Korea
>  
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
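
Since the build used --with-tm, it may also be worth confirming that the installed Open MPI really contains the tm components, and that TORQUE's own TM launcher can start processes on every node. The sketch below is only a rough check. It assumes ompi_info and pbsdsh are on the PATH inside the job, and has_tm_support is a made-up helper name. pbsdsh goes through the TM interface, so if it cannot run /bin/hostname on the second node, the problem is probably on the TORQUE side rather than in Open MPI.

```shell
#!/bin/sh
# Sketch only: has_tm_support is an illustrative helper, not a real tool.

# Succeeds if ompi_info-style text on stdin lists a tm allocator (ras)
# or launcher (plm) component.
has_tm_support() {
    grep -Eq 'MCA (ras|plm): tm'
}

# Check the Open MPI installation, if ompi_info is available.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "Open MPI: tm components found"
    else
        echo "Open MPI: no tm components found"
    fi
fi

# Inside a job, ask TORQUE itself to start a process on each node.
if [ -n "${PBS_NODEFILE:-}" ] && command -v pbsdsh >/dev/null 2>&1; then
    # -u runs the command once per unique node in the allocation.
    pbsdsh -u /bin/hostname
fi
```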


---------------------------------------------------------------------------------------
Si Hammond

Research & Knowledge Transfer Associate
Performance Modelling, Analysis and Optimisation Team
High Performance Systems Group
Department of Computer Science
University of Warwick, CV4 7AL, UK
http://go.warwick.ac.uk/hpsg
----------------------------------------------------------------------------------------





