[torqueusers] OpenMPI mpirun problem with TORQUE

Si Hammond simon.hammond at gmail.com
Sun Jan 24 09:24:53 MST 2010


Out of interest, when you specified --with-tm, did you give configure a directory in which to find the PBS (TORQUE) installation?
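
As a rough sketch of how this could be checked (not a definitive procedure): the /usr/local prefix below is only an assumption about where TORQUE lives, and has_tm_support is a hypothetical helper name. The idea is that ompi_info lists the MCA components that were built, so a tm-enabled build should show a "tm" entry for the launcher (plm) and the allocator (ras).

```shell
#!/bin/sh
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds if a
# TORQUE (tm) launcher or allocator component is listed.
has_tm_support() {
    grep -Eq 'MCA (plm|ras): tm'
}

# Rebuild example. The prefix is an assumption; use the directory that
# actually contains TORQUE's include/tm.h and its libraries:
#   ./configure --with-tm=/usr/local && make all install

# Only run the check when ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "tm support: present"
    else
        echo "tm support: missing"
    fi
fi
```

If the tm components are missing, mpirun cannot use TORQUE's TM interface to start processes on the other nodes of the allocation.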




On 24 Jan 2010, at 11:18, 이정현 wrote:

> Sure.
> As I mentioned, mpirun works correctly on its own. The problem occurs only when the job runs via TORQUE.
>  
> From: Si Hammond [mailto:simon.hammond at gmail.com] 
> Sent: Sunday, January 24, 2010 8:12 PM
> To: 이정현
> Cc: Si Hammond; torqueusers at supercluster.org
> Subject: Re: [torqueusers] OpenMPI mpirun problem with TORQUE
>  
> Can you SSH from one node to the next without being prompted for a password?
>  
>  
>  
> On 23 Jan 2010, at 23:03, 이정현 wrote:
> 
> 
> Hi all.
>  
> I have a small (but serious) problem when submitting a job that uses mpirun.
>  
> There is no problem with a single node (multiple processors), as shown below.
>  
> (job script)
>  
> #!/bin/sh
> #PBS -l nodes=1:ppn=2
> #PBS -j oe
>  
> echo "start  : $(date)"
> echo "HOSTNAME : $HOSTNAME"
> echo "PBS_NODEFILE = $PBS_NODEFILE"
> cat "$PBS_NODEFILE"
> mpirun /home/jhlee/test_program
> echo "finish : $(date)"
>  
>  
> (result) test_program only prints a message indicating whether it was launched by mpirun.
>  
> start  : Sun Jan 24 07:46:27 KST 2010
> HOSTNAME : simulation01
> PBS_NODEFILE = /var/spool/torque/aux//31.simulation00
> simulation01
> simulation01
> Detected OpenMPI Runtime Environment
> Detected OpenMPI Runtime Environment
> finish : Sun Jan 24 07:46:29 KST 2010
>  
> But with multiple nodes, as below, mpirun fails to start test_program.
>  
> #PBS -l nodes=2:ppn=2 (everything else is the same)
>  
> I can't find any test_program process; only mpirun is running. Please check the 'ps' output below.
>  
> 21680 ?        S      0:00 mpirun /home/jhlee/test_program
> 21684 ?        Ss     0:00 bash -c ps ax | grep test
> 21712 ?        R      0:00 grep test
>  
> 1. mpirun (not via TORQUE) works correctly.
> 2. OpenMPI was built with the --with-tm option.
> 3. iptables and SELinux have already been disabled, and no password is required to connect to the other nodes using ssh.
> 4. Versions: OpenMPI 1.4.1, TORQUE 2.4.4
>  
> What should I check to solve this?
>  
> Thanks.
>  
> -------------------------------------------------------------------------------------------
>  
> Jeong-hyun Lee
>  
> Visual Simulation Laboratory
> Department of Computer Science and Engineering
> Dongguk University, Seoul, Korea
>  
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>  
> 
> ---------------------------------------------------------------------------------------
> Si Hammond
>  
> Research & Knowledge Transfer Associate
> Performance Modelling, Analysis and Optimisation Team
> High Performance Systems Group
> Department of Computer Science
> University of Warwick, CV4 7AL, UK
> http://go.warwick.ac.uk/hpsg
> ----------------------------------------------------------------------------------------


---------------------------------------------------------------------------------------
Si Hammond

Research & Knowledge Transfer Associate
Performance Modelling, Analysis and Optimisation Team
High Performance Systems Group
Department of Computer Science
University of Warwick, CV4 7AL, UK
http://go.warwick.ac.uk/hpsg
----------------------------------------------------------------------------------------





