[torqueusers] mpi libraries not being loaded with torque

Donald Tripp dtripp at hawaii.edu
Mon Sep 10 15:33:13 MDT 2007


In my experience, it's best to put these kinds of configuration
changes in the /etc/profile.d folder, so that they get loaded
globally. If your users need small individual changes, then
the .bashrc file should suffice.
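
As a rough sketch of what such a file could look like: the file name
/etc/profile.d/gcc-openmpi.sh and the function name are my own
placeholders, while the prefix and the "prod" host check are taken from
the .bashrc quoted below. Whether batch jobs actually pick it up depends
on how the shell is started on the compute nodes, so treat this as a
starting point rather than a guaranteed fix.

```shell
# Sketch of a hypothetical /etc/profile.d/gcc-openmpi.sh (file name assumed).
# The install prefix and the "prod" host-name check come from the original post.

use_gcc_openmpi() {
    _mpi_prefix=/usr/local/profiles/gcc-openmpi
    _mpi_host=${1:-$(uname -n)}     # optional argument: host name to test against

    case "$_mpi_host" in
        *prod*)
            PATH="$_mpi_prefix/bin:$PATH"
            # Append the old value only if it is non-empty; an empty entry in
            # LD_LIBRARY_PATH is treated as the current directory.
            LD_LIBRARY_PATH="$_mpi_prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            export PATH LD_LIBRARY_PATH
            ;;
    esac

    unset _mpi_prefix _mpi_host     # do not leave helper variables behind
}

use_gcc_openmpi
```

Passing a host name as the first argument overrides the uname -n
lookup, which is handy for trying the script out on a machine that is
not one of the "prod" nodes.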

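One thing to watch with the echo test quoted below: inside double
quotes, the submitting shell expands $LD_LIBRARY_PATH before qsub ever
sees the text, so the job only prints the value from the login session.
Single quotes pass the variable name through so that it is expanded
inside the job. A small illustration that runs without Torque (the path
is a made-up value):

```shell
LD_LIBRARY_PATH=/example/submit-side/lib   # made-up value, for illustration only

# Double quotes: expanded now, by the shell that builds the job text.
expanded_now=$(echo "echo $LD_LIBRARY_PATH")

# Single quotes: the variable name survives and is expanded by the job's shell.
expanded_later=$(echo 'echo $LD_LIBRARY_PATH')

printf '%s\n' "$expanded_now"     # echo /example/submit-side/lib
printf '%s\n' "$expanded_later"   # echo $LD_LIBRARY_PATH
```

So piping the single-quoted form into qsub should show what the job
itself sees, rather than what the submitting shell had set.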

- Donald Tripp
  dtripp at hawaii.edu
----------------------------------------------
HPC Systems Administrator
High Performance Computing Center
University of Hawai'i at Hilo
200 W. Kawili Street
Hilo,   Hawaii   96720
http://www.hpc.uhh.hawaii.edu


On Sep 10, 2007, at 11:25 AM, Adams, Samuel D Contr AFRL/HEDR wrote:

> I am trying to make my new cluster flexible such that it can run with
> more than one configuration at the same time.  For example, you can
> choose gcc, pg, or Intel compilers using OpenMPI.  To start out with, I
> am just using gcc 4.1 that comes with RHEL5 and OpenMPI.  For some
> reason, I am having trouble with the way it is loading the libraries
> depending on how I run the job.  Basically it would seem that the
> LD_LIBRARY_PATH is not set properly depending on how I run the job; it
> works interactively but not with torque.
>
> I have this set in my .bashrc file in the root of my home directory
>
> if [ `hostname | grep "prod"` ]; then
>         PATH=/usr/local/profiles/gcc-openmpi/bin/:$PATH
>         LD_LIBRARY_PATH=/usr/local/profiles/gcc-openmpi/lib/:$LD_LIBRARY_PATH
> fi
>
> So, theoretically this should set the PATH and LD_LIBRARY_PATH properly
> whenever I open a shell.
>
> First I tried to submit a job with torque with a script something like
> this:
>
> #!/bin/bash
> #PBS -l nodes=2:ppn=8
>
> `which mpirun` --prefix /usr/local/profiles/gcc-openmpi/ program_to_run
> exit 0
>
> As you can see, I tried everything I could think of to get around it not
> finding the libraries, but it was to no avail.  This is the error I
> invariably get:
>
> [sam at prodnode1 fdtd_0.3]$ cat script.sh.e223
> /home/sam/code/fdtd/fdtd_0.3/fdtd: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
> [the same line appears 16 times in total]
>
> As a test, I ran:
>
> [sam at prodnode1 fdtd_0.3]$ echo "echo $LD_LIBRARY_PATH" | qsub
>
> And I got this, which seems to be what I would expect:
>
> [sam at prodnode1 fdtd_0.3]$ cat STDIN.o226
> /usr/local/profiles/gcc-openmpi/lib/:
>
>
> If I ran it by hand (interactively), it seemed to work OK.  Any ideas
> as to what I can do to make these login scripts set up the environment
> so that jobs run seamlessly?
>
> [sam at prodnode1 fdtd_0.3]$ `which mpirun` --host prodnode2,prodnode3 -np 16 \
>     --prefix /usr/local/profiles/gcc-openmpi/ \
>     /home/sam/code/fdtd/fdtd_0.3/fdtd \
>     -t /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt \
>     -r /home/sam/code/fdtd/fdtd_0.3/test_files/sphere_brain_10_pad_x0120y0120z0120.raw \
>     -v -f 500 --pw 90,0,1,0 -l test_log.out -a 10 --prefix job_8
> Beowulf Computer Cluster (BCC)
> AFRL/HED
>
> This is a Department of Defense Computer System. This computer system,
> including all related equipment, networks, and network devices
> (specifically including Internet access) are provided only for
> authorized U.S. Government use.
>
> DoD computer systems may be monitored for all lawful purposes,
> including to ensure that their use is authorized, for management of
> the system, to facilitate protection against unauthorized access, and
> to verify security procedures, survivability, and operational
> security. Monitoring includes active attacks by authorized DoD
> entities to test or verify the security of this system. During
> monitoring, information may be examined, recorded, copied and used for
> authorized purposes. All information, including personal information,
> placed or sent over this system may be monitored.
>
>  * Initializing FDTD            [ OK ]
>  * Allocating memory            [ OK ]
>  * Initializing PML             [ OK ]
>  * Starting updates
>  * halfcycle 1   ratio 0.0000   time 52.72s
>  * halfcycle 2   ratio 5.4387   time 51.87s
> ...
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
