[torqueusers] mpi libraries not being loaded with torque
Donald Tripp
dtripp at hawaii.edu
Mon Sep 10 15:33:13 MDT 2007
In my experience, it's best to put these kinds of configuration
changes in the /etc/profile.d folder, so that they get loaded
globally. If your users need small individual changes, then
the .bashrc file should suffice.
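
As a rough sketch of that approach, a file such as
/etc/profile.d/gcc-openmpi.sh (a hypothetical name) could set both
variables for every login shell. The prefix and the "prod" hostname check
are taken from the original post below. This assumes the file is present
on every compute node and that the shell which starts the job actually
reads /etc/profile, which is worth verifying on your system.

```shell
# Hypothetical file: /etc/profile.d/gcc-openmpi.sh
# Prefix copied from the original post; adjust it for your site.
MPI_PREFIX=/usr/local/profiles/gcc-openmpi

# Only touch the environment on hosts whose name contains "prod".
case "$(hostname)" in
  *prod*)
    # Prepend, adding a ":" separator only if the variable is already set.
    PATH="$MPI_PREFIX/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$MPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    # Export so that child processes (mpirun, the MPI program) inherit them.
    export PATH LD_LIBRARY_PATH
    ;;
esac
```

The explicit export matters: a plain assignment of a variable that is not
already in the environment stays local to the shell that made it.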
- Donald Tripp
dtripp at hawaii.edu
----------------------------------------------
HPC Systems Administrator
High Performance Computing Center
University of Hawai'i at Hilo
200 W. Kawili Street
Hilo, Hawaii 96720
http://www.hpc.uhh.hawaii.edu
On Sep 10, 2007, at 11:25 AM, Adams, Samuel D Contr AFRL/HEDR wrote:
> I am trying to make my new cluster flexible such that it can run with
> more than one configuration at the same time. For example, you can
> choose gcc, pg, or Intel compilers using OpenMPI. To start out with, I
> am just using the gcc 4.1 that comes with RHEL5, and OpenMPI. For some
> reason, I am having trouble with the way it loads the libraries
> depending on how I run the job. Basically, it would seem that
> LD_LIBRARY_PATH is not set properly depending on how I run the job; it
> works interactively but not with torque.
>
> I have this set in my .bashrc file in the root of my home directory
>
> if [ `hostname | grep "prod"` ]; then
>     PATH=/usr/local/profiles/gcc-openmpi/bin/:$PATH
>     LD_LIBRARY_PATH=/usr/local/profiles/gcc-openmpi/lib/:$LD_LIBRARY_PATH
> fi
>
> So, theoretically, this should set PATH and LD_LIBRARY_PATH properly
> whenever I open a shell.
>
> First I tried to submit a job with torque using a script something like
> this:
>
> #!/bin/bash
> #PBS -l nodes=2:ppn=8
>
> `which mpirun` --prefix /usr/local/profiles/gcc-openmpi/ program_to_run
> exit 0
>
> As you can see, I tried everything I could think of to get around it not
> finding the libraries, but to no avail. This is the error I
> invariably get:
>
> [sam at prodnode1 fdtd_0.3]$ cat script.sh.e223
> /home/sam/code/fdtd/fdtd_0.3/fdtd: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
> [the same message appears 16 times in total]
>
> As a test, I ran:
>
> [sam at prodnode1 fdtd_0.3]$ echo "echo $LD_LIBRARY_PATH" | qsub
>
> And I got this, which seems to be what I would expect:
>
> [sam at prodnode1 fdtd_0.3]$ cat STDIN.o226
> /usr/local/profiles/gcc-openmpi/lib/:
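
One caveat about this test: inside double quotes, $LD_LIBRARY_PATH is
expanded by the interactive shell before qsub ever sees the text, so the
job only echoes a value that was already filled in on the submit side. The
sketch below shows the difference without needing qsub; the value assigned
at the top is simply copied from the output above as a stand-in.

```shell
# Stand-in for the interactive shell's value (copied from the post).
LD_LIBRARY_PATH=/usr/local/profiles/gcc-openmpi/lib/:

# Double quotes: the variable is expanded right here, on the submit side.
double=$(echo "echo $LD_LIBRARY_PATH")

# Single quotes: the text is passed through untouched, so whichever shell
# eventually runs it performs the expansion.
single=$(echo 'echo $LD_LIBRARY_PATH')

printf '%s\n' "$double"   # echo /usr/local/profiles/gcc-openmpi/lib/:
printf '%s\n' "$single"   # echo $LD_LIBRARY_PATH
```

To see what the job itself has, the single-quoted form would be
echo 'echo $LD_LIBRARY_PATH' | qsub.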
>
>
> When I ran it by hand (interactively), it seemed to work OK. Any ideas as
> to what I can do to make these login scripts set up the environment so
> that jobs run seamlessly?
>
> [sam at prodnode1 fdtd_0.3]$ `which mpirun` --host prodnode2,prodnode3 -np 16 \
>     --prefix /usr/local/profiles/gcc-openmpi/ \
>     /home/sam/code/fdtd/fdtd_0.3/fdtd \
>     -t /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt \
>     -r /home/sam/code/fdtd/fdtd_0.3/test_files/sphere_brain_10_pad_x0120y0120z0120.raw \
>     -v -f 500 --pw 90,0,1,0 -l test_log.out -a 10 --prefix job_8
> Beowulf Computer Cluster (BCC)
> AFRL/HED
>
> This is a Department of Defense Computer System. This computer system,
> including all related equipment, networks, and network devices
> (specifically including Internet access) are provided only for
> authorized U.S. Government use.
>
> DoD computer systems may be monitored for all lawful purposes, including
> to ensure that their use is authorized, for management of the system, to
> facilitate protection against unauthorized access, and to verify security
> procedures, survivability, and operational security. Monitoring includes
> active attacks by authorized DoD entities to test or verify the security
> of this system. During monitoring, information may be examined, recorded,
> copied and used for authorized purposes. All information, including
> personal information, placed or sent over this system may be monitored.
>
> * Initializing FDTD [ OK ]
> * Allocating memory [ OK ]
> * Initializing PML [ OK ]
> * Starting updates
> * halfcycle 1 ratio 0.0000 time 52.72s
> * halfcycle 2 ratio 5.4387 time 51.87s
> ...
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers