[torqueusers] mpi libraries not being loaded with torque

Brock Palen brockp at umich.edu
Mon Sep 10 17:20:34 MDT 2007


I highly recommend modules:
http://modules.sourceforge.net/

We use it to manage a very large list of software versions:
http://cac.engin.umich.edu/resources/software/
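
With modules, a job script just loads the stack it wants by name
instead of hard-coding paths.  A rough sketch of what that looks like
(the module name gcc-openmpi and the program are only placeholders,
chosen to match the install prefix in the quoted post below):

#!/bin/bash
#PBS -l nodes=2:ppn=8

# Pull in the modules shell function if the batch shell has not
# already done so (the environment-modules package installs this file).
. /etc/profile.d/modules.sh

# Put the chosen compiler/MPI stack on PATH and LD_LIBRARY_PATH.
module load gcc-openmpi

mpirun ./program_to_run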

Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985


On Sep 10, 2007, at 5:33 PM, Donald Tripp wrote:

> In my experience, it's best to put these kinds of configuration
> changes in /etc/profile.d, so that they are loaded globally. If your
> users need small individual changes, then the .bashrc file should
> suffice.
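>
> For example, a small script dropped into /etc/profile.d might look
> roughly like this (an illustrative sketch only; the file name and
> install prefix are just examples matching the paths further down):
>
> # /etc/profile.d/gcc-openmpi.sh  (example name)
> OPENMPI_PREFIX=/usr/local/profiles/gcc-openmpi
> export PATH=$OPENMPI_PREFIX/bin:$PATH
> export LD_LIBRARY_PATH=$OPENMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}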
>
>
> - Donald Tripp
>  dtripp at hawaii.edu
> ----------------------------------------------
> HPC Systems Administrator
> High Performance Computing Center
> University of Hawai'i at Hilo
> 200 W. Kawili Street
> Hilo,   Hawaii   96720
> http://www.hpc.uhh.hawaii.edu
>
>
> On Sep 10, 2007, at 11:25 AM, Adams, Samuel D Contr AFRL/HEDR wrote:
>
>> I am trying to make my new cluster flexible enough to run more than
>> one configuration at the same time.  For example, you can choose the
>> gcc, PGI, or Intel compilers with OpenMPI.  To start out with, I am
>> just using the gcc 4.1 that comes with RHEL5 and OpenMPI.  For some
>> reason, I am having trouble with the way the libraries are loaded
>> depending on how I run the job.  Basically, it seems that
>> LD_LIBRARY_PATH is not set properly depending on how I run the job:
>> it works interactively but not under Torque.
>>
>> I have this set in my .bashrc file in the root of my home directory
>>
>> if [ `hostname | grep "prod"` ]; then
>>         PATH=/usr/local/profiles/gcc-openmpi/bin/:$PATH
>>         LD_LIBRARY_PATH=/usr/local/profiles/gcc-openmpi/lib/:$LD_LIBRARY_PATH
>> fi
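>>
>> (A variant of that snippet with explicit exports -- just a sketch; if
>> LD_LIBRARY_PATH was not already exported, a plain assignment stays
>> local to the shell and never reaches child processes:)
>>
>> if [ `hostname | grep "prod"` ]; then
>>         export PATH=/usr/local/profiles/gcc-openmpi/bin/:$PATH
>>         export LD_LIBRARY_PATH=/usr/local/profiles/gcc-openmpi/lib/:$LD_LIBRARY_PATH
>> fi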
>>
>> So, theoretically, this should set PATH and LD_LIBRARY_PATH properly
>> whenever I open a shell.
>>
>> First I tried to submit a job to Torque with a script something like
>> this:
>>
>> #!/bin/bash
>> #PBS -l nodes=2:ppn=8
>>
>> `which mpirun` --prefix /usr/local/profiles/gcc-openmpi/ program_to_run
>> exit 0
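>>
>> (A variant of the same script that forwards the variable explicitly --
>> a sketch only; OpenMPI's mpirun accepts -x VAR to export an
>> environment variable to the launched processes:)
>>
>> #!/bin/bash
>> #PBS -l nodes=2:ppn=8
>>
>> export LD_LIBRARY_PATH=/usr/local/profiles/gcc-openmpi/lib/:$LD_LIBRARY_PATH
>> # -x forwards LD_LIBRARY_PATH to every launched process
>> /usr/local/profiles/gcc-openmpi/bin/mpirun -x LD_LIBRARY_PATH \
>>         --prefix /usr/local/profiles/gcc-openmpi/ program_to_run
>> exit 0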
>>
>> As you can see, I tried everything I could think of to work around it
>> not finding the libraries, but to no avail.  This is the error I
>> invariably get:
>>
>> [sam at prodnode1 fdtd_0.3]$ cat script.sh.e223
>> /home/sam/code/fdtd/fdtd_0.3/fdtd: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
>> (this line is repeated 16 times in total, once per MPI process)
>>
>> As a test, I ran:
>>
>> [sam at prodnode1 fdtd_0.3]$ echo "echo $LD_LIBRARY_PATH" | qsub
>>
>> And I got the following, which seems to be what I would expect:
>>
>> [sam at prodnode1 fdtd_0.3]$ cat STDIN.o226
>> /usr/local/profiles/gcc-openmpi/lib/:
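>>
>> (One caveat with that test -- inside double quotes the submitting
>> shell expands $LD_LIBRARY_PATH before qsub ever sees it, so the job
>> only echoes the value from the submit host.  A single-quoted variant
>> defers the expansion to the job's own shell:)
>>
>> echo 'echo $LD_LIBRARY_PATH' | qsub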
>>
>>
>> If I ran it by hand (interactively), it seemed to work OK.  Any ideas
>> on what I can do to make these login scripts set up the environment
>> seamlessly?
>>
>> [sam at prodnode1 fdtd_0.3]$ `which mpirun` --host prodnode2,prodnode3 -np 16 --prefix /usr/local/profiles/gcc-openmpi/ /home/sam/code/fdtd/fdtd_0.3/fdtd -t /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt -r /home/sam/code/fdtd/fdtd_0.3/test_files/sphere_brain_10_pad_x0120y0120z0120.raw -v -f 500 --pw 90,0,1,0 -l test_log.out -a 10 --prefix job_8
>> Beowulf Computer Cluster (BCC)
>> AFRL/HED
>>
>> This is a Department of Defense Computer System. This computer system,
>> including all related equipment, networks, and network devices
>> (specifically including Internet access) are provided only for
>> authorized U.S. Government use.
>>
>> DoD computer systems may be monitored for all lawful purposes,
>> including to ensure that their use is authorized, for management of
>> the system, to facilitate protection against unauthorized access, and
>> to verify security procedures, survivability, and operational
>> security. Monitoring includes active attacks by authorized DoD
>> entities to test or verify the security of this system. During
>> monitoring, information may be examined, recorded, copied and used for
>> authorized purposes. All information, including personal information,
>> placed or sent over this system may be monitored.
>> (the same banner is printed a second time)
>>
>>  * Initializing FDTD            [ OK ]
>>  * Allocating memory            [ OK ]
>>  * Initializing PML             [ OK ]
>>  * Starting updates
>>  * halfcycle 1   ratio 0.0000   time 52.72s
>>  * halfcycle 2   ratio 5.4387   time 51.87s
>> ...
>>
>> Sam Adams
>> General Dynamics Information Technology
>> Phone: 210.536.5945
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


