[torqueusers] problem with shared libraries

Jan jand at uvic.ca
Sun Feb 10 16:30:06 MST 2008


Hi,

  I am in the (slow) process of setting up my first cluster. So far, I 
have 2 machines with 8 CPUs each (running Ubuntu 7.10). One machine is 
both the server and a node (node1); the other is a node only (node2). 
pbsnodes -a reports both nodes as working. /home on node1 is mounted via 
NFS on node2. When I look through the log files in 
/var/spool/torque/*_logs/ I cannot find anything obviously wrong.

I compiled and installed Torque (PBS) and configured everything (setting 
the server name etc. on both machines) following the online documentation.

Now I seem to have two problems:
1) If I submit a script such as:
#PBS -l nodes=1:ppn=8
#PBS -l walltime=96:00:00
#PBS -j oe

# change the current working directory to the directory the job was
# submitted from, where the executable file 'fgs' can be found
cd $PBS_O_WORKDIR
echo $PBS_O_WORKDIR

# run the executable file 'fgs' on 8 processes using mpirun
/usr/local/bin/mpirun -np 8 --prefix /usr/local ./fgs > ./test.log

everything works. The code runs on 8 CPUs and I get the expected results 
from my code.

If I omit "-np 8", the code runs on only one CPU. I did not expect 
that behaviour, since I specified ppn=8 above.
Any suggestions as to why ppn=8 has no effect on the number of 
processes that mpirun starts?
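
One way to see what Torque actually granted is to count the lines in 
$PBS_NODEFILE, which Torque sets inside a running job and which lists 
one line per allocated processor slot. The following is a minimal 
sketch of such a check, not a confirmed fix. The helper name count_slots 
is hypothetical, and the commented mpirun line assumes that the mpirun 
here is Open MPI's (the --prefix option suggests so) and that it was 
built without Torque (tm) support, so that it does not read the 
allocation by itself. Neither assumption is confirmed.

```shell
#!/bin/sh
# Count the processor slots Torque granted to this job.
# PBS_NODEFILE is set by Torque inside a running job and contains
# one line per allocated slot (8 lines for nodes=1:ppn=8).

# count_slots FILE: print the number of non-empty lines in FILE
# (hypothetical helper name, used only for this illustration)
count_slots() {
    grep -c . "$1"
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    echo "slots granted: $NP"
    # show how many slots were granted on each host
    sort "$PBS_NODEFILE" | uniq -c
    # The job script could then pass the allocation on explicitly
    # (assumption: Open MPI's mpirun, built without tm support):
    # /usr/local/bin/mpirun -np "$NP" -machinefile "$PBS_NODEFILE" \
    #     --prefix /usr/local ./fgs > ./test.log
else
    echo "PBS_NODEFILE is not set; run this inside a Torque job"
fi
```

If the count is 8 for nodes=1:ppn=8, the allocation itself is fine and 
the question becomes why mpirun does not use it.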

2) If I submit
#PBS -l nodes=2:ppn=1
#PBS -l walltime=96:00:00
#PBS -j oe

# change the current working directory to the directory the job was
# submitted from, where the executable file 'fgs' can be found
cd $PBS_O_WORKDIR
echo $PBS_O_WORKDIR

# run the executable file 'fgs' on 8 processes using mpirun
/usr/local/bin/mpirun -np 8 --prefix /usr/local ./fgs > ./test.log

qstat shows the job as running, but the code is not being executed. 
If I qdel the job, the error file shows that a shared library is 
missing:
fgs: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory

I assume that this happens on node2. However, if I log in to that node 
and run the code directly with mpirun, it runs as expected.
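
Since the code runs from an interactive login but not from the batch 
job, one thing worth comparing is the library search path in the two 
environments. The following is a minimal sketch of such a check, 
assuming that libimf.so is normally found through LD_LIBRARY_PATH and 
that the variable is set by the login shell's startup files. The helper 
name find_lib is hypothetical.

```shell
#!/bin/sh
# Report whether libimf.so is reachable through LD_LIBRARY_PATH in
# the environment this script is started from.

# find_lib NAME PATHLIST: print the first directory in the
# colon-separated PATHLIST that contains NAME; return 1 if none does
# (hypothetical helper name, used only for this illustration)
find_lib() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            echo "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

echo "host: $(hostname)"
echo "LD_LIBRARY_PATH: ${LD_LIBRARY_PATH:-<unset>}"
if dir=$(find_lib libimf.so "${LD_LIBRARY_PATH:-}"); then
    echo "libimf.so found in $dir"
else
    echo "libimf.so not found via LD_LIBRARY_PATH"
fi
```

Running this once from an interactive login on node2 and once from 
inside a job script would show whether the two environments differ. If 
they do, and if mpirun is Open MPI's, its -x option (for example 
"-x LD_LIBRARY_PATH") exports a variable to the launched processes. 
Whether that applies to this setup is an assumption.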

Any help is greatly appreciated,
Jan


-- 
Jan Dettmer, Postdoctoral Fellow
School of Earth and Ocean Sciences, University of Victoria	
Victoria, BC V8W 3P6
office: (250) 472-4342	email: jand at uvic.ca
http://web.uvic.ca/~jand/

