[torqueusers] problem with shared libraries
Jan
jand at uvic.ca
Sun Feb 10 16:30:06 MST 2008
Hi,
I am in the (slow) process of setting up my first cluster. So far, I
have 2 machines with 8 CPUs each (running Ubuntu 7.10). One machine acts
as both the server and a compute node (node1); the other is a compute
node only (node2). pbsnodes -a reports both nodes as working. /home on
node1 is mounted via NFS on node2. When I look through the log files in
/var/spool/torque/*_logs/ I cannot find anything obviously wrong.
I compiled PBS and installed it. I configured everything (setting the
server name etc. on both machines) following the online documentation.
Now I seem to have two problems:
1) if I submit a script such as:
#PBS -l nodes=1:ppn=8
#PBS -l walltime=96:00:00
#PBS -j oe
# change the current working directory to the directory where
# the executable file 'hello' can be found
cd $PBS_O_WORKDIR
echo $PBS_O_WORKDIR
# run the executable file 'hello' using the qmpirun script
/usr/local/bin/mpirun -np 8 --prefix /usr/local ./fgs > ./test.log
everything works. The code runs on 8 CPUs and I get the expected results
from my code.
If I omit "-np 8", the code runs on only one CPU. I did not expect
that behaviour, since I specified ppn=8 above.
Any suggestions as to why mpirun does not pick up ppn=8?
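
For reference, a minimal sketch of how the process count could be derived
inside the job script instead of being hard-coded. It relies on TORQUE
writing one line per allocated processor to the file named by
$PBS_NODEFILE, so nodes=1:ppn=8 should give 8 lines. The helper name
count_slots and the commented launch line are illustrative assumptions;
whether this mpirun reads the allocation by itself depends on how it was
built, which the post does not say.

```shell
#!/bin/sh
# Count the processor slots granted to the job. The file passed in is
# expected to hold one hostname per allocated processor.
count_slots() {
    # "wc -l < file" prints only the number; tr strips any padding
    wc -l < "$1" | tr -d ' '
}

# Only act inside a TORQUE job, where PBS_NODEFILE is set and readable.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    echo "allocated slots: $NP"
    # Hypothetical launch line mirroring the script above (assumption):
    # /usr/local/bin/mpirun -np "$NP" -machinefile "$PBS_NODEFILE" --prefix /usr/local ./fgs > ./test.log
fi
```

If the echoed count is 8 but mpirun without -np still starts a single
process, that would suggest mpirun is not reading the TORQUE allocation
rather than ppn=8 being ignored by the server.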
2) if I submit
#PBS -l nodes=2:ppn=1
#PBS -l walltime=96:00:00
#PBS -j oe
# change the current working directory to the directory where
# the executable file 'hello' can be found
cd $PBS_O_WORKDIR
echo $PBS_O_WORKDIR
# run the executable file 'hello' using the qmpirun script
/usr/local/bin/mpirun -np 8 --prefix /usr/local ./fgs > ./test.log
qstat indicates that the job is running, but the code is not being
executed. If I qdel the job, the error file indicates that
a shared library is missing:
fgs: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
I assume that this happens on node2. However, if I log into node2 and
run the program directly with mpirun, it runs as expected.
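
One way to test that assumption is sketched below: from inside the job,
run ldd on the executable on every allocated host through a
non-interactive ssh session, which may not have the same environment
(for example LD_LIBRARY_PATH) as a login shell. The helper name
parse_missing is hypothetical, and the sketch assumes passwordless ssh
between the nodes and that the working directory is on the shared /home.
The commented -x LD_LIBRARY_PATH line is an Open MPI option for exporting
an environment variable to remote processes; that this mpirun is Open MPI
is itself an assumption.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as unresolved.
# Reads ldd output on stdin; unresolved entries look like
#   libimf.so => not found
parse_missing() {
    awk '/=> not found/ { print $1 }'
}

# Only run the per-node check inside a TORQUE job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    # sort -u collapses the one-line-per-processor list to unique hosts
    for host in $(sort -u "$PBS_NODEFILE"); do
        echo "== $host =="
        # ssh -n keeps ssh from reading the job script's stdin
        ssh -n "$host" "cd '$PBS_O_WORKDIR' && ldd ./fgs" | parse_missing
    done
    # Possible remedy if libimf.so is listed for a host (assumption):
    # /usr/local/bin/mpirun -x LD_LIBRARY_PATH -np 8 --prefix /usr/local ./fgs > ./test.log
fi
```

A host that prints libimf.so here but runs the program fine from an
interactive login would point to the library path being set only in the
login environment.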
Any help is greatly appreciated,
Jan
--
Jan Dettmer, Postdoctoral Fellow
School of Earth and Ocean Sciences, University of Victoria
Victoria, BC V8W 3P6
office: (250) 472-4342 email: jand at uvic.ca
http://web.uvic.ca/~jand/