[torqueusers] Program hangs when accessing MPI file
Pablo Guaza Peces
pabloguaza at ugr.es
Mon Sep 17 02:54:55 MDT 2012
I've been having this problem for a while now and I haven't been able to
Whenever I access an MPI shared file, my program freezes without
producing any output or errors. I made this very simple program in C to
int main(int argc, char **argv)
MPI_MODE_CREATE | MPI_MODE_RDWR,
As I said, it freezes and I have to kill it myself with the qdel command.
It does create the file "datafile", and the error and output files contain
nothing besides the messages about the job being killed manually.
I submit this program to Torque with this PBS script:
#PBS -S /bin/bash
#PBS -A batch
#PBS -N test_mpi_file
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:50
#PBS -j oe
mpiexec.hydra -rmk pbs /home/pablo/Programs/mbg/c/test_mpi_file
I have the following software configuration:
- mpich2 1.2.1 using Hydra
- Torque 2.5.7
- Maui 3.2.6
Maybe it has something to do with the NFS home directory that is shared
with all the nodes, because the program runs without problems on a single
machine, whether that is the head node or any other. It only fails when
two or more machines are accessing the file.
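One way I could probe the NFS idea: as far as I understand, the MPI-IO
layer in MPICH2 (ROMIO) relies on fcntl() byte-range locks when the file
lives on NFS, so a broken lock service might explain a silent hang. Below
is a minimal sketch, independent of MPI, that tries to take and release
such a lock. The helper name probe_fcntl_lock is hypothetical and is not
part of my test program.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of the original test program): try to
 * take and release a write lock on the first byte of `path` with fcntl().
 * Returns 0 on success, -1 if the file cannot be opened, and -2 if the
 * lock request is refused or fails. */
int probe_fcntl_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        return -1;
    }

    struct flock lk;
    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;     /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 1;            /* lock only the first byte */

    /* F_SETLK does not wait for a conflicting lock to be released;
     * it fails with an error instead. */
    if (fcntl(fd, F_SETLK, &lk) == -1) {
        fprintf(stderr, "fcntl(F_SETLK) on %s: %s\n", path, strerror(errno));
        close(fd);
        return -2;
    }

    /* Release the lock again before closing the file. */
    lk.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    return 0;
}
```

Calling this from a small main() on each compute node, with a path inside
the shared home directory, should show whether locking works there. If the
call returns -2, or never returns at all, that would point at the NFS lock
setup rather than at Torque or the program itself.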
Is there any way to debug the program while it is running on two or more
nodes?
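One approach I am considering is the usual "wait for a debugger" trick:
each process prints its host name and PID, then pauses until a debugger
attaches and releases it. The sketch below is only an illustration under
that assumption. The helper name wait_for_debugger and its timeout
parameter are hypothetical, and it would be called in every rank right
after MPI_Init.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print where this process runs, then wait up to
 * `max_seconds` for a debugger to attach and set `released` to a nonzero
 * value (in gdb: select this function's frame, then
 * `set var released = 1`).
 * Returns 1 if released by a debugger, 0 if the wait timed out. */
int wait_for_debugger(int max_seconds)
{
    volatile int released = 0;  /* volatile so the loop re-reads it */
    char host[256];

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown");
    host[sizeof host - 1] = '\0';  /* gethostname may not terminate */

    fprintf(stderr, "PID %ld on %s waiting for debugger\n",
            (long)getpid(), host);
    fflush(stderr);

    /* Poll once per second until released or the timeout expires. */
    for (int waited = 0; waited < max_seconds && !released; waited++)
        sleep(1);

    return released ? 1 : 0;
}
```

With this in place I could log in to each node, attach with gdb -p PID,
move up to the wait_for_debugger frame, set released to 1 and continue.
Once the program hangs, interrupting it and asking for a backtrace should
show which call is blocked. The job's walltime would need to be long
enough to leave time for attaching.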
Any help would be greatly appreciated!