[torqueusers] Torque with OpenMPI
jhh3851 at yahoo.com
Thu Feb 21 10:39:11 MST 2008
> Message: 4
> Date: Thu, 21 Feb 2008 18:09:42 +0100
> From: " Jozef K??er " <quickparser at gmail.com>
> Subject: Re: [torqueusers] Torque with OpenMPI
> To: "Craig West" <cwest at astro.umass.edu>
> Cc: torqueusers at supercluster.org
> This was great, Craig! I would never have guessed that the code might be buggy.
> I copied the recompiled binary to all my nodes (I still don't have NFS).
> Now, when I run the code like this:
> q-parser at f135-3:~$ mpirun -np 7 --hostfile zoznam test_app
> 0(f135-4): We have 7 processors
> 0(f135-4): Hello 1! Processor 1 (f135-5) reporting for duty
> 0(f135-4): Hello 2! Processor 2 (f135-6) reporting for duty
> 0(f135-4): Hello 3! Processor 3 (f135-7) reporting for duty
> 0(f135-4): Hello 4! Processor 4 (f135-8) reporting for duty
> 0(f135-4): Hello 5! Processor 5 (f135-9) reporting for duty
> 0(f135-4): Hello 6! Processor 6 (f135-11) reporting for duty
> It seems to me that one processor is still lost, but I have no bug info with
> However, when I run it using torque, the job seems to be hung. 'showq' shows
> that the job is running but never finishes.
You are not missing a processor. MPI ranks are numbered starting from 0, so a
7-process job has ranks 0 through 6. Your code example is one where rank 0 is a
"master" type process and only ranks 1 to n-1 report back, which is why you see
six "Hello" lines for seven processes. The structure can be seen below in the
"if" test of "myid == 0".
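The count can be checked without an MPI installation. What follows is only a
plain C sketch that simulates the rank 0 loop, not the quoted program itself:
the function names (reporting_ranks, format_report, simulate_master) are
hypothetical, and no MPI calls or host names are involved.

```c
#include <assert.h>
#include <stdio.h>

/* Number of ranks that report to rank 0 when rank 0 only collects.
   With numprocs processes the ranks are 0 .. numprocs-1. */
int reporting_ranks(int numprocs)
{
    assert(numprocs >= 1); /* a job always has at least rank 0 */
    return numprocs - 1;
}

/* Hypothetical helper: builds the text a worker rank would send back. */
int format_report(char *buf, size_t len, int worker_rank)
{
    return snprintf(buf, len, "Hello %d! Processor %d reporting for duty",
                    worker_rank, worker_rank);
}

/* Simulates what rank 0 prints; returns the number of worker lines printed. */
int simulate_master(int numprocs)
{
    char line[80];
    int printed = 0;

    printf("0: We have %d processors\n", numprocs);
    for (int rank = 1; rank < numprocs; rank++) { /* ranks 1 .. numprocs-1 */
        format_report(line, sizeof line, rank);
        printf("0: %s\n", line);
        printed++;
    }
    return printed;
}
```

Calling simulate_master(7) prints the "We have 7 processors" line followed by
six report lines, one each for ranks 1 to 6, which matches the shape of the
mpirun output quoted above.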
> #include <mpi.h>
> /* all MPI programs start with MPI_Init; all 'N' processes exist thereafter */
> MPI_Init(&argc,&argv);
> /* find out how big the SPMD world is */
> MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
> /* and this process's rank is */
> MPI_Comm_rank(MPI_COMM_WORLD,&myid);
> MPI_Get_processor_name(processor_name, &namelen);
> /* At this point, all the programs are running equivalently; the rank is
>    used to distinguish the roles of the programs in the SPMD model, with
>    rank 0 often used specially... */
> if(myid == 0)
> printf("%d(%s): We have %d processors\n", myid, processor_name,