[torqueusers] Torque with OpenMPI

Joseph Han jhh3851 at yahoo.com
Thu Feb 21 10:39:11 MST 2008


Jozef,

> 
> Message: 4
> Date: Thu, 21 Feb 2008 18:09:42 +0100
> From: " Jozef K??er " <quickparser at gmail.com>
> Subject: Re: [torqueusers] Torque with OpenMPI
> To: "Craig West" <cwest at astro.umass.edu>
> Cc: torqueusers at supercluster.org
> Message-ID:
> 	<8803b3d0802210909m4a84fa83s65f016eb9780d15d at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> This was great, Craig! I would never have guessed that the code might be buggy.
> I copied the recompiled binary to all my nodes (I don't have NFS yet).
> Now, when I run the code like this:
> 
> q-parser at f135-3:~$ mpirun -np 7 --hostfile zoznam test_app
> 0(f135-4): We have 7 processors
> 0(f135-4): Hello 1! Processor 1 (f135-5) reporting for duty
> 
> 0(f135-4): Hello 2! Processor 2 (f135-6) reporting for duty
> 
> 0(f135-4): Hello 3! Processor 3 (f135-7) reporting for duty
> 
> 0(f135-4): Hello 4! Processor 4 (f135-8) reporting for duty
> 
> 0(f135-4): Hello 5! Processor 5 (f135-9) reporting for duty
> 
> 0(f135-4): Hello 6! Processor 6 (f135-11) reporting for duty
> 
> It seems to me that one processor is still lost, but I get no error output
> about it.
> However, when I run it through Torque, the job seems to hang: 'showq' shows
> that the job is running, but it never finishes.

You are not missing a processor.  MPI ranks are numbered starting from 0.
Your code example uses rank 0 as a "master" process, and only ranks 1 to n-1
reply with work, so a run with -np 7 prints six "reporting for duty" lines.
The structure can be seen below in the "if" test on "myid == 0".


> 
> #include <mpi.h>
...SNIP...
> 
>    MPI_Init(&argc,&argv); /* all MPI programs start with MPI_Init; all 'N' processes exist thereafter */
>    MPI_Comm_size(MPI_COMM_WORLD,&numprocs); /* find out how big the SPMD world is */
>    MPI_Comm_rank(MPI_COMM_WORLD,&myid); /* and what this process's rank is */
>    MPI_Get_processor_name(processor_name, &namelen);
> 
>    /* At this point, all the programs are running equivalently; the rank is used to
>       distinguish the roles of the programs in the SPMD model, with rank 0 often used
>       specially... */
>    if(myid == 0)
>    {
>      printf("%d(%s): We have %d processors\n", myid, processor_name, numprocs);
>      for(i=1;i<numprocs;i++)
>      {

...SNIP...
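To make the rank counting concrete, here is a minimal serial sketch (no MPI calls, so it compiles anywhere) that mirrors the "if(myid == 0)" test and the "for(i=1;i<numprocs;i++)" loop from the quoted program. The names role_of, count_worker_replies, print_roles and the rank_role enum are hypothetical, chosen for illustration only; they are not part of MPI or of Jozef's code.

```c
#include <stdio.h>

/* Hypothetical names, for illustration only: the role a rank plays. */
enum rank_role { MASTER, WORKER };

/* Mirrors the `if(myid == 0)` test in the quoted program:
   rank 0 is the master, every other rank is a worker. */
static enum rank_role role_of(int rank)
{
    return rank == 0 ? MASTER : WORKER;
}

/* Counts how many "reporting for duty" lines the master prints when the
   job is launched with `numprocs` processes (e.g. `mpirun -np 7`).
   Ranks run from 0 to numprocs-1, and only the workers reply. */
static int count_worker_replies(int numprocs)
{
    int replies = 0;
    for (int rank = 0; rank < numprocs; rank++) {
        if (role_of(rank) == WORKER)
            replies++;
    }
    return replies;
}

/* Prints the rank-to-role mapping for a run with `numprocs` processes. */
static void print_roles(int numprocs)
{
    for (int rank = 0; rank < numprocs; rank++)
        printf("rank %d: %s\n", rank,
               role_of(rank) == MASTER ? "master" : "worker");
}
```

With numprocs = 7, count_worker_replies returns 6, which matches the six "Hello" lines in the output above: all seven processes are running, and the seventh is simply the master doing the printing.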


Joseph


