[torqueusers] torque not listening to ppn request specs

Gustavo Correa gus at ldeo.columbia.edu
Thu Oct 27 13:30:23 MDT 2011


Hi Steven

On Oct 27, 2011, at 12:53 PM, DuChene, StevenX A wrote:

> Thanks to all who are reading and responding to my pleas for assistance or guidance.
> 
> We are a benchmarking center and I have a user who wants to start up his benchmark process across all 256 nodes, one process per node.

The OpenMPI mpirun/mpiexec command has the options '-bynode', '-pernode',
and '-npernode', which should do this, as long as you request all 256 nodes
from Torque [with '#PBS -l nodes=256:ppn=8', assuming you have 8 cores per node].
See 'man mpiexec' for more details.
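
For example, a job script along these lines should give you one process
per node across the full allocation (the walltime below is just a
placeholder, not taken from your setup):

#PBS -l nodes=256:ppn=8
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
# -npernode 1 puts exactly one MPI process on each allocated node
mpirun -npernode 1 /home/myuser/mpi_test/mpi_hello_hostname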


> Yes, right now I am using openmpi but later today I need to try all of this with the Intel MPI implementation.
> 
> I tried doing the following:
> 
> $(PBS_NODEFILE) > /home/myuser/mpi_test/cruddy256
> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
> 

Is it a typo or did you miss the 'cat' command, as in 'cat $PBS_NODEFILE > ...'?

BTW, if you build OpenMPI from source/tarball with Torque support 
[configure --prefix=/your/favorite/location/to/install --with-tm=/path/to/libtorque] then
mpiexec will use $PBS_NODEFILE automatically as its machinefile,
no need to create it by hand.
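
For example, the build boils down to something like this (the install
prefix and the Torque location are placeholders for your site):

./configure --prefix=/opt/openmpi --with-tm=/usr/local/torque
make
make install

With that build, a bare 'mpiexec ./your_app' inside the job starts one
process per slot listed in $PBS_NODEFILE, and the -npernode/-pernode
options still work on top of that.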

I hope this helps,
Gus Correa

> so I could try examining the nodefile I am getting from torque but all I get is a zero length file.
> 
> I looked in my torque accounting logs and I see things in the execution host list of:
> 
> exec_host=eatom255/3+eatom255/2+eatom255/1+eatom255/0+eatom254/3+eatom254/2+eatom254/1+eatom254/0+eatom253/3+eatom253/2+eatom253/1+eatom253/0
> 
> I copied this exec_host= stuff to a separate file, did some text munging, and I only see 64 unique hosts being allocated by torque.
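> 
> (The munging was basically something along the lines of
> 
> tr '+' '\n' < exec_hosts | sed 's/.*=//; s|/.*||' | sort -u | wc -l
> 
> where exec_hosts is just the file I pasted the exec_host= line into.)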
> 
> So does that mean torque is screwing me over or could it still be some optimization being done by maui that is running as the scheduler above the torque pbs_server process?
> --
> Steven DuChene
> 
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Coyle, James J [ITACD]
> Sent: Thursday, October 27, 2011 9:19 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] torque not listening to ppn request specs
> 
> Steve,
> 
>  If this is a question just of design and not of use, 
> ignore the following:
> 
> 
> 
> Getting what you want, 1 processor on N nodes.
> 
> Possibilities:
> 1)  One possibility is to try:
> 
> qmgr -c 'set server node_pack = False'
> 
> (I think the default setup is True, which is what I want and use,
> since it keeps nodes more free.) I don't know if that will give you
> the behavior you want, but it does try to launch jobs on separate
> nodes.
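> 
> (You can check what the server is currently set to with
> qmgr -c 'list server'; note that node_pack may not show up in the
> output at all if it has never been set explicitly.)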
> 
> 2) Use nodes=20:ppn=4 and the --bynode option if you are using
> OpenMPI (which is what I advise users here). If you are using
> another implementation of MPI that does not support --bynode or
> something similar, issue:
> 
> uniq < ${PBS_NODEFILE} > Nodefile
> mpirun -np 20 -machinefile Nodefile ./application
> 
> (I actually supply a script, mpirun1, which does this, along with
> mpirun2, mpirun3, etc., which supply 2, 3, etc. processes per node,
> for two clusters that use vendor MPIs based upon MPICH.)
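> 
> Roughly, such a wrapper boils down to something like this (a sketch of
> the idea, not the actual mpirun1 script):
> 
> #!/bin/sh
> # mpirun1: start one MPI process per node allocated by Torque
> uniq < ${PBS_NODEFILE} > Nodefile.$$
> NNODES=`wc -l < Nodefile.$$`
> mpirun -np ${NNODES} -machinefile Nodefile.$$ "$@"
> rm -f Nodefile.$$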
> 
> Best of luck,
> James Coyle, PhD
> High Performance Computing Group        
> Iowa State Univ.          
> web: http://jjc.public.iastate.edu/
> 
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>> bounces at supercluster.org] On Behalf Of DuChene, StevenX A
>> Sent: Thursday, October 27, 2011 10:48 AM
>> To: Torque Users Mailing List
>> Subject: Re: [torqueusers] torque not listening to ppn request specs
>> 
>> Is it possible that there is some maui setting that could have an
>> effect on packing processes on nodes (one per processor) rather than
>> spreading them out across nodes (one per node)? Some "optimization"
>> thing I need to turn off or on?
>> --
>> Steven DuChene
>> 
>> -----Original Message-----
>> From: DuChene, StevenX A
>> Sent: Thursday, October 27, 2011 8:32 AM
>> To: Torque Users Mailing List
>> Subject: RE: [torqueusers] torque not listening to ppn request specs
>> 
>> Ken:
>> I tried that and my output file still shows that there are only 64
>> unique hosts being used four times each, instead of 256 hosts used
>> once each. So, as I said, I am not getting the results I expect from
>> the ppn=1 directive.
>> --
>> Steven DuChene
>> 
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>> bounces at supercluster.org] On Behalf Of Ken Nielson
>> Sent: Wednesday, October 26, 2011 10:07 AM
>> To: Torque Users Mailing List
>> Subject: Re: [torqueusers] torque not listening to ppn request specs
>> 
>> 
>> 
>> ----- Original Message -----
>>> From: "StevenX A DuChene" <stevenx.a.duchene at intel.com>
>>> To: torqueusers at supercluster.org
>>> Sent: Tuesday, October 25, 2011 6:10:13 PM
>>> Subject: [torqueusers] torque not listening to ppn request specs
>>> 
>>> Hello all:
>>> 
>>> I have torque 2.5.7 and maui 3.2.6p21 installed on a couple of small
>>> clusters and I am submitting the following mpi job using:
>>> 
>>> qsub -l nodes=12:mynode:ppn=1 script_noarch.pbs
>>> 
>>> this script is very simple as it only has one line in it to invoke
>>> the call to mpirun:
>>> 
>>> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
>>> 
>>> The actual source to this is also very simple:
>>> 
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <unistd.h>   /* for gethostname() */
>>> 
>>> int main(int argc, char **argv)
>>> {
>>>     int rank;
>>>     char hostname[256];
>>> 
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     gethostname(hostname, 255);
>>>     printf("Hello world! I am process number: %d on host %s\n",
>>>            rank, hostname);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>> 
>>> When I run this with the ppn=1 specification I would expect one
>>> processor per node, spread over twelve nodes, but when I look at my
>>> output file I see it is running multiple processes per node instead.
>>> So as a result I do not see the output from twelve unique nodes as I
>>> would expect.
>>> 
>>> My nodes file has the following sorts of entries:
>>> 
>>> enode01 np=4 mynode
>>> enode02 np=4 mynode
>>> enode03 np=4 mynode
>>> enode04 np=4 mynode
>>> enode05 np=4 mynode
>>> enode06 np=4 mynode
>>> enode07 np=4 mynode
>>> enode08 np=4 mynode
>>> enode09 np=4 mynode
>>> enode10 np=4 mynode
>>> enode11 np=4 mynode
>>> enode12 np=4 mynode
>>> 
>>> I know I can remove the np=4 from each node specification and get the
>>> one process per node but I was under the impression that I could use
>>> the ppn=1 or whatever to get the same thing.
>>> 
>>> Am I misunderstanding or overlooking something?
>>> 
>>> --
>>> 
>> 
>> 
>> Steven,
>> 
>> Try qsub -l nodes=12:ppn=1:mynode script_noarch.pbs
>> 
>> Ken
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers


