[torqueusers] torque not listening to ppn request specs

DuChene, StevenX A stevenx.a.duchene at intel.com
Thu Oct 27 10:53:33 MDT 2011


Thanks to all who are reading and responding to my pleas for assistance or guidance.

We are a benchmarking center and I have a user who wants to start up his benchmark process across all 256 nodes, one process per node. Yes, right now I am using OpenMPI, but later today I need to try all of this with the Intel MPI implementation.

I tried doing the following:

$(PBS_NODEFILE) > /home/myuser/mpi_test/cruddy256
mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname

so that I could examine the nodefile I am getting from torque, but all I get is a zero-length file.
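(For what it's worth, $(PBS_NODEFILE) is shell command substitution: the shell looks for a command named PBS_NODEFILE, finds none, and the bare redirection truncates the target to an empty file, which would explain the zero length. A plain copy of the nodefile, using the same paths as above, would be:

cat "$PBS_NODEFILE" > /home/myuser/mpi_test/cruddy256
)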

I looked in my torque accounting logs and I see entries in the execution host list like:

exec_host=eatom255/3+eatom255/2+eatom255/1+eatom255/0+eatom254/3+eatom254/2+eatom254/1+eatom254/0+eatom253/3+eatom253/2+eatom253/1+eatom253/0

I copied this exec_host= output to a separate file and did some text munging, and I see only 64 unique hosts being allocated by torque.
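(Roughly this sort of munging reproduces the count; exec_host.txt stands in for whatever file the exec_host= line was pasted into:

sed 's/^exec_host=//' exec_host.txt | tr '+' '\n' | cut -d/ -f1 | sort -u | wc -l
)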

So does that mean torque is screwing me over, or could it still be some optimization being done by maui, which runs as the scheduler above the torque pbs_server process?
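(If the packing is coming from the scheduler, the Maui parameter that usually governs this is JOBNODEMATCHPOLICY; setting it to EXACTNODE in maui.cfg is supposed to make a nodes=N:ppn=1 request land on N distinct nodes instead of being packed. A sketch, to be verified against the Maui docs:

JOBNODEMATCHPOLICY  EXACTNODE

Maui has to be restarted to pick up maui.cfg changes.)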
--
Steven DuChene

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Coyle, James J [ITACD]
Sent: Thursday, October 27, 2011 9:19 AM
To: Torque Users Mailing List
Subject: Re: [torqueusers] torque not listening to ppn request specs

Steve,

  If this is a question just of design and not of use,
ignore the following.

Getting what you want, 1 processor on each of N nodes:

1)  One possibility is to try:

qmgr -c 'set server node_pack = False'

(I think that the default setup is True, which is what I want and
use, since it keeps more nodes free.) I don't know if that will give
you the behavior you want, but with node_pack = False the server does
try to launch jobs on separate nodes.
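(To check the current value before changing it, something like this
should work:

qmgr -c 'list server' | grep -i node_pack
)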

2) Use nodes=20:ppn=4 with the --bynode option if you are using
OpenMPI (which is what I advise users here). If you are using another
MPI implementation that does not support --bynode or something
similar, issue:

uniq < ${PBS_NODEFILE} > Nodefile
mpirun -np 20 -machinefile Nodefile  ./application

(I actually supply a script, mpirun1, which does this, along with
mpirun2, mpirun3, etc., which supply 2, 3, etc. processes per node,
for two clusters that use vendor MPIs based upon MPICH.)
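(A minimal sketch of what such an mpirun1 wrapper could look like;
this is illustrative, not the actual script:

#!/bin/sh
# mpirun1 (sketch): launch one MPI rank per allocated node.
nodefile=$(mktemp) || exit 1
# The Torque nodefile lists each host once per allocated slot, with a
# host's entries adjacent, so uniq collapses them to one line per node.
uniq < "${PBS_NODEFILE}" > "$nodefile"
np=$(wc -l < "$nodefile")
mpirun -np "$np" -machinefile "$nodefile" "$@"
rm -f "$nodefile"
)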

Best of luck,
James Coyle, PhD
 High Performance Computing Group        
 Iowa State Univ.          
web: http://jjc.public.iastate.edu/

>-----Original Message-----
>From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>bounces at supercluster.org] On Behalf Of DuChene, StevenX A
>Sent: Thursday, October 27, 2011 10:48 AM
>To: Torque Users Mailing List
>Subject: Re: [torqueusers] torque not listening to ppn request specs
>
>Is it possible that there is some maui setting that could have an
>effect on packing processes on nodes (one per processor) rather than
>spreading them out across nodes (one per node)? Some "optimization"
>thing I need to turn off or on?
>--
>Steven DuChene
>
>-----Original Message-----
>From: DuChene, StevenX A
>Sent: Thursday, October 27, 2011 8:32 AM
>To: Torque Users Mailing List
>Subject: RE: [torqueusers] torque not listening to ppn request specs
>
>Ken:
>I tried that and my output file still shows only 64 unique hosts
>being used four times each, instead of 256 hosts used once each. So,
>as I said, I am not getting the results out of the ppn=1 directive
>that I am expecting.
>--
>Steven DuChene
>
>-----Original Message-----
>From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>bounces at supercluster.org] On Behalf Of Ken Nielson
>Sent: Wednesday, October 26, 2011 10:07 AM
>To: Torque Users Mailing List
>Subject: Re: [torqueusers] torque not listening to ppn request specs
>
>
>
>----- Original Message -----
>> From: "StevenX A DuChene" <stevenx.a.duchene at intel.com>
>> To: torqueusers at supercluster.org
>> Sent: Tuesday, October 25, 2011 6:10:13 PM
>> Subject: [torqueusers] torque not listening to ppn request specs
>>
>> Hello all:
>>
>> I have torque 2.5.7 and maui 3.2.6p21 installed on a couple of small
>> clusters and I am submitting the following mpi job using:
>>
>> qsub -l nodes=12:mynode:ppn=1 script_noarch.pbs
>>
>> This script is very simple, as it only has one line in it to invoke
>> mpirun:
>>
>> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
>>
>> The actual source for this is also very simple:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <unistd.h>   /* for gethostname() */
>>
>> int main(int argc, char **argv)
>> {
>>     int rank;
>>     char hostname[256];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     gethostname(hostname, 255);
>>     printf("Hello world! I am process number: %d on host %s\n",
>>            rank, hostname);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
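>> (For reference, this builds with the usual MPI compiler wrapper,
>> e.g. mpicc -o mpi_hello_hostname mpi_hello_hostname.c.)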
>>
>> When I run this with the ppn=1 specification I would expect one
>> processor per node, spread over twelve nodes, but when I look at my
>> output file I see it is running multiple processes per node instead.
>> So as a result I do not see the output from twelve unique nodes as I
>> would expect.
>>
>> My nodes file has the following sorts of entries:
>>
>> enode01 np=4 mynode
>> enode02 np=4 mynode
>> enode03 np=4 mynode
>> enode04 np=4 mynode
>> enode05 np=4 mynode
>> enode06 np=4 mynode
>> enode07 np=4 mynode
>> enode08 np=4 mynode
>> enode09 np=4 mynode
>> enode10 np=4 mynode
>> enode11 np=4 mynode
>> enode12 np=4 mynode
>>
>> I know I can remove the np=4 from each node specification and get the
>> one process per node, but I was under the impression that I could use
>> the ppn=1 or whatever to get the same thing.
>>
>> Am I misunderstanding or overlooking something?
>>
>> --
>
>Steven,
>
>Try qsub -l nodes=12:ppn=1:mynode script_noarch.pbs
>
>Ken
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

