[torqueusers] torque not listening to ppn request specs
Coyle, James J [ITACD]
jjc at iastate.edu
Thu Oct 27 10:18:51 MDT 2011
Steve,
If this is a question just of design and not of use,
ignore the following:
To get what you want (one process on each of N nodes), here are
two possibilities:
1) Try:
qmgr -c 'set server node_pack = False'
(I think the default setting is True, which is what I want and
use, since it keeps more nodes free.)
I don't know whether that will give you the behavior you want,
but it does try to launch jobs on separate nodes.
2) Request nodes=20:ppn=4 and, if you are using OpenMPI (which is
what I advise users here), pass the --bynode option to mpirun. If
you are using another MPI implementation that does not support
--bynode or something similar, issue
uniq < ${PBS_NODEFILE} > Nodefile
mpirun -np 20 -machinefile Nodefile ./application
(I actually supply a script, mpirun1, that does this, along with
mpirun2, mpirun3, etc., which place 2, 3, etc. processes per node,
on two clusters that use vendor MPIs based on MPICH.)
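The uniq approach above generalizes to k processes per node. The following is a minimal sketch, assuming a POSIX shell and awk. The function name per_node_hosts is hypothetical, and this is not the mpirun1/mpirun2 scripts mentioned above, whose contents are not shown in this message.

```shell
#!/bin/sh
# Hypothetical helper: print at most K entries per host from a
# PBS-style node file (one host name per line, repeated once per slot).
# Usage: per_node_hosts K FILE
per_node_hosts() {
    k=$1
    file=$2
    # count[$0] records how many times each host name has been seen;
    # a line is printed only for the first K occurrences of its host.
    # Unlike plain uniq, this does not require repeats to be adjacent.
    awk -v k="$k" '++count[$0] <= k' "$file"
}

# Inside a job script it might be used like this (commented out here
# because it needs a TORQUE allocation and an MPI installation):
#   per_node_hosts 1 "$PBS_NODEFILE" > Nodefile
#   mpirun -np $(wc -l < Nodefile) -machinefile Nodefile ./application
```

With K=1 this produces the same list as the uniq command above when the node file groups each host's entries together. Running "sort $PBS_NODEFILE | uniq -c" inside the job is a quick way to see how many slots each host actually received.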
Best of luck,
James Coyle, PhD
High Performance Computing Group
Iowa State Univ.
web: http://jjc.public.iastate.edu/
>-----Original Message-----
>From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>bounces at supercluster.org] On Behalf Of DuChene, StevenX A
>Sent: Thursday, October 27, 2011 10:48 AM
>To: Torque Users Mailing List
>Subject: Re: [torqueusers] torque not listening to ppn request specs
>
>Is it possible that there is some maui setting that could have an
>effect on packing processes on nodes (one per processor) rather than
>spreading them out across nodes (one per node)? Some "optimization"
>thing I need to turn off or on?
>--
>Steven DuChene
>
>-----Original Message-----
>From: DuChene, StevenX A
>Sent: Thursday, October 27, 2011 8:32 AM
>To: Torque Users Mailing List
>Subject: RE: [torqueusers] torque not listening to ppn request specs
>
>Ken:
>I tried that and my output file still shows that there are only 64
>unique hosts being used four times each instead of 256 hosts used 1
>time each. So as I said I am not getting the results out of the
>ppn=1 directive that I am expecting.
>--
>Steven DuChene
>
>-----Original Message-----
>From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>bounces at supercluster.org] On Behalf Of Ken Nielson
>Sent: Wednesday, October 26, 2011 10:07 AM
>To: Torque Users Mailing List
>Subject: Re: [torqueusers] torque not listening to ppn request specs
>
>
>
>----- Original Message -----
>> From: "StevenX A DuChene" <stevenx.a.duchene at intel.com>
>> To: torqueusers at supercluster.org
>> Sent: Tuesday, October 25, 2011 6:10:13 PM
>> Subject: [torqueusers] torque not listening to ppn request specs
>>
>>
>>
>>
>>
>> Hello all:
>>
>> I have torque 2.5.7 and maui 3.2.6p21 installed on a couple of small
>> clusters and I am submitting the following mpi job using:
>>
>>
>>
>> qsub -l nodes=12:mynode:ppn=1 script_noarch.pbs
>>
>>
>>
>> this script is very simple as it only has one line in it to invoke
>> the call to mpirun
>>
>>
>>
>> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
>>
>>
>>
>> The actual source to this is also very simple:
>>
>>
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <unistd.h>   /* gethostname */
>>
>> int main(int argc, char **argv)
>> {
>>     int rank;
>>     char hostname[256];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     /* gethostname may not null-terminate on truncation, so do it here. */
>>     gethostname(hostname, sizeof(hostname) - 1);
>>     hostname[sizeof(hostname) - 1] = '\0';
>>     printf("Hello world! I am process number: %d on host %s\n", rank, hostname);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>>
>>
>> When I run this with the ppn=1 specification I would expect one
>> processor per node spread over twelve nodes but when I look at my
>> output file I see it is running multiple processes per node instead.
>> So as a result I do not see the output from twelve unique nodes as I
>> would expect.
>>
>>
>>
>> My nodes file has the following sorts of entries:
>>
>>
>>
>> enode01 np=4 mynode
>> enode02 np=4 mynode
>> enode03 np=4 mynode
>> enode04 np=4 mynode
>> enode05 np=4 mynode
>> enode06 np=4 mynode
>> enode07 np=4 mynode
>> enode08 np=4 mynode
>> enode09 np=4 mynode
>> enode10 np=4 mynode
>> enode11 np=4 mynode
>> enode12 np=4 mynode
>>
>>
>>
>> I know I can remove the np=4 from each node specification and get the
>> one process per node but I was under the impression that I could use
>> the ppn=1 or whatever to get the same thing.
>>
>>
>>
>> Am I misunderstanding or overlooking something?
>>
>> --
>>
>
>
>Steven,
>
>Try qsub -l nodes=12:ppn=1:mynode script_noarch.pbs
>
>Ken
>_______________________________________________
>torqueusers mailing list
>torqueusers at supercluster.org
>http://www.supercluster.org/mailman/listinfo/torqueusers