[torqueusers] torque not listening to ppn request specs

DuChene, StevenX A stevenx.a.duchene at intel.com
Thu Oct 27 11:18:46 MDT 2011


Cool! Thanks Lloyd! That seems to have done the trick. I got 256 unique nodes this time instead of 64.

However, does setting this policy in my maui.cfg file mean it will never pack processes, even when packing is what a user actually intends?
--
Steven DuChene

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Lloyd Brown
Sent: Thursday, October 27, 2011 10:08 AM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] torque not listening to ppn request specs

Steve,

I'm not a Maui expert (we use Moab), but it sounds like this is an
optimization by the scheduler.  In the end, Torque just does what the
scheduler tells it to, so if it's being told to consolidate down to 64
nodes, then it will happily do so.

Looking at the Maui docs, though, it does seem like the
JOBNODEMATCHPOLICY has been carried over from Moab.  What happens if you
put something like the following in your Maui config:

> JOBNODEMATCHPOLICY EXACTNODE
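
If that does the trick, a quick sanity check from inside the job script
is to count the unique hosts in the nodefile, e.g. (untested, assuming a
bash job script):

    sort -u $PBS_NODEFILE | wc -l

That count should match the number of nodes you requested.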



Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 10/27/2011 10:53 AM, DuChene, StevenX A wrote:
> Thanks to all who are reading and responding to my pleas for assistance or guidance.
> 
> We are a benchmarking center and I have a user who wants to start up his benchmark process across all 256 nodes, one process per node. Yes, right now I am using openmpi but later today I need to try all of this with the Intel MPI implementation.
> 
> I tried doing the following:
> 
> $(PBS_NODEFILE) > /home/myuser/mpi_test/cruddy256
> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
> 
> so that I could examine the nodefile I am getting from torque, but all I get is a zero-length file.
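>
> (Side note: $(PBS_NODEFILE) asks the shell to run PBS_NODEFILE as a
> command instead of reading the file, which would explain the zero-length
> result; something like
>
> cat $PBS_NODEFILE > /home/myuser/mpi_test/cruddy256
>
> is probably the intended form, though I have not re-tested it.)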
> 
> I looked in my torque accounting logs and I see things in the execution host list of:
> 
> exec_host=eatom255/3+eatom255/2+eatom255/1+eatom255/0+eatom254/3+eatom254/2+eatom254/1+eatom254/0+eatom253/3+eatom253/2+eatom253/1+eatom253/0
> 
> I copied this exec_host= stuff to a separate file and did some text munging and I only see 64 unique hosts being allocated by torque.
> 
> So does that mean torque is screwing me over or could it still be some optimization being done by maui that is running as the scheduler above the torque pbs_server process?
> --
> Steven DuChene
> 
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Coyle, James J [ITACD]
> Sent: Thursday, October 27, 2011 9:19 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] torque not listening to ppn request specs
> 
> Steve,
> 
>   If this is a question just of design and not of use, 
> ignore the following:
> 
> 
> 
> Getting what you want: one processor on each of N nodes.
> 
> Possibilities:
> 1)  One possibility is to try:
> 
> qmgr -c 'set server node_pack = False'
> 
> (I think that the default setting is True, which is what I want and
> use, since it keeps nodes more free.) I don't know if that will give
> you the behavior that you want, but it does try to launch jobs on
> separate nodes.
> 
> 2) Use nodes=20:ppn=4 and the --bynode option if you are using
> OpenMPI (which is what I advise users here), or, if you are using
> another implementation of MPI that does not support --bynode or
> something similar, issue:
> 
> uniq < ${PBS_NODEFILE} > Nodefile
> mpirun -np 20 -machinefile Nodefile ./application
> 
> (I actually supply a script, mpirun1, which does this, along with
> mpirun2, mpirun3, etc., which supply 2, 3, etc. processes per node, for
> two clusters that use vendor MPIs based upon MPICH; a rough sketch of
> the idea follows.)
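> 
> Not the actual script, just an illustration of the idea, assuming bash
> and an mpirun that takes -np and -machinefile:
> 
> #!/bin/bash
> # mpirun1 - run one MPI process per node allocated to the job
> NODES=$(mktemp)
> uniq < ${PBS_NODEFILE} > ${NODES}      # one line per unique node
> NP=$(wc -l < ${NODES})                 # number of unique nodes
> mpirun -np ${NP} -machinefile ${NODES} "$@"
> rm -f ${NODES}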
> 
> best of Luck,
> James Coyle, PhD
>  High Performance Computing Group        
>  Iowa State Univ.          
> web: http://jjc.public.iastate.edu/
> 
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>> bounces at supercluster.org] On Behalf Of DuChene, StevenX A
>> Sent: Thursday, October 27, 2011 10:48 AM
>> To: Torque Users Mailing List
>> Subject: Re: [torqueusers] torque not listening to ppn request specs
>>
>> Is it possible that there is some maui setting that could have an
>> effect on packing processes on nodes (one per processor) rather than
>> spreading them out across nodes (one per node)? Some "optimization"
>> thing I need to turn off or on?
>> --
>> Steven DuChene
>>
>> -----Original Message-----
>> From: DuChene, StevenX A
>> Sent: Thursday, October 27, 2011 8:32 AM
>> To: Torque Users Mailing List
>> Subject: RE: [torqueusers] torque not listening to ppn request specs
>>
>> Ken:
>> I tried that and my output file still shows that there are only 64
>> unique hosts being used four times each instead of 256 hosts used one
>> time each. So, as I said, I am not getting the results I expect from
>> the ppn=1 directive.
>> --
>> Steven DuChene
>>
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>> bounces at supercluster.org] On Behalf Of Ken Nielson
>> Sent: Wednesday, October 26, 2011 10:07 AM
>> To: Torque Users Mailing List
>> Subject: Re: [torqueusers] torque not listening to ppn request specs
>>
>>
>>
>> ----- Original Message -----
>>> From: "StevenX A DuChene" <stevenx.a.duchene at intel.com>
>>> To: torqueusers at supercluster.org
>>> Sent: Tuesday, October 25, 2011 6:10:13 PM
>>> Subject: [torqueusers] torque not listening to ppn request specs
>>>
>>> Hello all:
>>>
>>> I have torque 2.5.7 and maui 3.2.6p21 installed on a couple of small
>>> clusters and I am submitting the following MPI job using:
>>>
>>> qsub -l nodes=12:mynode:ppn=1 script_noarch.pbs
>>>
>>> This script is very simple as it only has one line in it to invoke
>>> the call to mpirun:
>>>
>>> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
>>>
>>> The actual source to this is also very simple:
>>>
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <unistd.h>   /* for gethostname() */
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank;
>>>     char hostname[256];
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     gethostname(hostname, 255);
>>>     printf("Hello world! I am process number: %d on host %s\n",
>>>            rank, hostname);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> When I run this with the ppn=1 specification I would expect one
>>> processor per node spread over twelve nodes, but when I look at my
>>> output file I see it is running multiple processes per node instead.
>>> So as a result I do not see the output from twelve unique nodes as I
>>> would expect.
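>>>
>>> (With nodes=12:ppn=1 I would expect $PBS_NODEFILE to contain twelve
>>> lines, one hostname per line, something like
>>>
>>> enode01
>>> enode02
>>> ...
>>> enode12
>>>
>>> rather than a shorter list of hosts each repeated four times.)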
>>>
>>> My nodes file has the following sorts of entries:
>>>
>>> enode01 np=4 mynode
>>> enode02 np=4 mynode
>>> enode03 np=4 mynode
>>> enode04 np=4 mynode
>>> enode05 np=4 mynode
>>> enode06 np=4 mynode
>>> enode07 np=4 mynode
>>> enode08 np=4 mynode
>>> enode09 np=4 mynode
>>> enode10 np=4 mynode
>>> enode11 np=4 mynode
>>> enode12 np=4 mynode
>>>
>>> I know I can remove the np=4 from each node specification and get one
>>> process per node, but I was under the impression that I could use the
>>> ppn=1 or whatever to get the same thing.
>>>
>>> Am I misunderstanding or overlooking something?
>>>
>>> --
>>>
>>
>>
>> Steven,
>>
>> Try qsub -l nodes=12:ppn=1:mynode script_noarch.pbs
>>
>> Ken
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

