[torqueusers] torque not listening to ppn request specs

O'Bryant, Pat pat.o'bryant at exxonmobil.com
Thu Oct 27 13:05:40 MDT 2011


Steven,
   How about this:
qsub -l nodes=256:SeaMicro,tpn=1   script_noarch.pbs

Note that "tpn" means tasks per node, and that it is preceded by a comma (it is a separate resource request, not part of the node spec). The right "ppn" value can be hard to figure out.
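
The same request can also be put in the job script itself as PBS directives; a minimal sketch, reusing the script name and mpirun line quoted further down this thread:

#!/bin/bash
#PBS -l nodes=256:SeaMicro,tpn=1

mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname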

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of DuChene, StevenX A
Sent: Thursday, October 27, 2011 1:39 PM
To: Torque Users Mailing List
Subject: Re: [torqueusers] torque not listening to ppn request specs

I am looking at the torque documentation for the qsub command, specifically at what it says about the -W option. The documentation lists several additional attributes that torque supports, but I do not see any mention of an x=blah entry in that list of supported additional attributes.

I have tried this method of taking the "JOBNODEMATCHPOLICY EXACTNODE" out of the maui.cfg file and then doing my qsub with the following:

qsub -l nodes=256:SeaMicro:ppn=1 -W x=nmatchpolicy:exactnode  script_noarch.pbs

but I do not get the 256 unique nodes I asked for. I am back to 64 nodes with four processes packed per node.
--
Steven DuChene

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Lloyd Brown
Sent: Thursday, October 27, 2011 10:26 AM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] torque not listening to ppn request specs

Well, there are a couple of approaches, at least with Moab (again, never
actually used Maui; YMMV):

- Make packing the default (unset JOBNODEMATCHPOLICY), and append the
"-W x=nmatchpolicy:exactnode" to the job, either as a parameter to qsub,
or as a "#PBS -W x=...." line in your script

- Make exactnode the default, and have people who don't care about the
exact layout use the "procs=x" syntax, instead of the "nodes=x:ppn=y"
syntax.

Again, not sure if these work with Maui, but they're worth a try.
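
For concreteness, the two submissions might look like this (a sketch only; the node count, node property, and script name are taken from earlier in this thread):

# exactnode requested per job, packing left as the site default:
qsub -l nodes=256:SeaMicro:ppn=1 -W x=nmatchpolicy:exactnode script_noarch.pbs

# exactnode as the site default; layout-agnostic jobs just ask for cores:
qsub -l procs=256 script_noarch.pbs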

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 10/27/2011 11:18 AM, DuChene, StevenX A wrote:
> Cool! Thanks Lloyd! That seems to have done the trick. I got 256 unique nodes this time instead of 64.
> 
> However, does setting this policy in my maui.cfg file mean it will never pack processes, even if packing is actually what a user intends?
> --
> Steven DuChene
> 
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Lloyd Brown
> Sent: Thursday, October 27, 2011 10:08 AM
> To: torqueusers at supercluster.org
> Subject: Re: [torqueusers] torque not listening to ppn request specs
> 
> Steve,
> 
> I'm not a Maui expert (we use Moab), but it sounds like this is an
> optimization by the scheduler.  In the end, Torque just does what the
> scheduler tells it to, so if it's being told to consolidate down to 64
> nodes, then it will happily do so.
> 
> Looking at the Maui docs, though, it does seem like the
> JOBNODEMATCHPOLICY has been carried over from Moab.  What happens if you
> put something like the following in your Maui config:
> 
>> JOBNODEMATCHPOLICY EXACTNODE
> 
> 
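> In maui.cfg that is a single line; a sketch (the path below is a common
> default, not necessarily yours), followed by a restart of the maui daemon
> so the new policy takes effect:
> 
>   # /usr/local/maui/maui.cfg (excerpt)
>   JOBNODEMATCHPOLICY  EXACTNODE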
> 
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> http://marylou.byu.edu
> 
> On 10/27/2011 10:53 AM, DuChene, StevenX A wrote:
>> Thanks to all who are reading and responding to my pleas for assistance or guidance.
>>
>> We are a benchmarking center and I have a user who wants to start up his benchmark process across all 256 nodes, one process per node. Yes, right now I am using openmpi but later today I need to try all of this with the Intel MPI implementation.
>>
>> I tried doing the following:
>>
>> $(PBS_NODEFILE) > /home/myuser/mpi_test/cruddy256
>> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
>>
>> so I could try examining the nodefile I am getting from torque but all I get is a zero length file.
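>>
>> Side note: "$(PBS_NODEFILE)" with the parentheses is command substitution, so the
>> shell tries to run a command named PBS_NODEFILE instead of expanding the variable;
>> its (empty) output would explain the zero-length file. A sketch of what probably
>> dumps the node list, reusing the path from above:
>>
>> cat $PBS_NODEFILE > /home/myuser/mpi_test/cruddy256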
>>
>> I looked in my torque accounting logs and in the execution host lists I see things like:
>>
>> exec_host=eatom255/3+eatom255/2+eatom255/1+eatom255/0+eatom254/3+eatom254/2+eatom254/1+eatom254/0+eatom253/3+eatom253/2+eatom253/1+eatom253/0
>>
>> I copied this exec_host= output to a separate file, did some text munging, and I see only 64 unique hosts being allocated by torque.
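>>
>> For reference, one way to do that counting, assuming the exec_host string has been
>> saved to a file named exec_hosts.txt:
>>
>> tr '+' '\n' < exec_hosts.txt | cut -d/ -f1 | sort -u | wc -l
>>
>> which splits the list on '+', drops the /cpu suffixes, and counts distinct hostnames.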
>>
>> So does that mean torque is screwing me over or could it still be some optimization being done by maui that is running as the scheduler above the torque pbs_server process?
>> --
>> Steven DuChene
>>
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Coyle, James J [ITACD]
>> Sent: Thursday, October 27, 2011 9:19 AM
>> To: Torque Users Mailing List
>> Subject: Re: [torqueusers] torque not listening to ppn request specs
>>
>> Steve,
>>
>>   If this is a question just of design and not of use, 
>> ignore the following:
>>
>>
>>
>> Getting what you want: one process on each of N nodes.
>>
>> Possibilities:
>> 1)  One possibility is to try:
>>
>> qmgr -c 'set server node_pack = False'
>>
>> (I think the default setting is True, which is what I want and use,
>> since it leaves more nodes completely free.)
>> I don't know if that will give you the behavior you want, but with
>> node_pack = False Torque does try to launch jobs on separate nodes.
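>>
>> A quick way to check the current value first (this just inspects the
>> server attributes; no output means node_pack has never been set):
>>
>> qmgr -c 'print server' | grep node_pack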
>>
>> 2) Use nodes=20:ppn=4 together with the --bynode option if you are
>> using OpenMPI (which is what I advise users here). If you are using
>> another MPI implementation that does not support --bynode or
>> something similar, issue:
>>
>> uniq < ${PBS_NODEFILE} > Nodefile
>> mpirun -np 20 -machinefile Nodefile  ./application
>>
>> (I actually supply a script, mpirun1, which does this, along with
>> mpirun2, mpirun3, etc., which run 2, 3, ... processes per node, for
>> two clusters that use vendor MPIs based upon MPICH.)
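>>
>> A rough sketch of what such a one-per-node wrapper could look like
>> (a guess at the general idea, not the actual mpirun1 script):
>>
>> #!/bin/sh
>> # mpirun1: launch one MPI process per allocated node
>> uniq < ${PBS_NODEFILE} > Nodefile
>> NNODES=`wc -l < Nodefile`
>> mpirun -np $NNODES -machinefile Nodefile "$@"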
>>
>> Best of luck,
>> James Coyle, PhD
>>  High Performance Computing Group        
>>  Iowa State Univ.          
>> web: http://jjc.public.iastate.edu/
>>
>>> -----Original Message-----
>>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>>> bounces at supercluster.org] On Behalf Of DuChene, StevenX A
>>> Sent: Thursday, October 27, 2011 10:48 AM
>>> To: Torque Users Mailing List
>>> Subject: Re: [torqueusers] torque not listening to ppn request specs
>>>
>>> Is it possible that there is some maui setting that could have an
>>> effect on packing processes on nodes (one per processor) rather than
>>> spreading them out across nodes (one per node)? Some "optimization"
>>> thing I need to turn off or on?
>>> --
>>> Steven DuChene
>>>
>>> -----Original Message-----
>>> From: DuChene, StevenX A
>>> Sent: Thursday, October 27, 2011 8:32 AM
>>> To: Torque Users Mailing List
>>> Subject: RE: [torqueusers] torque not listening to ppn request specs
>>>
>>> Ken:
>>> I tried that and my output file still shows only 64 unique hosts,
>>> each used four times, instead of 256 hosts used once each. So, as I
>>> said, I am not getting the results I expect from the ppn=1
>>> directive.
>>> --
>>> Steven DuChene
>>>
>>> -----Original Message-----
>>> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>>> bounces at supercluster.org] On Behalf Of Ken Nielson
>>> Sent: Wednesday, October 26, 2011 10:07 AM
>>> To: Torque Users Mailing List
>>> Subject: Re: [torqueusers] torque not listening to ppn request specs
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "StevenX A DuChene" <stevenx.a.duchene at intel.com>
>>>> To: torqueusers at supercluster.org
>>>> Sent: Tuesday, October 25, 2011 6:10:13 PM
>>>> Subject: [torqueusers] torque not listening to ppn request specs
>>>>
>>>> Hello all:
>>>>
>>>> I have torque 2.5.7 and maui 3.2.6p21 installed on a couple of
>>>> small clusters and I am submitting the following MPI job using:
>>>>
>>>> qsub -l nodes=12:mynode:ppn=1 script_noarch.pbs
>>>>
>>>> The script is very simple; it has only one line in it, the call to
>>>> mpirun:
>>>>
>>>> mpirun --machinefile $PBS_NODEFILE /home/myuser/mpi_test/mpi_hello_hostname
>>>>
>>>> The actual source for this is also very simple:
>>>>
>>>> #include <mpi.h>
>>>> #include <stdio.h>
>>>> #include <unistd.h>   /* for gethostname() */
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int rank;
>>>>     char hostname[256];
>>>>
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>     gethostname(hostname, 255);
>>>>     printf("Hello world! I am process number: %d on host %s\n", rank,
>>>>            hostname);
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
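>>>>
>>>> (Presumably built with the MPI compiler wrapper, e.g. something like
>>>> "mpicc -o mpi_hello_hostname hello.c"; the source file name here is
>>>> just a guess.)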
>>>>
>>>> When I run this with the ppn=1 specification I would expect one
>>>> process per node, spread over twelve nodes, but when I look at my
>>>> output file I see it is running multiple processes per node. As a
>>>> result I do not see output from twelve unique nodes as I would
>>>> expect.
>>>>
>>>> My nodes file has the following sorts of entries:
>>>>
>>>> enode01 np=4 mynode
>>>> enode02 np=4 mynode
>>>> enode03 np=4 mynode
>>>> enode04 np=4 mynode
>>>> enode05 np=4 mynode
>>>> enode06 np=4 mynode
>>>> enode07 np=4 mynode
>>>> enode08 np=4 mynode
>>>> enode09 np=4 mynode
>>>> enode10 np=4 mynode
>>>> enode11 np=4 mynode
>>>> enode12 np=4 mynode
>>>>
>>>> I know I can remove the np=4 from each node specification and get
>>>> one process per node, but I was under the impression that I could
>>>> use ppn=1 (or whatever) to get the same thing.
>>>>
>>>> Am I misunderstanding or overlooking something?
>>>>
>>>> --
>>>>
>>>
>>>
>>> Steven,
>>>
>>> Try qsub -l nodes=12:ppn=1:mynode script_noarch.pbs
>>>
>>> Ken
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

