[Mauiusers] Can't get busy nodes

Fernando Caba fcaba at uns.edu.ar
Wed Sep 28 12:38:34 MDT 2011


Hi everybody, thanks for all the answers.
I tried everything you pointed out:

including
#PBS -l nodes=1:ppn=12

adding

JOBNODEMATCHPOLICY EXACTNODE

to maui.cfg

but none of this worked. I'm thinking the problem is in another
config parameter (Maui or Torque).
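
In the meantime, I am comparing what each side thinks is configured
with these commands [all standard Torque/Maui clients, with <jobid>
as a placeholder]:

pbsnodes -a            # pbs_server's view of the nodes and their np
qstat -f <jobid>       # what the job actually requested and received
checkjob <jobid>       # Maui's view of the job
diagnose -n            # Maui's view of node allocation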

I will keep reading about all of this.

Thanks!!

----------------------------------------------------
Ing. Fernando Caba
Director General de Telecomunicaciones
Universidad Nacional del Sur
http://www.dgt.uns.edu.ar
Tel/Fax: (54)-291-4595166
Tel: (54)-291-4595101 int. 2050
Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
----------------------------------------------------


On 28/09/2011 12:33 PM, Gus Correa wrote:
> Hi Fernando
>
> Dennis already pointed out the first/main problem.
> Your Torque/PBS script is not requesting a specific number of nodes
> and cores/processors.
> You can ask for 12 processors, even if your MPI command doesn't
> use all of them:
>
> #PBS -l nodes=1:ppn=12
>
> [You can still do mpirun -np 8 if you want.]
>
> This will prevent two jobs from running on the same node [which seems
> to be your goal, if I understood right].
>
> I also like to add the queue name [even if it is the default]
> and the job name [for documentation and for consistent
> stdout/stderr naming]:
>
> #PBS -q myqueue [whatever you called your queue]
> #PBS -N myjob [15 characters at most, the rest gets truncated]
>
> The #PBS clauses must be together and right after the #! /bin/sh line.
>
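> Putting the pieces together, a minimal script along these lines
> [queue and job names are placeholders, as above] would be:
>
> #!/bin/bash
> #PBS -q myqueue
> #PBS -N myjob
> #PBS -l nodes=1:ppn=12
>
> cd $PBS_O_WORKDIR
> mpirun -np 8 /usr/local/vasp/vasp
>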
> Ask your users to always add these lines to their jobs.
> There is a feature of Torque that allows you to write a wrapper
> that will apply whatever you want to the job script,
> but if your pool of users is small
> you can just ask them to cooperate.
>
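> [That wrapper feature is the submit filter: point the SUBMITFILTER
> parameter in torque.cfg at an executable script, and qsub pipes every
> job script through it, submitting whatever it writes to stdout.
> A rough, untested sketch that injects a default node request
> [the injected line is just an example]:
>
> #!/bin/sh
> # Buffer the incoming job script.
> tmp=$(mktemp)
> cat > "$tmp"
> if grep -q '^#PBS -l nodes' "$tmp"; then
>     cat "$tmp"                               # user already asked for nodes
> else
>     sed '1a #PBS -l nodes=1:ppn=12' "$tmp"   # inject after the shebang
> fi
> rm -f "$tmp"
> ]
>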
> Of course there is much more that you can add.
> 'man qsub' and 'man pbs_resources' are good sources of information,
> highly recommended reading.
>
>
> Then there is what Antonio Messina mentioned, the cpuset feature
> of Torque.
> I don't know if you installed Torque with this feature enabled.
> However, if you did, it will allow specific cores to be
> assigned to each process, which could allow node-sharing without
> jobs stepping on each other's toes.
> However:
> A) this requires a bit more setup [not a lot, check the
> list archives and the Torque Admin Guide]
> B) if your users are cooperative and request 12 processors for each job,
> and you're using Maui's 'JOBNODEMATCHPOLICY EXACTNODE', each job will
> get a full node to itself anyway.
>
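> [From memory, and worth verifying in the Admin Guide: cpuset
> support has to be compiled in, and each compute node needs the
> cpuset pseudo-filesystem mounted before pbs_mom starts:
>
> ./configure --enable-cpuset && make && make install
> mkdir -p /dev/cpuset
> mount -t cpuset none /dev/cpuset
> ]
>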
> BTW, did you restart Maui after you added 'JOBNODEMATCHPOLICY EXACTNODE'
> to the maui.cfg file?
>
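> [If not, a restart plus a quick check would be something like the
> lines below, however Maui is actually started on your system:
>
> /etc/init.d/maui restart      # or however you stop/start maui
> showconfig | grep JOBNODEMATCHPOLICY
> ]
>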
> I hope this helps,
> Gus Correa
>
>
> Fernando Caba wrote:
>> Hi Gus, my node file /var/spool/torque/server_priv/nodes looks like:
>>
>> [root@fe server_priv]# more nodes
>> n10 np=12
>> n11 np=12
>> n12 np=12
>> n13 np=12
>> [root@fe server_priv]#
>>
>> It is exactly as in your comment.
>>
>> My script:
>>
>> #!/bin/bash
>>
>> cd $PBS_O_WORKDIR
>>
>> mpirun -np 8 /usr/local/vasp/vasp
>>
>> launches 8 vasp processes on one node. If I start one more job
>> (with -np 8), it will run on the same node (n13).
>> The same happens if I start another job with -np 8
>> (or -np 4): it also runs on node n13.
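>>
>> I am checking which node each job was given with:
>>
>> qstat -n    # lists the nodes/cores assigned to each running job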
>>
>> I configured JOBNODEMATCHPOLICY EXACTNODE in maui.cfg,
>> but unfortunately the jobs still ran on node n13.
>> This is an example of the output of top:
>>
>> top - 00:05:53 up 14 days,  6:47,  1 user,  load average: 4.18, 4.06, 4.09
>> Mem:  15955108k total, 13287888k used,  2667220k free,   142168k buffers
>> Swap: 67111528k total,    16672k used, 67094856k free, 11360332k cached
>>
>>     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 21796 patricia  25   0  463m 291m  12m R 100.5  1.9 517:29.59 vasp
>> 21797 patricia  25   0  448m 276m  11m R 100.2  1.8 518:51.49 vasp
>> 21798 patricia  25   0  458m 287m  11m R 100.2  1.8 522:01.79 vasp
>> 21799 patricia  25   0  448m 276m  11m R 99.9  1.8 519:04.25 vasp
>>       1 root      15   0 10348  672  568 S  0.0  0.0   0:00.53 init
>>       2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.06 migration/0
>>       3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
>>       4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
>>       5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.04 migration/1
>>
>> The job script that generates those 4 vasp processes is:
>>
>> #!/bin/bash
>>
>> cd $PBS_O_WORKDIR
>>
>> mpirun -np 4 /usr/local/vasp/vasp
>>
>> Thanks
>>
>> ----------------------------------------------------
>> Ing. Fernando Caba
>> Director General de Telecomunicaciones
>> Universidad Nacional del Sur
>> http://www.dgt.uns.edu.ar
>> Tel/Fax: (54)-291-4595166
>> Tel: (54)-291-4595101 int. 2050
>> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
>> ----------------------------------------------------
>>
>>
>> On 27/09/2011 08:07 PM, Gus Correa wrote:
>>> Hi Fernando
>>>
>>> Did you try something like this in your
>>> ${TORQUE}/server_priv/nodes file?
>>>
>>> frontend np=12 [skip this line if the frontend is not meant to do job work]
>>> node1 np=12
>>> node2 np=12
>>> node3 np=12
>>> node4 np=12
>>>
>>> This is probably the first thing to do.
>>> It is not Maui, just plain Torque [actually pbs_server configuration].
>>>
>>> The lines above assume your nodes are called node1, ...
>>> and the head node is called frontend,
>>> in some name-resolvable manner [most likely
>>> in your /etc/hosts file, most likely pointing to the nodes'
>>> IP addresses in your cluster's private subnet, 192.168.X.X,
>>> 10.X.X.X or equivalent].
>>>
>>> The 'np=12' clause will allow at most 12 *processes* per node.
>>>
>>>
>>> [However, if VASP is *threaded*, say via OpenMP, then it won't
>>> prevent each process from launching several threads.
>>> To handle threaded jobs you can use some tricks, such as requesting
>>> more cores than processes, as in the sketch below.
>>> Sorry, I am not familiar enough with VASP to say more than this.]
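>>>
>>> [As a sketch of that trick, assuming VASP were built with OpenMP:
>>> reserve all 12 cores but start fewer MPI ranks, and cap the
>>> threads per rank:
>>>
>>> #PBS -l nodes=1:ppn=12
>>> export OMP_NUM_THREADS=3    # 4 ranks x 3 threads = 12 cores
>>> mpirun -np 4 /usr/local/vasp/vasp
>>> ]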
>>>
>>> I would suggest that you take a look at the Torque Admin Manual
>>> for more details:
>>> http://www.adaptivecomputing.com/resources/docs/torque/
>>>
>>> There are further controls in Maui, such as
>>> 'JOBNODEMATCHPOLICY EXACTNODE' in maui.cfg,
>>> for instance, if you want full nodes allocated to each job,
>>> as opposed to jobs sharing cores in a single node.
>>> However, these choices may come later.
>>> [You can change maui.cfg and restart the maui scheduler to
>>> test various changes.]
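>>>
>>> [In maui.cfg that is the single line below; depending on the Maui
>>> version, NODEACCESSPOLICY is also worth a look in the same guide,
>>> since it controls whether jobs may share a node at all:
>>>
>>> JOBNODEMATCHPOLICY EXACTNODE
>>> ]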
>>>
>>> For Maui details see the Maui Admin Guide:
>>> http://www.adaptivecomputing.com/resources/docs/maui/index.php
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> Fernando Caba wrote:
>>>> Hi everybody, I am using torque 3.0.1 and maui 3.3.1 in a configuration
>>>> composed of a front end and 4 nodes (2 processors, 6 cores each),
>>>> 48 cores in total.
>>>> I need to configure things so that no more than 12 processes run on
>>>> each node (in particular we are using vasp), so we want no more than
>>>> 12 vasp processes per node.
>>>> How can I configure this? I'm getting confused reading so much
>>>> information about torque and maui configuration.
>>>>
>>>> Thanks in advance.
>>>>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
>

