[torqueusers] how i configure torque and maui to submit serial job to diffrent nodes???

Donald Tripp dtripp at hawaii.edu
Wed Dec 6 01:40:56 MST 2006


I think the problem is in Maui, not Torque. In /usr/local/maui/maui.cfg
I have:

NODEACCESSPOLICY	SINGLEJOB

I think that is what you want: nodes that are job-exclusive, that is,
one job per node (a single job that uses multiple CPUs on that node is
still fine).
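
For reference, a minimal sketch of how that line might sit in maui.cfg.
The comments summarize the policy values as the Maui documentation
describes them; nothing here has been tested on your cluster, and as far
as I know Maui has to be restarted before a change to this file takes
effect.

```
# maui.cfg (excerpt)
#
# NODEACCESSPOLICY controls which jobs may share a node:
#   SHARED      - jobs from any user may share a node (the default)
#   SINGLEUSER  - only jobs owned by the same user may share a node
#   SINGLEJOB   - one job per node; that job may use several CPUs
NODEACCESSPOLICY  SINGLEJOB
```

Note that with SINGLEUSER, two serial jobs from the same account can
still land on the same node, which would match what you are seeing for
user lx.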


- Donald Tripp
   dtripp at hawaii.edu
----------------------------------------------
HPC Systems Administrator
High Performance Computing Center
University of Hawai'i at Hilo
200 W. Kawili Street
Hilo,   Hawaii   96720
http://www.hpc.uhh.hawaii.edu


On Dec 5, 2006, at 10:15 PM, lars at hesdorf.dk wrote:

> You need to change the file
> /var/spool/PBS/sched_priv/sched_config
>
> It is worth reading through this file for other reasons as well.
>
> #
> # smp_cluster_dist
> #
> #       This option allows you to decide how to distribute jobs to all the
> #       nodes on your systems.
> #
> #       pack        - pack as many jobs onto a node that will fit before
> #                     running on another node
> #       round_robin - run one job on each node in a cycle
> #       lowest_load - run the job on the lowest loaded node
> #
> #       PRIME OPTION
>
> ## smp_cluster_dist: pack
> smp_cluster_dist: round_robin
>
> This is taken from our PBSpro installation, but I think Torque has the same
> option (I haven't checked yet).
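
A quick way to see which value is actually in effect is to print the last
uncommented smp_cluster_dist line. This is only a sketch: the function name
active_smp_dist is made up for illustration, and the path to sched_config is
passed in because it differs between installations. Keep in mind that this
file is read by pbs_sched, so it may have no effect when Maui is doing the
scheduling.

```shell
#!/bin/sh
# active_smp_dist FILE
# Print the value of the last uncommented "smp_cluster_dist:" line in FILE.
# (Hypothetical helper, not part of PBS or Torque.)
active_smp_dist() {
    # Commented lines start with "#" or "##", so their first field never
    # equals "smp_cluster_dist:" and they are skipped.
    awk '$1 == "smp_cluster_dist:" { value = $2 }
         END { if (value != "") print value }' "$1"
}

# Example (path taken from the message above; adjust for your installation):
# active_smp_dist /var/spool/PBS/sched_priv/sched_config
```
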
>
>
>> When I submit serial jobs to the cluster with Torque + Maui, they always run
>> on the same node until all the CPUs of that node are in use.
>> For example, I use the "lx" account to submit the serial job "dfdf". Every
>> node has two CPUs, all CPUs are free, and no job is running.
>> [lx@console ~]$ qsub -l nodes=1:ppn=1 dfdf
>> 101.console
>> [lx@console ~]$ qstat -an
>>
>> console:
>>                                                                    Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>> 101.console          lx       dpool    dfdf         4012     1  --     --    -- R    --
>>    c1501/0
>> [lx@console ~]$ qsub -l nodes=1:ppn=1 dfdf
>> 102.console
>> [lx@console ~]$ qstat -an
>>
>> console:
>>                                                                    Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>> 101.console          lx       dpool    dfdf         4012     1  --     --    -- R    --
>>    c1501/0
>> 102.console          lx       dpool    dfdf         4102     1  --     --    -- R    --
>>    c1501/1
>> [lx@console ~]$ qsub -l nodes=1:ppn=1 dfdf
>> 103.console
>> [lx@console ~]$ qstat -an
>>
>> console:
>>                                                                    Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>> 101.console          lx       dpool    dfdf         4012     1  --     --    -- R    --
>>    c1501/0
>> 102.console          lx       dpool    dfdf         4012     1  --     --    -- R    --
>>    c1501/1
>> 103.console          lx       dpool    dfdf         3543     1  --     --    -- R    --
>>    c1503/0
>>
>> As you can see, the first two jobs are running on the same node. This is not
>> load balancing. I want job 102 to run on another node, not on c1501. Only
>> after all the CPUs of node c1501 are in use does job 103 start to run on
>> another node, c1503.
>> I have configured the Torque server with "node_pack = False", but it does
>> not work.
>> I have also edited the maui.cfg file, adding "NODEALLOCATIONPOLICY MAXBALANCE"
>> and "NODEACCESSPOLICY SINGLEUSER", but it still does not work.
>>
>> I am very disappointed. What can I do?
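
To check at a glance whether jobs are being spread, the exec-host lines of
qstat -an can be tallied per node. A minimal sketch, assuming the host lines
look like the ones above (an indented "node/cpu" entry, with "+" between
entries when a job holds several CPUs); the function name slots_per_node is
hypothetical and not part of Torque or Maui.

```shell
#!/bin/sh
# slots_per_node
# Read `qstat -an`-style output on stdin and print "node count" pairs,
# one per node, where count is the number of CPU slots in use there.
slots_per_node() {
    awk '
        # Exec-host lines are indented and contain entries such as c1501/0.
        /^[ \t]+[^ \t]+\/[0-9]+/ {
            n = split($1, slots, "[+]")
            for (i = 1; i <= n; i++) {
                sub(/\/[0-9]+$/, "", slots[i])   # drop the "/cpu" suffix
                count[slots[i]]++
            }
        }
        END { for (node in count) print node, count[node] }
    ' | sort
}

# Usage on the cluster:
# qstat -an | slots_per_node
```

For the third listing above this prints "c1501 2" and "c1503 1", which makes
the packing onto c1501 easy to spot.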
>>
>> This is my server's configuration.
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue dpool
>> #
>> create queue dpool
>> set queue dpool queue_type = Execution
>> set queue dpool max_queuable = 50
>> set queue dpool max_running = 50
>> set queue dpool resources_default.neednodes = dpool
>> set queue dpool enabled = True
>> set queue dpool started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_host_enable = False
>> set server managers = root@console
>> set server operators = root@console
>> set server default_queue = dpool
>> set server log_events = 127
>> set server mail_from = adm
>> set server scheduler_iteration = 300
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server node_pack = False
>> set server torque_version = 2.0.0p8
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
