[Mauiusers] Questions regarding "tasks" and the cfg file.

Al Taufer ataufer at clusterresources.com
Wed Sep 17 14:48:07 MDT 2008


Michael,

I don't know what version of Torque you are using, but there was a 
change made in mid May of this year.  The older versions would use the 
values specified by resources_max as the default if there was no 
resources_default value specified on the queue or server and none 
specified at qsub time.

On versions of Torque dated later than mid May, this behavior can be 
changed by passing the --enable-maxnotdefault option to configure when 
you build Torque.

On earlier versions, either set a resources_default or remove the 
resources_max setting.
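
To illustrate the fallback described above, here is a rough sketch in Python. It is not Torque code. The function name, the dictionaries, and the max_as_default flag are all made up for the example, and the lookup order (qsub request, then queue default, then server default) is an assumption. It only models why a job that never asks for ncpus can end up with the resources_max value.

```python
def effective_resource(name, requested, queue_default, server_default,
                       queue_max, max_as_default=True):
    """Return the value a job ends up with for one resource, or None.

    Hypothetical model: the first source that defines the resource wins.
    max_as_default=True mimics the older behavior, where resources_max
    is used when nothing else supplies a value.
    """
    for source in (requested, queue_default, server_default):
        if name in source:
            return source[name]
    if max_as_default and name in queue_max:
        return queue_max[name]
    return None


# Values taken from this thread: resources_max.ncpus = 6 on the queue,
# and a job that requests nodes=1:ppn=1 but never mentions ncpus.
queue_max = {"ncpus": 6}
request = {"nodes": "1:ppn=1"}

# Older behavior, no resources_default anywhere: the max becomes the default.
old = effective_resource("ncpus", request, {}, {}, queue_max)

# Same build, but with a resources_default.ncpus of 1 set on the queue.
fixed = effective_resource("ncpus", request, {"ncpus": 1}, {}, queue_max)

# Build where the max is no longer treated as a default.
new = effective_resource("ncpus", request, {}, {}, queue_max,
                         max_as_default=False)

print(old, fixed, new)
```

In this model the job picks up ncpus = 6 in the first case, which would match the "PROCS: 6" line in the checkjob output quoted below, and the other two cases correspond to the two workarounds.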

Al

Michael Homa wrote:
> On Wed, 17 Sep 2008, Garrick Staples wrote:
>
>   
>> On Wed, Sep 17, 2008 at 11:12:41AM -0500, Michael Homa alleged:
>>     
>>> In order to test, I wrote a simple hello_world program in C. When I submit
>>> the program for execution, I see that the number of tasks is 6:
>>>
>>> Job ID    Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>>> 317       mhoma    dedicate hello_worl    --   1     6    --- 00:30  Q --
>>>       
>> What did the job actually request?  nodes=1:ppn=6?  ncpus=6?  Neither of those
>> requests can be answered with quad-proc machines.
>>     
>
> Hi Garrick:
>
> I didn't have a "-l nodes" option in the script:
>
>   #PBS -N hello_world
>   #PBS -q dedicated
>   /home/homes51/mhoma/a.out
>
> and did not specify a -l on qsub (qsub script). When I add the -l option:
>
>   #PBS -N hello_world
>   #PBS -q dedicated
>   #PBS -l nodes=1:ppn=1
>   /home/homes51/mhoma/a.out
>
> I get the same result:
>
>   checking job 322
>
>   State: Idle
>   Creds:  user:mhoma  group:users  class:dedicated  qos:DEFAULT
>   WallTime: 00:00:00 of 00:30:00
>   SubmitTime: Wed Sep 17 14:45:29
>     (Time Queued  Total: 00:01:41  Eligible: 00:01:41)
>
>   Total Tasks: 1
>
>   Req[0]  TaskCount: 1  Partition: ALL
>   Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
>   Opsys: [NONE]  Arch: [NONE]  Features: [dedicated]
>   Dedicated Resources Per Task: PROCS: 6   <----------- I find this interesting
>                                                         but where is it
>                                                         getting it.
>   ...
>   PE:  6.00  StartPriority:  1
>   job cannot run in partition DEFAULT (idle procs do not meet requirements :
>   0 of 6 procs found)
>   idle procs:  28  feasible procs:   0
>
> The only place I figure it may come from is the torque configuration
> for the dedicated queue:
>
>         resources_max.ncpus = 6
>
> But my understanding from reading the queue configuration guide (and feel
> free to tell me I'm full of crap) is that resources_max.ncpus is the
> maximum number of processors a single job can request in the queue and not
> the default number of processors allocated per job if the user does not
> include a "-l nodes" argument.
>
>   
>>> The dedicated queue has three dual-CPU, dual-core nodes and was established in
>>> torque:
>>>
>>>   argo17-1 np=4 Linux2.i86pc dualcore amd smp dedicated
>>>   argo18-2 np=4 Linux2.i86pc dualcore amd smp dedicated
>>>   argo18-3 np=4 Linux2.i86pc dualcore amd smp dedicated
>>>       
>
> I've always wanted to ask this question. Does the np refer to "real,
> physical processors" or does it refer to the total number of cores?
> If the former, then argo17-1 should be:
>   argo17-1 np=2:ppn=2 Linux2.i86pc dualcore amd smp dedicated
>
> If the latter, then:
>   argo17-1 np=4 Linux2.i86pc dualcore amd smp dedicated
> is correct
>
>   
>> Don't change the number of CPUs in a task.  Down that road lies madness.
>>     
>
> ok. Technically "down that road lies more madness."
>                                      ----
>
>   
>>>    2) I'm unclear as to how the "task" number is derived? I noticed that
>>>       my hello_world has a PE of 6. Is that a coincidence or does the
>>>       resulting PE become the number of tasks? Why six processors for
>>>       hello_world?
>>>       
>> We would need to see that actual request.
>>     
>
> I'm not being funny, but how does one get the request? From the checkjob
> command?
>
> Michael
>
> And, I don't want to forget to say, thank you for your help.
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
>   


