[torqueusers] only one processor is used when using qsub -l procs flag
Gustavo Correa
gus at ldeo.columbia.edu
Mon Jan 16 08:50:09 MST 2012
PS - Hi Xiangqian.
Maybe you need to add this line to your maui.cfg [and restart maui],
for the 'procs=Z' syntax to work as you expect:
JOBNODEMATCHPOLICY EXACTNODE
I *think* the default is
JOBNODEMATCHPOLICY EXACTPROC
which expects your node to have the exact number of processors you requested [i.e. 3].
See Appendix F of the Maui Administrator's Guide for details.
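In case it is useful, here is a small shell sketch for checking which policy,
if any, your maui.cfg currently sets. The function name maui_node_match_policy
and the default path /usr/local/maui/maui.cfg are my assumptions [adjust the
path to wherever your Maui is installed]. It only reads the file and does not
change anything.

```shell
#!/bin/sh
# Print the JOBNODEMATCHPOLICY value set in a maui.cfg file.
# Prints nothing if the parameter is absent or commented out.
# NOTE: the helper name and the default path are assumptions for illustration.
maui_node_match_policy() {
    cfg="${1:-/usr/local/maui/maui.cfg}"
    # Match only lines whose first field is exactly the parameter name,
    # so commented-out lines ("# JOBNODEMATCHPOLICY ...") are skipped.
    awk '$1 == "JOBNODEMATCHPOLICY" { print $2 }' "$cfg"
}

# Example call [uncomment and point it at your own file]:
# maui_node_match_policy /usr/local/maui/maui.cfg
```

If it prints nothing, the parameter is not set explicitly in that file.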
I am not sure, but my recollection is that somebody reported a problem similar to yours
in the list before, and the solution suggested was this one.
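After restarting maui, one way to confirm whether the job really got 3
processors is to count the slots in the exec_host string that qstat reports.
A minimal sketch, assuming the 'host/slot+host/slot' format shown in your
emails; count_exec_slots is a name I made up, not a Torque command:

```shell
#!/bin/sh
# Count the processor slots listed in a Torque exec_host string.
# NOTE: count_exec_slots is a hypothetical helper, not part of Torque or Maui.
count_exec_slots() {
    # Slots are joined with '+': put one per line, then count non-empty lines.
    printf '%s\n' "$1" | tr '+' '\n' | grep -c .
}

count_exec_slots "snode02/0"                        # prints 1
count_exec_slots "snode02/2+snode02/1+snode02/0"    # prints 3
```

For a real job you would paste in the exec_host value from 'qstat -f' for
that job id.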
I hope this helps,
Gus Correa
On Jan 16, 2012, at 10:21 AM, Gustavo Correa wrote:
> Hi Xiangqian
>
> For what it is worth, I use Maui 3.2.6p21, and I don't have the problem you described.
> I don't know the behavior in Maui 3.3.1, but as you reported, 3.2.6p21 also works correctly for you
> with the nodes=1:ppn=3 syntax.
> I am happy with 3.2.6p21.
>
> There is still a chance that a change in the maui.cfg for Maui 3.3.1 may fix this glitch,
> but I don't know what it would be. Most likely it has to do with the node allocation policies,
> and how it translates 'procs' into nodes and ppn.
> Somebody else more savvy in the list may clarify this point.
>
> I confess I prefer the more detailed syntax 'nodes=X:ppn=Y',
> because it specifies more detail about the resources you are requesting,
> and apparently avoids the issue that hit you.
>
> Have you tried the 'nodes=1:ppn=3' syntax in Maui 3.3.1?
> I wonder if it would work there too.
>
> I hope this helps,
> Gus Correa
>
>
> On Jan 16, 2012, at 1:43 AM, Xiangqian Wang wrote:
>
>> thanks, Gustavo
>>
>> sorry for the misspelling in the previous email. i rechecked it and corrected it as follows:
>>
>> i tested torque 2.5.8 and maui 3.3.1 on a centos 6.0 node, the job script is:
>>
>> #!/bin/sh
>> #PBS -N procsjob
>> #PBS -l procs=3
>> #PBS -q batch
>> ping localhost -c 100
>>
>> and the qstat output shows "exec_host = snode02/0".
>> i then replaced it with a new job script:
>>
>> #!/bin/sh
>> #PBS -N procsjob
>> #PBS -l nodes=1:ppn=3
>> #PBS -q batch
>> ping localhost -c 100
>>
>> and the qstat output shows "exec_host = snode02/2+snode02/1+snode02/0".
>>
>> i switched from maui 3.3.1 to maui 3.2.6p21 and tested again; the qstat output shows "exec_host = snode02/2+snode02/1+snode02/0" for both scripts. maybe it's a maui 3.3.1 problem?
>>
>>
>> 2012/1/14 Gustavo Correa <gus at ldeo.columbia.edu>
>> Hi Xiangqian
>>
>> Is it a typo in your email or did you comment out this line in your Torque/PBS script?
>> [Note the double hash ##.]
>>
>>> ##PBS -l procs=3
>>
>> Have you tried this form instead?
>>
>> #PBS -l nodes=1:ppn=3
>>
>> For more details check 'man qsub' and 'man pbs_resources'.
>>
>> I hope it helps,
>> Gus Correa
>>
>> On Jan 13, 2012, at 4:10 AM, Xiangqian Wang wrote:
>>
>>> my demo torque+maui cluster has one node with np=4 set for it. i want to submit a job requesting 3 processors, but when it starts to run, i see only one processor is used (qstat shows "exec_host = snode02/0").
>>>
>>> i use torque 2.5.6 and maui 3.3.1. if anyone can help me out, it'll be greatly appreciated
>>>
>>> the submit script is something like:
>>>
>>> #!/bin/sh
>>> #PBS -N procsjob
>>> ##PBS -l procs=3
>>> #PBS -q batch
>>>
>>> the output of checkjob is:
>>>
>>> checking job 33
>>> State: Running
>>> Creds: user:wangxq group:wangxq class:batch qos:DEFAULT
>>> WallTime: 00:00:00 of 1:00:00
>>> SubmitTime: Fri Jan 13 17:07:43
>>> (Time Queued Total: 00:00:01 Eligible: 00:00:01)
>>> StartTime: Fri Jan 13 17:07:44
>>> Total Tasks: 1
>>> Req[0] TaskCount: 1 Partition: DEFAULT
>>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>>> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>>> Exec: '' ExecSize: 0 ImageSize: 0
>>> Dedicated Resources Per Task: PROCS: 1
>>> Utilized Resources Per Task: [NONE]
>>> Avg Util Resources Per Task: [NONE]
>>> Max Util Resources Per Task: [NONE]
>>> NodeAccess: SHARED
>>> NodeCount: 0
>>> Allocated Nodes:
>>> [snode02:1]
>>> Task Distribution: snode02
>>>
>>> IWD: [NONE] Executable: [NONE]
>>> Bypass: 0 StartCount: 1
>>> PartitionMask: [ALL]
>>> Flags: RESTARTABLE
>>> Reservation '33' (00:00:00 -> 1:00:00 Duration: 1:00:00)
>>> PE: 1.00 StartPriority: 1
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers