[torqueusers] only one processor is used when using qsub -l procs flag

Gustavo Correa gus at ldeo.columbia.edu
Mon Jan 16 08:50:09 MST 2012


PS - Hi Xiangqian.   

Maybe you need to add this line to your maui.cfg [and restart maui],
for the 'procs=Z' syntax to work as you expect:

JOBNODEMATCHPOLICY EXACTNODE

I *think* the default is 

JOBNODEMATCHPOLICY EXACTPROC

which expects your node to have the exact number of processors you requested [i.e. 3].

See appendix F of the Maui Administrator Guide for details.
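
To check what a given maui.cfg currently sets before editing it, something like the following could be used. This is only a sketch: the function name show_match_policy is made up for illustration, the path to maui.cfg depends on your installation and is passed in as an argument, and the case-insensitive match is a convenience, not a statement about how Maui parses the file.

```shell
#!/bin/sh
# Report the JOBNODEMATCHPOLICY value found in a maui.cfg file.
# show_match_policy is a hypothetical helper; the config path is an argument
# because the install location varies from site to site.
show_match_policy() {
    cfg="$1"
    # Lines starting with '#' are comments and do not match this pattern.
    # If the parameter appears more than once, report the last occurrence.
    policy=$(grep -i '^[[:space:]]*JOBNODEMATCHPOLICY' "$cfg" | awk '{print $2}' | tail -n 1)
    if [ -n "$policy" ]; then
        echo "JOBNODEMATCHPOLICY is set to $policy"
    else
        echo "JOBNODEMATCHPOLICY is not set (scheduler default applies)"
    fi
}
```

For example: show_match_policy /path/to/maui.cfg. After adding the line, Maui still has to be restarted in whatever way your site normally does that.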

I am not sure, but my recollection is that somebody reported a problem similar to yours
on the list before, and the solution suggested was this one.
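
One way to confirm the effect after the change is to count the slots in the exec_host value that qstat reports for the job, as in the outputs quoted further down in this thread. A small sketch, where count_exec_slots is a hypothetical helper name and the input is assumed to be in the host/slot+host/slot form shown in those outputs:

```shell
#!/bin/sh
# Count the processor slots listed in a Torque exec_host string.
# count_exec_slots is a hypothetical helper name used for illustration.
count_exec_slots() {
    # Slots are joined with '+'; put one per line and count the
    # lines that contain a host/slot pair.
    printf '%s\n' "$1" | tr '+' '\n' | grep -c '/'
}
```

With the strings from this thread, count_exec_slots 'snode02/2+snode02/1+snode02/0' prints 3 and count_exec_slots 'snode02/0' prints 1.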

I hope this helps,
Gus Correa

On Jan 16, 2012, at 10:21 AM, Gustavo Correa wrote:

> Hi Xiangqian
> 
> For what it is worth, I use Maui 3.2.6p21, and I don't have the problem you described.
> I don't know the behavior in Maui 3.3.1, but as you reported, 3.2.6p21 also works correctly for you
> with the nodes=1:ppn=3 syntax.
> I am happy with 3.2.6p21.
> 
> There is still a chance that a change in the maui.cfg used with Maui 3.3.1 may fix this glitch,
> but I don't know what it would be.  Most likely it has to do with the node allocation policies,
> and how Maui translates 'procs' into nodes and ppn.
> Somebody else more savvy on the list may clarify this point.
> 
> I confess I prefer the more detailed syntax 'nodes=X:ppn=Y', 
> because it specifies more detail about the resources you are requesting, 
> and apparently avoids the issue that hit you.
> 
> Have you tried the 'nodes=1:ppn=3' syntax in Maui 3.3.1? 
> I wonder if it would work there too.
> 
> I hope this helps,
> Gus Correa
> 
> 
> On Jan 16, 2012, at 1:43 AM, Xiangqian Wang wrote:
> 
>> thanks, Gustavo
>> 
>> sorry for the typo in the previous email. i rechecked it and corrected it as follows:
>> 
>> i tested torque 2.5.8 and maui 3.3.1 on a centos 6.0 node, the job script is:
>> 
>> #!/bin/sh
>> #PBS -N procsjob
>> #PBS -l procs=3
>> #PBS -q batch
>> ping localhost -c 100
>> 
>> and qstat output shows "exec_host = snode02/0".
>> i then replaced it with a new job script:
>> 
>> #!/bin/sh
>> #PBS -N procsjob
>> #PBS -l nodes=1:ppn=3
>> #PBS -q batch
>> ping localhost -c 100
>> 
>> and qstat output shows "exec_host = snode02/2+snode02/1+snode02/0".
>> 
>> i changed maui 3.3.1 to maui 3.2.6p21 and tested again; qstat output shows "exec_host = snode02/2+snode02/1+snode02/0" for both scripts. maybe it's a maui 3.3.1 problem?
>> 
>> 
>> 2012/1/14 Gustavo Correa <gus at ldeo.columbia.edu>
>> Hi Xiangqian
>> 
>> Is it a typo in your email or did you comment out this line in your Torque/PBS script?
>> [Note the double hash ##.]
>> 
>>> ##PBS -l procs=3
>> 
>> Have you tried this form instead?
>> 
>> #PBS -l nodes=1:ppn=3
>> 
>> For more details check 'man qsub' and 'man pbs_resources'.
>> 
>> I hope it helps,
>> Gus Correa
>> 
>> On Jan 13, 2012, at 4:10 AM, Xiangqian Wang wrote:
>> 
>>> my demo torque+maui cluster has one node with np=4 set for it. i want to submit a job requesting 3 processors, but when it starts to run, i see only one processor is used (qstat shows "exec_host = snode02/0").
>>> 
>>> i use torque 2.5.6 and maui 3.3.1. if anyone can help me out, it'll be greatly appreciated.
>>> 
>>> the submit script is something like:
>>> 
>>> #!/bin/sh
>>> #PBS -N procsjob
>>> ##PBS -l procs=3
>>> #PBS -q batch
>>> 
>>> the output of checkjob is:
>>> 
>>> checking job 33
>>> State: Running
>>> Creds:  user:wangxq  group:wangxq  class:batch  qos:DEFAULT
>>> WallTime: 00:00:00 of 1:00:00
>>> SubmitTime: Fri Jan 13 17:07:43
>>>  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)
>>> StartTime: Fri Jan 13 17:07:44
>>> Total Tasks: 1
>>> Req[0]  TaskCount: 1  Partition: DEFAULT
>>> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
>>> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
>>> Exec:  ''  ExecSize: 0  ImageSize: 0
>>> Dedicated Resources Per Task: PROCS: 1
>>> Utilized Resources Per Task:  [NONE]
>>> Avg Util Resources Per Task:  [NONE]
>>> Max Util Resources Per Task:  [NONE]
>>> NodeAccess: SHARED
>>> NodeCount: 0
>>> Allocated Nodes:
>>> [snode02:1]
>>> Task Distribution: snode02
>>> 
>>> IWD: [NONE]  Executable:  [NONE]
>>> Bypass: 0  StartCount: 1
>>> PartitionMask: [ALL]
>>> Flags:       RESTARTABLE
>>> Reservation '33' (00:00:00 -> 1:00:00  Duration: 1:00:00)
>>> PE:  1.00  StartPriority:  1
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers


