[Mauiusers] Multiple job request peculiarities

Marvin Novaglobal marvin.novaglobal at gmail.com
Thu Mar 24 20:55:49 MDT 2011


Sorry, I just had another look at my original post. The description there was
missing a '+' sign, but my actual tests did use one. So:
qsub -l nodes=1:ppn=12+1:ppn=1 (works)
while
qsub -l nodes=3:ppn=12+1:ppn=1 (does not work, job goes to idle)

Weird stuff. Has anyone else encountered this?
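For reference, Torque's nodes spec sums the processors of each '+'-separated request, which is where the 36-and-1 split in the Maui log below comes from. A minimal sketch (count_procs is a hypothetical helper for illustration, not part of Torque or Maui):

```shell
# Count total processors requested by a Torque "-l nodes=" spec.
# '+' separates independent node requests; each request is
# <count>[:ppn=<n>], with ppn defaulting to 1.
count_procs() {
  spec="${1#nodes=}"
  total=0
  IFS='+'
  for req in $spec; do
    nodes="${req%%:*}"
    case "$req" in
      *ppn=*) ppn="${req##*ppn=}" ;;
      *)      ppn=1 ;;
    esac
    total=$(( total + nodes * ppn ))
  done
  unset IFS
  echo "$total"
}

count_procs "nodes=3:ppn=12+1:ppn=1"   # 3*12 + 1*1 = 37
```

So nodes=3:ppn=12+1:ppn=1 asks for 37 tasks in two requests (36 + 1), which matches the "37 of 36 required" line Maui logs while it fails to place the second, 1-task request.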


Regards,
Marvin


On Fri, Mar 25, 2011 at 10:46 AM, Marvin Novaglobal <marvin.novaglobal at gmail.com> wrote:

> Hi Peter,
>     It doesn't work for my setup. I meant it only applies to nodes=3 and
> nodes=5 so far. We don't have enough resources to test on nodes=7. So again,
> qsub -l nodes=1:ppn=12+1:ppn=1 will work but
> qsub -l nodes=3:ppn=12+1:ppn=1 will not work
>     May I know which version of Maui and Torque you are using? Your Maui
> and Torque's config also please.
>
>
>
> Regards,
> Marvin
>
>
> On Fri, Mar 25, 2011 at 12:20 AM, Peter Michael Crosta <pmc2107 at columbia.edu> wrote:
>
>> Hi Marvin,
>>
>> I have gotten multiple resource requests to work by using the "+" sign.
>> Have you tried
>>
>> qsub -l nodes=3:ppn=12+1:ppn=1 ?
>>
>> Best,
>> Peter
>>
>>
>> On Thu, 24 Mar 2011, Marvin Novaglobal wrote:
>>
>>> Hi,
>>>     On my setup,
>>> $ qsub -l nodes=1:ppn=12:1:ppn=1 (works)
>>> $ qsub -l nodes=2:ppn=12:1:ppn=1 (works)
>>> $ qsub -l nodes=3:ppn=12:1:ppn=1 (job goes to idle and never gets executed)
>>> $ qsub -l nodes=4:ppn=12:1:ppn=1 (works)
>>> $ qsub -l nodes=5:ppn=12:1:ppn=1 (job goes to idle and never gets executed)
>>>
>>> <Maui.cfg>
>>> ...
>>> ENABLEMULTINODEJOBS[0]            TRUE
>>> ENABLEMULTIREQJOBS[0]              TRUE
>>> JOBNODEMATCHPOLICY[0]             EXACTNODE
>>> NODEALLOCATIONPOLICY[0]           MINRESOURCE
>>>
>>>
>>> <Torque.cfg>
>>> set server scheduling = True
>>> set server acl_hosts = aquarius.local
>>> set server managers = torque at aquarius
>>> set server operators = torque at aquarius
>>> set server default_queue = DEFAULT
>>> set server log_events = 511
>>> set server mail_from = adm
>>> set server resources_available.nodect = 2048
>>> set server scheduler_iteration = 600
>>> set server node_check_rate = 150
>>> set server tcp_timeout = 6
>>> set server mom_job_sync = True
>>> set server keep_completed = 300
>>> set server next_job_number = 377
>>>
>>> <maui.log>
>>> 03/24 20:23:48 MResDestroy(377)
>>> 03/24 20:23:48 MResChargeAllocation(377,2)
>>> 03/24 20:23:48 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
>>> 03/24 20:23:48 INFO:     total jobs selected in partition ALL: 1/1
>>> 03/24 20:23:48 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
>>> 03/24 20:23:48 INFO:     total jobs selected in partition DEFAULT: 1/1
>>> 03/24 20:23:48 MQueueScheduleIJobs(Q,DEFAULT)
>>> 03/24 20:23:48 INFO:     72 feasible tasks found for job 377:0 in partition DEFAULT (36 Needed)
>>> 03/24 20:23:48 INFO:     72 feasible tasks found for job 377:1 in partition DEFAULT (1 Needed)
>>> 03/24 20:23:48 ALERT:    inadequate tasks to allocate to job 377:1 (0 < 1)
>>> 03/24 20:23:48 ERROR:    cannot allocate nodes to job '377' in partition DEFAULT
>>> 03/24 20:23:48 MJobPReserve(377,DEFAULT,ResCount,ResCountRej)
>>> 03/24 20:23:48 MJobReserve(377,Priority)
>>> 03/24 20:23:48 INFO:     72 feasible tasks found for job 377:0 in partition DEFAULT (36 Needed)
>>> 03/24 20:23:48 INFO:     72 feasible tasks found for job 377:1 in partition DEFAULT (1 Needed)
>>> 03/24 20:23:48 INFO:     72 feasible tasks found for job 377:0 in partition DEFAULT (36 Needed)
>>> 03/24 20:23:48 INFO:     72 feasible tasks found for job 377:1 in partition DEFAULT (1 Needed)
>>> 03/24 20:23:48 INFO:     located resources for 36 tasks (144) in best partition DEFAULT for job 377 at time 00:00:01
>>> 03/24 20:23:48 INFO:     tasks located for job 377:  37 of 36 required (144 feasible)
>>> 03/24 20:23:48 MResJCreate(377,MNodeList,00:00:01,Priority,Res)
>>> 03/24 20:23:48 INFO:     job '377' reserved 36 tasks (partition DEFAULT) to start in 00:00:01 on Thu Mar 24 20:23:49 (WC: 2592000)
>>>
>>> <pbs_server.log>
>>> 03/24/2011 20:23:17;0100;PBS_Server;Job;377.aquarius;enqueuing into DEFAULT, state 1 hop 1
>>> 03/24/2011 20:23:17;0008;PBS_Server;Job;377.aquarius;Job Queued at request of torque at aquarius, owner = torque at aquarius, job name = parallel.sh, queue = DEFAULT
>>> 03/24/2011 20:23:17;0040;PBS_Server;Svr;aquarius;Scheduler was sent the command new
>>>
>>>
>>> Has anyone else encountered problems with multiple job requests?
>>>
>>>
>>> Regards,
>>> Marvin
>>>
>>>
>>>
>