[Mauiusers] Multi-req job not starting

Kunal Rao kunalgrao at gmail.com
Wed May 23 12:30:38 MDT 2012


There was a similar post earlier:
http://www.clusterresources.com/pipermail/mauiusers/2009-July/003930.html

But I did not find any response to it. Can anyone please provide some ideas or
suggestions on this issue?

Thanks,
Kunal

On Wed, May 23, 2012 at 2:26 PM, Kunal Rao <kunalgrao at gmail.com> wrote:

> Hello,
>
> I have a 10-node cluster and 3 jobs: one needs 2 nodes (1 task per node),
> another needs 4 nodes (1 task per node), and the third needs 4 nodes
> (2 tasks on 1 node and 1 task on each of the other 3 nodes).
>
> The additional configuration in maui.cfg is:
>
> BACKFILLPOLICY        FIRSTFIT
> RESERVATIONPOLICY     CURRENTHIGHEST
>
> ENABLEMULTIREQJOBS    TRUE
> NODEALLOCATIONPOLICY  MINRESOURCE
> NODEACCESSPOLICY      SINGLEJOB
> JOBNODEMATCHPOLICY    EXACTNODE
>
> I am observing that if the first 2 jobs are running, the third one does
> not start (even though 4 nodes are available) until one of the jobs
> completes. checkjob -v <job_id> shows the following output:
>
> ------------------
>
> checking job 5791 (RM job '5791.fire16.csa.local')
>
> State: Idle
> Creds:  user:kunal  group:kunal  class:batch  qos:DEFAULT
> WallTime: 00:00:00 of 00:04:51
> SubmitTime: Wed May 23 11:52:04
>   (Time Queued  Total: 00:48:52  Eligible: 00:48:52)
>
> StartDate: 00:00:01  Wed May 23 12:40:57
> Total Tasks: 2
>
> Req[0]  TaskCount: 2  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> Exec:  ''  ExecSize: 0  ImageSize: 0
> Dedicated Resources Per Task: PROCS: 1
> NodeAccess: SINGLEJOB
> TasksPerNode: 2  NodeCount: 1
>
> Req[1]  TaskCount: 3  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> Exec:  ''  ExecSize: 0  ImageSize: 0
> Dedicated Resources Per Task: PROCS: 1
> NodeAccess: SINGLEJOB
> NodeCount: 3
>
>
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 5  StartCount: 0
> PartitionMask: [ALL]
> Flags:       RESTARTABLE
>
> Reservation '5791' (00:00:01 -> 00:04:52  Duration: 00:04:51)
> PE:  5.00  StartPriority:  48
> cannot select job 5791 for partition DEFAULT (startdate in '00:00:01')
>
> ------------
>
> What could be the reason this job is not starting? How do I resolve this?
>
> Thanks,
> Kunal
>
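
The node arithmetic behind the quoted question can be sketched as follows. This is only an illustrative tally, not a model of how Maui schedules: the job names and the `nodes_needed` helper are hypothetical, and it assumes that under NODEACCESSPOLICY SINGLEJOB each node is dedicated to one job. It counts nodes only and ignores per-node processor counts, which the message does not state.

```python
# Illustrative node tally for the scenario in the quoted message.
# Assumption: with NODEACCESSPOLICY SINGLEJOB, a node serves one job at a time.
TOTAL_NODES = 10

# Tasks placed on each node a job asks for (names are hypothetical).
jobs = {
    "job1": [1, 1],        # 2 nodes, 1 task per node
    "job2": [1, 1, 1, 1],  # 4 nodes, 1 task per node
    "job3": [2, 1, 1, 1],  # Req[0]: 1 node x 2 tasks; Req[1]: 3 nodes x 1 task
}


def nodes_needed(tasks_per_node):
    """Number of whole nodes a job occupies when nodes are not shared."""
    return len(tasks_per_node)


running = ["job1", "job2"]
free_nodes = TOTAL_NODES - sum(nodes_needed(jobs[name]) for name in running)
job3_fits = nodes_needed(jobs["job3"]) <= free_nodes

print("free nodes:", free_nodes)
print("job3 fits by node count:", job3_fits)
```

By node count alone the third job fits in the remaining nodes, which is why the Idle state and the future StartDate reported by checkjob are unexpected.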