[Mauiusers] advance standing reservation
Bill Wichser
bill at Princeton.EDU
Mon Mar 27 14:08:27 MST 2006
So I've tried everything I know and I still cannot make this thing
work. Jobs just defer forever. I am now effectively the scheduler,
running qrun by hand whenever I see a job waiting in the quad queue.
I've removed the quad queue entirely and then re-created it, hoping that
there was just a typo somewhere. I could sure use some help on this
one as I've just about scratched my head until it's bleeding!
Bill
Bill Wichser wrote:
> Environment
> -----------
> maui-3.2.6p13
> torque-1.1.0p6
> linux cluster
>
> I had this working. Or so I thought. But after a pbs-server reboot and
> a maui reboot, jobs just defer.
>
> I have two quad-processor nodes, and I want only jobs that specify
> #PBS -q quad to have access to them and run there.
>
> -------------------------------------------------
> In Torque:
> create queue quad
> set queue quad queue_type = Execution
> set queue quad acl_hosts = node076+node077
> set queue quad resources_max.nodect = 2
> set queue quad enabled = True
> set queue quad started = True
> ---------------------------------------------------
> In maui:
> SRCFG[quad] HOSTLIST=node076,node077
> SRCFG[quad] FLAGS=BYNAME
> SRCFG[quad] PERIOD=INFINITY
> SRCFG[quad] CLASSLIST=quad
>
> CLASSCFG[quad] PRIORITY=0
> CLASSCFG[quad] FLAGS=ADVRES:quad.0.0
> ------------------------------------------------------
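One thing that can be checked mechanically is whether the Torque queue and the Maui standing reservation name the same nodes. Below is a minimal Python sketch, assuming the two config lines are pasted in as strings in exactly the form shown above. The helper names (torque_acl_hosts, maui_hostlist, hosts_match) are made up for illustration and are not part of Torque or Maui.

```python
def torque_acl_hosts(line):
    """Hosts from a qmgr line such as
    'set queue quad acl_hosts = node076+node077' ('+'-separated)."""
    _, _, value = line.partition("=")
    return {h.strip() for h in value.split("+") if h.strip()}


def maui_hostlist(line):
    """Hosts from a maui.cfg line such as
    'SRCFG[quad] HOSTLIST=node076,node077' (comma-separated)."""
    _, _, value = line.partition("HOSTLIST=")
    return {h.strip() for h in value.replace(",", " ").split()}


def hosts_match(qmgr_line, maui_line):
    """True when both lines name exactly the same set of nodes."""
    return torque_acl_hosts(qmgr_line) == maui_hostlist(maui_line)


if __name__ == "__main__":
    qmgr_line = "set queue quad acl_hosts = node076+node077"
    maui_line = "SRCFG[quad] HOSTLIST=node076,node077"
    print(hosts_match(qmgr_line, maui_line))
```

With the two lines from this message the sets are identical, so a host name mismatch between the two configs does not look like the cause here.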
>
> diagnose -r
>
> quad.0.0 User DEF -00:13:26 INFINITY INFINITY
> 2 2 8
> Flags: STANDINGRES BYNAME
> ACL: RES==quad.0= CLASS==quad+
> CL: RES==quad.0
> Task Resources: PROCS: [ALL]
> Attributes (HostList='node076 node077')
> Active PH: 0.00/1.79 (0.00%)
> SRAttributes (TaskCount: 0 StartTime: 00:00:00 EndTime:
> 1:00:00:00 Days: ALL)
> ---------------------------------------------------------------------------
>
> So the reservation is there and appears active. But when I do a
> "checknode node077", I see something in the Reservations section that
> doesn't seem correct.
>
> ----------------------------------------------------------------------------
>
> checking node node077
>
> State: Idle (in current state for 00:13:26)
> Configured Resources: PROCS: 4 MEM: 15G SWAP: 16G DISK: 1M
> Utilized Resources: [NONE]
> Dedicated Resources: [NONE]
> Opsys: DEFAULT Arch: linux
> Speed: 1.00 Load: 0.000
> Network: [DEFAULT]
> Features: [quad]
> Attributes: [Batch]
> Classes: [short 4:4][long 4:4][verylong 4:4][quad 4:4][default
> 4:4][single 4:4]
>
> Total Time: INFINITY Up: INFINITY (81.54%) Active: INFINITY
> (37.11%)
>
> Reservations:
> User 'quad.0.0'(x1) -00:13:26 -> INFINITY ( INFINITY)
> Blocked Resources at -00:13:26 Procs: 4/4 (100.00%)
> ------------------------------------------------------------------------
> It is that "Blocked Resources" line that doesn't seem correct to me.
> So I submit a job specifying this quad queue and it immediately gets
> placed into a deferred state in the blocked list.
>
> --------------------------------------------------------------------------
> checking job 24640
>
> State: Idle EState: Deferred
> Creds: user:bill group:bill class:quad qos:DEFAULT
> WallTime: 00:00:00 of 1:12:00:00
> SubmitTime: Fri Mar 10 09:32:35
> (Time Queued Total: 3:08:51 Eligible: 00:05:27)
>
> Total Tasks: 4
>
> Req[0] TaskCount: 4 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Flags: RESTARTABLE
>
> job is deferred. Reason: NoResources (cannot create reservation for
> job '24640' (intital reservation attempt)
> )
> Holds: Defer (hold reason: NoResources)
> PE: 4.00 StartPriority: 3087
> cannot select job 24640 for partition DEFAULT (job hold active)
> -----------------------------------------------------------------------
>
> And the Maui logs show:
>
> 03/10 12:46:06 INFO: node node077 can provide resources for job 24640:0
> 03/10 12:46:06 MLocalJobCheckNRes(24640,node077,2140000000)
> 03/10 12:46:06 INFO: 8 feasible tasks found for job 24640:0 in
> partition DEFAULT (4 Needed)
> 03/10 12:46:06
> MJobGetSNRange(24640,0,node076,(4 at 00:00:00),256,Affinity,Type,ARange,BRes)
> 03/10 12:46:06 INFO: attempting to get resources for 24640 4 * (P: 1
> M: 0 S: 0 D: 0)
> 03/10 12:46:06 MResCheckJAccess(24612,24640,129600,Same,Affinity)
> 03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
> 03/10 12:46:06 MResCheckJAccess(24612,24640,129600,Same,Affinity)
> 03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
> 03/10 12:46:06 INFO: ARange[0] too short for job 24640 (MR: 1 < W:
> 129600): removing range
> 03/10 12:46:06 INFO: node node076 unavailable for job 24640 at 00:00:00
> 03/10 12:46:06 INFO: no reservation time found for job 24640 on node
> node076 at 00:00:00
> 03/10 12:46:06
> MJobGetSNRange(24640,0,node077,(4 at 00:00:00),256,Affinity,Type,ARange,BRes)
> 03/10 12:46:06 INFO: attempting to get resources for 24640 4 * (P: 1
> M: 0 S: 0 D: 0)
> 03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
> 03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
> 03/10 12:46:06 INFO: ARange[0] too short for job 24640 (MR: 1 < W:
> 129600): removing range
> 03/10 12:46:06 INFO: node node077 unavailable for job 24640 at 00:00:00
> 03/10 12:46:06 INFO: no reservation time found for job 24640 on node
> node077 at 00:00:00
> 03/10 12:46:06 MJobSelectFRL(24640,G,1,RCount)
> 03/10 12:46:06 ALERT: job 24640 cannot run in any partition
> 03/10 12:46:06 ALERT: cannot create new reservation for job 24640
> (shape[1] 4)
> 03/10 12:46:06 ALERT: cannot create new reservation for job 24640
> 03/10 12:46:06 MJobSetHold(24640,16,00:05:00,NoResources,cannot create
> reservation for job '24640' (intital reservation attempt)
> 03/10 12:46:06 ALERT: job '24640' cannot run (deferring job for 300
> seconds)
> ----------------------------------------------------------------------------
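For reading the log above: 129600 seconds is 36 hours, which is the same as the 1:12:00:00 wall time limit that checkjob prints, so W in the "ARange too short" lines looks like the job's requested wall time in seconds. That reading is an assumption; the log itself does not say what MR and W stand for. The following is a small Python sketch, with hypothetical helper names, that pulls those lines out of an unquoted copy of the log and prints W in the same D:HH:MM:SS style.

```python
import re

# Matches e.g. "ARange[0] too short for job 24640 (MR: 1 < W: 129600)"
RANGE_RE = re.compile(
    r"ARange\[\d+\] too short for job (\d+) \(MR: (\d+) < W:\s*(\d+)\)"
)


def seconds_to_dhms(total):
    """Format a number of seconds as [D:]HH:MM:SS."""
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{days}:{hms}" if days else hms


def short_ranges(log_text):
    """Return (job_id, mr, w) for every 'ARange too short' entry.

    Whitespace is collapsed first so entries wrapped across two
    lines by the mailer still match.
    """
    flat = " ".join(log_text.split())
    return [(job, int(mr), int(w)) for job, mr, w in RANGE_RE.findall(flat)]


if __name__ == "__main__":
    sample = (
        "03/10 12:46:06 INFO: ARange[0] too short for job 24640 (MR: 1 < W:\n"
        "129600): removing range\n"
    )
    for job, mr, w in short_ranges(sample):
        print(job, mr, w, seconds_to_dhms(w))
```

Run on the sample entry this prints job 24640 with W formatted as 1:12:00:00. The same helper turns the 300 second deferral into 00:05:00, which matches the hold time in the MJobSetHold line.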
>
>
> I must be missing something here but I've reread the documentation and
> find nothing. I'm not sure how to further debug. Can anyone provide me
> with a further clue as to what might be missing?
>
> Thanks,
> Bill