[Mauiusers] advance standing reservation

Bill Wichser bill at Princeton.EDU
Fri Mar 10 10:48:27 MST 2006


Environment
-----------
maui-3.2.6p13
torque-1.1.0p6
Linux cluster

I had this working, or so I thought. But after restarting pbs_server and
Maui, jobs submitted to the queue are just deferred.

I have two quad-processor nodes, and I want only jobs that request the
quad queue (#PBS -q quad) to be able to access and run on them.

-------------------------------------------------
In Torque (qmgr):
create queue quad
set queue quad queue_type = Execution
set queue quad acl_hosts = node076+node077
set queue quad resources_max.nodect = 2
set queue quad enabled = True
set queue quad started = True
---------------------------------------------------
In Maui (maui.cfg):
SRCFG[quad] HOSTLIST=node076,node077
SRCFG[quad] FLAGS=BYNAME
SRCFG[quad] PERIOD=INFINITY
SRCFG[quad] CLASSLIST=quad

CLASSCFG[quad]          PRIORITY=0
CLASSCFG[quad]          FLAGS=ADVRES:quad.0.0
------------------------------------------------------
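The Torque queue and the Maui standing reservation have to name the same hosts (Torque joins them with "+", Maui with ","), and the class's ADVRES flag points at the reservation instance, quad.0.0. A minimal sketch, assuming you want to generate both fragments from a single node list so they cannot drift apart; quad_queue_config is a hypothetical helper that only reproduces the settings shown above and does not validate them against a running pbs_server or Maui.

```python
from typing import List, Tuple


def quad_queue_config(queue: str, hosts: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: build the qmgr commands and maui.cfg lines above
    from one queue name and one host list."""
    torque = [
        f"create queue {queue}",
        f"set queue {queue} queue_type = Execution",
        # Torque ACL host lists are "+"-separated.
        f"set queue {queue} acl_hosts = {'+'.join(hosts)}",
        f"set queue {queue} resources_max.nodect = {len(hosts)}",
        f"set queue {queue} enabled = True",
        f"set queue {queue} started = True",
    ]
    maui = [
        # Maui HOSTLIST values are ","-separated.
        f"SRCFG[{queue}] HOSTLIST={','.join(hosts)}",
        f"SRCFG[{queue}] FLAGS=BYNAME",
        f"SRCFG[{queue}] PERIOD=INFINITY",
        f"SRCFG[{queue}] CLASSLIST={queue}",
        f"CLASSCFG[{queue}] PRIORITY=0",
        # Assumes the first standing-reservation instance is named <queue>.0.0,
        # as in the diagnose -r output in this post.
        f"CLASSCFG[{queue}] FLAGS=ADVRES:{queue}.0.0",
    ]
    return torque, maui


if __name__ == "__main__":
    torque_cmds, maui_lines = quad_queue_config("quad", ["node076", "node077"])
    for cmd in torque_cmds:
        print(f'qmgr -c "{cmd}"')
    print("\n".join(maui_lines))
```

Generating both halves from one list mainly guards against a host being added to the Torque ACL but not to the Maui HOSTLIST, or vice versa.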

diagnose -r

quad.0.0                   User DEF   -00:13:26    INFINITY     INFINITY     2    2    8
     Flags: STANDINGRES BYNAME
     ACL: RES==quad.0= CLASS==quad+
     CL:  RES==quad.0
     Task Resources: PROCS: [ALL]
     Attributes (HostList='node076 node077')
     Active PH: 0.00/1.79 (0.00%)
     SRAttributes (TaskCount: 0  StartTime: 00:00:00  EndTime: 1:00:00:00  Days: ALL)
---------------------------------------------------------------------------

So the reservation exists and appears to be active. But when I run
"checknode node077", the Reservations section shows something that
doesn't look right.

----------------------------------------------------------------------------
checking node node077

State:      Idle  (in current state for 00:13:26)
Configured Resources: PROCS: 4  MEM: 15G  SWAP: 16G  DISK: 1M
Utilized   Resources: [NONE]
Dedicated  Resources: [NONE]
Opsys:       DEFAULT  Arch:       linux
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [quad]
Attributes: [Batch]
Classes:    [short 4:4][long 4:4][verylong 4:4][quad 4:4][default 4:4][single 4:4]

Total Time:   INFINITY  Up:   INFINITY (81.54%)  Active:   INFINITY (37.11%)

Reservations:
   User 'quad.0.0'(x1)  -00:13:26 ->   INFINITY (  INFINITY)
     Blocked Resources at -00:13:26   Procs: 4/4 (100.00%)
------------------------------------------------------------------------
Note the "Blocked Resources" line.
When I submit a job to the quad queue, it is immediately deferred and
placed in the blocked list.

--------------------------------------------------------------------------
checking job 24640

State: Idle  EState: Deferred
Creds:  user:bill  group:bill  class:quad  qos:DEFAULT
WallTime: 00:00:00 of 1:12:00:00
SubmitTime: Fri Mar 10 09:32:35
   (Time Queued  Total: 3:08:51  Eligible: 00:05:27)

Total Tasks: 4

Req[0]  TaskCount: 4  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

job is deferred.  Reason:  NoResources  (cannot create reservation for job '24640' (intital reservation attempt)
)
Holds:    Defer  (hold reason:  NoResources)
PE:  4.00  StartPriority:  3087
cannot select job 24640 for partition DEFAULT (job hold active)
-----------------------------------------------------------------------

And the Maui logs show:

03/10 12:46:06 INFO:     node node077 can provide resources for job 24640:0
03/10 12:46:06 MLocalJobCheckNRes(24640,node077,2140000000)
03/10 12:46:06 INFO:     8 feasible tasks found for job 24640:0 in partition DEFAULT (4 Needed)
03/10 12:46:06 MJobGetSNRange(24640,0,node076,(4 at 00:00:00),256,Affinity,Type,ARange,BRes)
03/10 12:46:06 INFO:     attempting to get resources for 24640 4 * (P: 1  M: 0  S: 0  D: 0)
03/10 12:46:06 MResCheckJAccess(24612,24640,129600,Same,Affinity)
03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
03/10 12:46:06 MResCheckJAccess(24612,24640,129600,Same,Affinity)
03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
03/10 12:46:06 INFO:     ARange[0] too short for job 24640 (MR: 1 < W: 129600):  removing range
03/10 12:46:06 INFO:     node node076 unavailable for job 24640 at 00:00:00
03/10 12:46:06 INFO:     no reservation time found for job 24640 on node node076 at 00:00:00
03/10 12:46:06 MJobGetSNRange(24640,0,node077,(4 at 00:00:00),256,Affinity,Type,ARange,BRes)
03/10 12:46:06 INFO:     attempting to get resources for 24640 4 * (P: 1  M: 0  S: 0  D: 0)
03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
03/10 12:46:06 MResCheckJAccess(quad.0.0,24640,129600,Same,Affinity)
03/10 12:46:06 INFO:     ARange[0] too short for job 24640 (MR: 1 < W: 129600):  removing range
03/10 12:46:06 INFO:     node node077 unavailable for job 24640 at 00:00:00
03/10 12:46:06 INFO:     no reservation time found for job 24640 on node node077 at 00:00:00
03/10 12:46:06 MJobSelectFRL(24640,G,1,RCount)
03/10 12:46:06 ALERT:    job 24640 cannot run in any partition
03/10 12:46:06 ALERT:    cannot create new reservation for job 24640 (shape[1] 4)
03/10 12:46:06 ALERT:    cannot create new reservation for job 24640
03/10 12:46:06 MJobSetHold(24640,16,00:05:00,NoResources,cannot create reservation for job '24640' (intital reservation attempt)
03/10 12:46:06 ALERT:    job '24640' cannot run (deferring job for 300 seconds)
----------------------------------------------------------------------------
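To line the log up with the checkjob output: the W: value in the "ARange[0] too short" lines appears to be the job's wallclock limit in seconds, since the requested WallTime of 1:12:00:00 (one day, twelve hours) is 129600 seconds. A minimal sketch for doing such conversions, assuming Maui prints durations as [[[DD:]HH:]MM:]SS with an optional leading "-" (inferred from the output above, not from Maui's source); maui_time_to_seconds is a hypothetical name.

```python
def maui_time_to_seconds(value: str) -> int:
    """Convert an assumed Maui-style [[[DD:]HH:]MM:]SS duration to seconds.

    A leading '-' (as in the reservation start time -00:13:26) yields a
    negative result.
    """
    sign = -1 if value.startswith("-") else 1
    parts = [int(p) for p in value.lstrip("-").split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected duration format: {value!r}")
    # Rightmost field is seconds, then minutes, hours, days.
    multipliers = [1, 60, 3600, 86400]
    total = sum(v * m for v, m in zip(reversed(parts), multipliers))
    return sign * total


if __name__ == "__main__":
    samples = [
        ("job WallTime", "1:12:00:00"),
        ("SR EndTime", "1:00:00:00"),
        ("reservation start", "-00:13:26"),
    ]
    for label, text in samples:
        print(f"{label}: {text} = {maui_time_to_seconds(text)} s")
```

Run against the values in this post, it gives 129600 s for the job's WallTime, 86400 s for the standing reservation's EndTime, and -806 s for the reservation start shown by diagnose -r.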

I must be missing something, but I've reread the documentation and
found nothing. I'm not sure how to debug this further. Can anyone give
me a clue as to what might be missing?

Thanks,
Bill


