[Mauiusers] Odd behavior with suspended jobs

Nate Crawford nathan.crawford at chemie.uni-karlsruhe.de
Fri Nov 11 09:01:55 MST 2005


   We are running maui-3.2.6p14-snap.1127934075 and torque-1.2.0p6 on a
19-node Opteron cluster (SuSE 9.3), and have run into two problems with
preemptible jobs (PREEMPTPOLICY SUSPEND).  The first is that the suspended
job's calculated queue time gets set to the total time on suspension,
causing the QTIME priority to explode.  Output from checkjob before,
during, and after suspension is shown below:

------------------------
checking job 2984

State: Running
Creds:  user:nate  group:ck  class:prefinity  qos:low
WallTime: 2:17:51:13 of   INFINITY
Suspended Wall Time: 6:37:14
SubmitTime: Tue Nov  8 10:22:42
  (Time Queued  Total: 00:00:01  Eligible: 7:38:21)
[snip]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       BACKFILL PREEMPTEE
Attr:        PREEMPTEE

Reservation '2984' (-3:00:36:22 ->   INFINITY  Duration:   INFINITY)
PE:  1.00  StartPriority:  3350

---------------------------------------------------------------------
checking job 2984

State: Suspended
Creds:  user:nate  group:ck  class:prefinity  qos:low
WallTime: 2:17:55:16 of   INFINITY
Suspended Wall Time: 8:55:25
SubmitTime: Tue Nov  8 10:22:42
  (Time Queued  Total: 3:03:00:47  Eligible: 3:03:00:47)
[snip]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       BACKFILL PREEMPTEE
Attr:        PREEMPTEE

PE:  1.00  StartPriority:  7391

--------------------------------------------------------------------

checking job 2984

State: Running
Creds:  user:nate  group:ck  class:prefinity  qos:low
WallTime: 2:18:05:54 of   INFINITY
Suspended Wall Time: 10:06:39
SubmitTime: Tue Nov  8 10:22:42
  (Time Queued  Total: 00:00:01  Eligible: 3:04:13:01)
[snip]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       BACKFILL PREEMPTEE
Attr:        PREEMPTEE

Reservation '2984' (-3:04:23:29 ->   INFINITY  Duration:   INFINITY)
PE:  1.00  StartPriority:  7462

-------------------------------------------------------------------------


This tends to cause long-running preemptible jobs to be suspended only
once before getting extremely high priority.  It would be better to have
the queued time represent the time actually spent waiting to run.
Running showconfig | grep QUEUE gives:

JOBPRIOACCRUALPOLICY            QUEUEPOLICY
USESYSTEMQUEUETIME              TRUE
QUEUETIMEWEIGHT[0]                1
TARGETQUEUETIMEWEIGHT[0]          0
JOBPRIOACCRUALPOLICY            QUEUEPOLICY

Am I missing something obvious?
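
As a rough cross-check of the numbers above, the short Python sketch below
converts the "Eligible" times from the three checkjob outputs to minutes and
compares the change against the change in StartPriority.  It assumes the
queue-time factor is measured in minutes and that, with QUEUETIMEWEIGHT 1, it
is the only priority component that changed between snapshots; the helper
to_minutes is just an illustrative name, not part of Maui.

```python
def to_minutes(duration: str) -> float:
    """Convert a checkjob duration ([[DD:]HH:]MM:SS) to minutes."""
    seconds = 0
    # Walk the fields from the right: seconds, minutes, hours, days.
    for part, unit in zip(reversed(duration.split(":")), (1, 60, 3600, 86400)):
        seconds += int(part) * unit
    return seconds / 60


# (state, Eligible time, StartPriority) copied from the checkjob outputs
samples = [
    ("Running",   "7:38:21",    3350),
    ("Suspended", "3:03:00:47", 7391),
    ("Running",   "3:04:13:01", 7462),
]

deltas = []
for (_, t0, p0), (state, t1, p1) in zip(samples, samples[1:]):
    d_minutes = to_minutes(t1) - to_minutes(t0)
    deltas.append((d_minutes, p1 - p0))
    print(f"-> {state}: eligible +{d_minutes:.0f} min, priority +{p1 - p0}")
```

Under those assumptions the priority increase tracks the jump in eligible
time to within a minute or two in both steps, which is consistent with the
priority change coming entirely from the queue-time component.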


  The second problem manifests as a preemptible job running on top of an
already suspended job, in violation of preemption policy.  The most
relevant parts of maui.cfg:

--------------------------------------------------------------------------

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

NODEALLOCATIONPOLICY   LASTAVAILABLE
NODEACCESSPOLICY       SHARED
NODEAVAILABILITYPOLICY DEDICATED:PROC

PREEMPTPOLICY             SUSPEND
PREEMPTPOLICY[DEFAULT]    SUSPEND

QOSCFG[high]  QFLAGS=PREEMPTOR
QOSCFG[medium]
QOSCFG[low] QFLAGS=PREEMPTEE

SRCFG[dayjobs] STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG[dayjobs] PERIOD=DAY DAYS=MON,TUE,WED,THU,FRI DEPTH=7
SRCFG[dayjobs] FLAGS=SPACEFLEX
SRCFG[dayjobs] CLASSLIST=short,medium,prefinity
SRCFG[dayjobs] TASKCOUNT=4 RESOURCES=PROCS:1
SRCFG[dayjobs] ACCESS=DEDICATED

SRCFG[quickjobs] STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG[quickjobs] PERIOD=DAY DAYS=MON,TUE,WED,THU,FRI DEPTH=7
SRCFG[quickjobs] FLAGS=SPACEFLEX
SRCFG[quickjobs] CLASSLIST=short,prefinity
SRCFG[quickjobs] TASKCOUNT=2 RESOURCES=PROCS:1
SRCFG[quickjobs] ACCESS=DEDICATED

CLASSCFG[infinity]      WCOVERRUN=12:00:00  QDEF=medium
CLASSCFG[verylong]      QDEF=medium
CLASSCFG[long]          QDEF=medium
CLASSCFG[medium]        WCOVERRUN=00:30:00      QDEF=high
CLASSCFG[short]         WCOVERRUN=00:05:00     QDEF=high
CLASSCFG[prefinity]     WCOVERRUN=01:00:00 MAXPROCPERJOB=1 QDEF=low

---------------------------------------------------------------------------

  The setup is designed to reserve a few processors for quick jobs, but
to also let long, preemptible jobs run in these slots.  Jobs in the
short and medium queues (QFLAGS=PREEMPTOR) successfully suspend
prefinity-queue jobs (QFLAGS=PREEMPTEE), which then resume properly
after the shorter job exits.  However, if there is another prefinity job
waiting, even one with lower priority, the original job will stay
suspended while the new job runs.  Qdel-ing the usurper job does allow
the old job to resume with no other problem.  

  I looked through the Maui logs but could find nothing definitive.  The
problem is reproducible, and I have saved the logs from one of these
events, along with the output from diagnose, etc.  We have seen similar
problems on our other clusters running various versions of Maui/Torque,
which isn't all that surprising as their configurations are very
similar.

  Again, I have probably missed something in the documentation, but any
pointers would be appreciated.

Thanks,
Nate


-- 
__________________________________
Dr. Nathan Crawford
Theoretische Chemie
Universität Karlsruhe

nathan.crawford at chemie.uni-karlsruhe.de


