[Mauiusers] Preemption

Gerson Galang gerson.sapac at gawab.com
Sat Aug 28 01:41:18 MDT 2004


Hi,

I've been monitoring how Maui preempts jobs, and there is one thing I 
don't understand: why does Maui preempt running jobs that have the same 
or a higher priority than the queued job it will run next? Maui's 
priority calculation can be complicated, but I don't think my 
configuration is complicated enough to explain behavior that differs 
from what I expected.

Here's my Maui config:

RMCFG[base]    SUSPENDSIG=suspend
PREEMPTPOLICY  SUSPEND

QUEUETIMEWEIGHT    1
CREDWEIGHT         1
QOSWEIGHT          1

CLASSCFG[parallel] QDEF=high
CLASSCFG[batch]    QDEF=low

SRCFG[normal]      PERIOD=INFINITY
SRCFG[normal]      OWNER=QOS:low
SRCFG[normal]      TASKCOUNT=3
SRCFG[normal]      CLASSLIST=parallel,batch
SRCFG[normal]      PRIORITY=10

SRCFG[para]        PERIOD=INFINITY
SRCFG[para]        OWNER=QOS:high
SRCFG[para]        TASKCOUNT=3
SRCFG[para]        FLAGS=OWNERPREEMPT
SRCFG[para]        CLASSLIST=parallel,batch
SRCFG[para]        PRIORITY=20

QOSCFG[high]       PRIORITY=20
QOSCFG[high]       QFLAGS=PREEMPTOR
QOSCFG[low]        PRIORITY=10
QOSCFG[low]        QFLAGS=PREEMPTEE
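
To make the expectation behind this config explicit, here is a minimal 
Python sketch of the rule I assumed these flags imply: a queued job may 
preempt a running job only if the queued job's QOS carries PREEMPTOR and 
the running job's QOS carries PREEMPTEE. This is only my reading of the 
config, not Maui's actual algorithm, and the function and dictionary 
names are made up for illustration.

```python
# Flags and class defaults copied from the config above.
QOS_FLAGS = {
    "high": {"PREEMPTOR"},
    "low": {"PREEMPTEE"},
}
CLASS_DEFAULT_QOS = {
    "parallel": "high",
    "batch": "low",
}


def may_preempt(queued_class, running_class):
    """Assumed rule (hypothetical helper, not Maui code): the queued job
    needs a PREEMPTOR QOS and the running job needs a PREEMPTEE QOS."""
    queued_qos = CLASS_DEFAULT_QOS[queued_class]
    running_qos = CLASS_DEFAULT_QOS[running_class]
    return ("PREEMPTOR" in QOS_FLAGS[queued_qos]
            and "PREEMPTEE" in QOS_FLAGS[running_qos])


print(may_preempt("parallel", "batch"))  # True
print(may_preempt("batch", "batch"))     # False
```

Under this assumed rule, a job submitted to batch should never be able 
to preempt another batch job, which is why the suspensions shown further 
down surprised me.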

And here's a snapshot of what's happening with the jobs in my queue. 
As you can see, jobs 783 and 784 were suddenly suspended and jobs 786, 
788, and 789 were started, even though the first two jobs should have 
higher priorities than jobs 786, 788, and 789. diagnose -p does not 
list the running jobs (783 and 784), but I'm pretty sure their 
priorities are higher: they entered the queue first, and all of these 
jobs were submitted to the same queue (batch), which gives them the 
same starting priority at submission.

diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
              Weights   --------       1(    1)     1(    1)

786                          23    44.0( 10.0)  56.0( 12.8)
788                          22    44.4( 10.0)  55.6( 12.5)
789                          22    44.5( 10.0)  55.5( 12.5)
790                          22    44.6( 10.0)  55.4( 12.4)
791                          22    44.6( 10.0)  55.4( 12.4)
792                          22    44.6( 10.0)  55.4( 12.4)
795                          21    95.2( 20.0)   4.8(  1.0)
794                          20    49.8( 10.0)  50.2( 10.1)

Percent Contribution   --------    51.1( 51.1)  48.9( 48.9)

* indicates system prio set on job
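
As a rough check of the numbers in the table above, here is a small 
Python sketch that recomputes a few rows. It assumes the priority is 
simply the sum of the weighted QOS priority and the weighted queue time, 
with every weight equal to 1 as in my config. The additive model, the 
service weight of 1 (read off the Weights row), and the function name 
are my assumptions, not something taken from Maui's source.

```python
# Weights from the config above (all set to 1).
CREDWEIGHT = 1
QOSWEIGHT = 1
QUEUETIMEWEIGHT = 1
# Assumption: the service component weight is 1, as the Weights row suggests.
SERVICE_WEIGHT = 1


def job_priority(qos_priority, queue_time):
    """Return (total, cred_percent, serv_percent) under the assumed
    additive model. queue_time is the QTime value as diagnose -p prints it."""
    cred = CREDWEIGHT * QOSWEIGHT * qos_priority
    serv = SERVICE_WEIGHT * QUEUETIMEWEIGHT * queue_time
    total = cred + serv
    return total, 100.0 * cred / total, 100.0 * serv / total


# (QOS priority, QTime) pairs copied from the first diagnose -p table.
jobs = {"786": (10.0, 12.8), "795": (20.0, 1.0), "794": (10.0, 10.1)}

for job_id, (qos, qtime) in jobs.items():
    total, cred_pct, serv_pct = job_priority(qos, qtime)
    print(f"{job_id}: priority={total:.1f} "
          f"cred={cred_pct:.1f}% serv={serv_pct:.1f}%")
```

For job 795 this gives a priority of 21 with a 95.2% credential share, 
matching the table. Small differences in the other percentages are 
expected because the QTime values in the table are already rounded.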

dev.sapac.edu.au:
                                                            Req'd Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
783.dev.sapac.e globus   batch    mpitest-1-  31993   1  --    --    --  R 00:01
   dev2/1+dev2/0
784.dev.sapac.e test02   batch    mpitest-1-  27568   1  --    --    --  R 00:01
   dev1/1+dev1/0
785.dev.sapac.e test02   batch    mpitest-1-  15157   1  --    --    --  R 00:01
   dev3/1+dev3/0
786.dev.sapac.e test02   batch    mpitest-1-    --    1  --    --    --  Q   --
    --
788.dev.sapac.e test02   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
789.dev.sapac.e test02   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
790.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
791.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
792.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
794.dev.sapac.e globus   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
795.dev.sapac.e globus   parallel mpitest-2-    --    2  --    --    --  Q   --
    --

diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
              Weights   --------       1(    1)     1(    1)

786                          23    43.0( 10.0)  57.0( 13.3)
788                          23    43.4( 10.0)  56.6( 13.0)
789                          23    43.5( 10.0)  56.5( 13.0)
790                          23    43.6( 10.0)  56.4( 12.9)
791                          23    43.6( 10.0)  56.4( 12.9)
792                          23    43.6( 10.0)  56.4( 12.9)
795                          22    92.9( 20.0)   7.1(  1.5)
794                          21    48.6( 10.0)  51.4( 10.6)

Percent Contribution   --------    49.9( 49.9)  50.1( 50.1)

* indicates system prio set on job

dev.sapac.edu.au:
                                                            Req'd Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
783.dev.sapac.e globus   batch    mpitest-1-  31993   1  --    --    --  S 00:01
   dev2/1+dev2/0
784.dev.sapac.e test02   batch    mpitest-1-  27568   1  --    --    --  S 00:01
   dev1/1+dev1/0
785.dev.sapac.e test02   batch    mpitest-1-  15157   1  --    --    --  R 00:01
   dev3/1+dev3/0
786.dev.sapac.e test02   batch    mpitest-1-    --    1  --    --    --  Q   --
    --
788.dev.sapac.e test02   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
789.dev.sapac.e test02   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
790.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
791.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
792.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
794.dev.sapac.e globus   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
795.dev.sapac.e globus   parallel mpitest-2-    --    2  --    --    --  Q   --
    --

diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
              Weights   --------       1(    1)     1(    1)

790                          23    42.7( 10.0)  57.3( 13.4)
791                          23    42.7( 10.0)  57.3( 13.4)
792                          23    42.7( 10.0)  57.3( 13.4)
795                          22    90.8( 20.0)   9.2(  2.0)
794                          21    47.5( 10.0)  52.5( 11.1)

Percent Contribution   --------    52.9( 52.9)  47.1( 47.1)

* indicates system prio set on job

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
783.dev            mpitest-1-10000  globus           00:01:26 S batch
784.dev            mpitest-1-10000  test02           00:01:34 S batch
785.dev            mpitest-1-10000  test02           00:01:30 R batch
786.dev            mpitest-1-10000  test02           00:00:10 R batch
788.dev            mpitest_0.5_5000 test02                  0 R batch
789.dev            mpitest_0.5_5000 test02                  0 R batch
790.dev            mpitest_0.5_5000 gerson                  0 Q batch
791.dev            mpitest_0.5_5000 gerson                  0 Q batch
792.dev            mpitest_0.5_5000 gerson                  0 Q batch
794.dev            mpitest_0.5_5000 globus                  0 Q batch
795.dev            mpitest-2-12500  globus                  0 Q parallel

dev.sapac.edu.au:
                                                            Req'd Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
783.dev.sapac.e globus   batch    mpitest-1-  31993   1  --    --    --  S 00:01
   dev2/1+dev2/0
784.dev.sapac.e test02   batch    mpitest-1-  27568   1  --    --    --  S 00:01
   dev1/1+dev1/0
785.dev.sapac.e test02   batch    mpitest-1-  15157   1  --    --    --  R 00:01
   dev3/1+dev3/0
786.dev.sapac.e test02   batch    mpitest-1-  32074   1  --    --    --  R 00:00
   dev2/1+dev2/0
788.dev.sapac.e test02   batch    mpitest_0.  27649   1  --    --    --  R   --
   dev1/0
789.dev.sapac.e test02   batch    mpitest_0.  27656   1  --    --    --  R   --
   dev1/1
790.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
791.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
792.dev.sapac.e gerson   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
794.dev.sapac.e globus   batch    mpitest_0.    --    1  --    --    --  Q   --
    --
795.dev.sapac.e globus   parallel mpitest-2-    --    2  --    --    --  Q   --
    --

I don't have a problem with this in itself, because I know Maui will 
resume the preempted jobs later on. Our main concern is that if we 
cannot control which jobs Maui suspends or resumes, we might run out of 
the swap space that gets allocated to the suspended jobs.

Is there anything wrong with my configuration, or should I assume that 
this behavior is normal for Maui?

Regards,
Gerson






