[Mauiusers] Preempted (suspended) job not restarting when it should.
James Wigdahl
james at wigdahl.com
Fri Mar 31 14:50:41 MST 2006
On Mar 15, 2006, at 8:55 AM, David Corredor wrote:
> I'm trying to set up some basic preemption with a "suspend" policy within
> Maui. The preemption part is working, except that the job that gets
> preempted (suspended) doesn't restart execution until after all other jobs
> in the Idle queue are finished executing, even if those jobs don't have the
> preemptor flag set, and as far as I can tell, those jobs don't have a higher
> priority nor xfactor than the suspended job either.
>
> By looking at the logs, it seems to me that while the first job was
> suspended, and the preemptor was running, the next idle job in the queue
> (with the same priority as the suspended one) somehow had the node reserved
> for it next, and so when the suspended job is supposed to restart, it
> doesn't find an available node.
>
> I would appreciate any hints in this regard.

I've been suffering with the same issue and was led to believe that
adding the following to my config would fix things:

FSPOLICY UTILIZEDPS
CONSUMEDWEIGHT 3

However, I have not found this to resolve anything. Here is some live
output from 'diagnose -p', which I've edited to show only the suspended
jobs:

# ./diagnose -p
diagnosing job priority information (partition: ALL)
Job      PRIORITY*   Cred(  QOS:Class)   Serv(QTime)   Targ(QTime)   Res(Cons )
Weights  --------       5(    2:    8)      1(    1)      1(    1)     1(    3)
8469         3547    19.7( 10.0: 15.0)   80.3(2847.)    0.0(  0.0)   0.0(  0.0)
8968         2073    43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)   0.0(  0.0)
8969         2073    43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)   0.0(  0.0)
8970         2073    43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)   0.0(  0.0)
8971         2073    43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)   0.0(  0.0)
8972         2073    43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)   0.0(  0.0)
I would think that with the parameters mentioned above enabled in my
maui.cfg there should be some kind of value listed in the "Res(Cons )"
column adding to a job's priority. If this were happening, then
suspended jobs, which have already consumed CPU time, should acquire
additional priority points at a higher rate than idle jobs that were
submitted at the same time, so they would (hopefully) be resumed
before an idle job in the queue is started. I have not found this to
be the case.
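
As a rough check, the PRIORITY values in the 'diagnose -p' output can be reproduced with a plain weighted sum. The sketch below is in Python and assumes that each component contributes component weight x subcomponent weight x raw value; that formula is inferred from the numbers shown, not taken from the Maui source, and the function and constant names are made up for illustration. The Targ(QTime) term is left out because it is 0.0 for every job listed.

```python
# Weights copied from the "Weights" row of the diagnose -p output.
CRED_WEIGHT = 5       # Cred
QOS_WEIGHT = 2        # Cred -> QOS
CLASS_WEIGHT = 8      # Cred -> Class
SERV_WEIGHT = 1       # Serv
QTIME_WEIGHT = 1      # Serv -> QTime
RES_WEIGHT = 1        # Res
CONSUMED_WEIGHT = 3   # Res -> Cons


def priority(qos, cls, qtime, consumed=0):
    """Weighted sum of priority components (assumed formula, see lead-in)."""
    cred = CRED_WEIGHT * (QOS_WEIGHT * qos + CLASS_WEIGHT * cls)
    serv = SERV_WEIGHT * QTIME_WEIGHT * qtime
    res = RES_WEIGHT * CONSUMED_WEIGHT * consumed
    return cred + serv + res


# Raw values (QOS, Class, QTime) read from the rows for jobs 8469 and 8968.
print(priority(10, 15, 2847))  # 3547, matches job 8469
print(priority(10, 20, 1173))  # 2073, matches job 8968

# Hypothetical: a nonzero consumed value of 100 would add 3 * 100 points.
print(priority(10, 20, 1173, consumed=100))  # 2373
```

Under this reading, the suspended jobs would only move ahead of idle jobs with the same QOS, class, and queue time if the Cons value were nonzero, and it is 0.0 for every job in the output.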
Any info on how to solve this would be MUCH appreciated...
Just for kicks... here's my entire maui.cfg:
SERVERHOST node001.cluster
SERVERPORT 42559
SERVERMODE NORMAL
ADMIN1 root
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
RMCFG[base] TYPE=PBS TIMEOUT=90
RMPOLLINTERVAL 00:00:10
BACKFILLPOLICY FIRSTFIT
NODEALLOCATIONPOLICY MINRESOURCE
NODEACCESSPOLICY SHARED
PREEMPTPOLICY SUSPEND
RESERVATIONPOLICY NEVER
FSPOLICY UTILIZEDPS
DEFERTIME 1:00
DEFERCOUNT 999
DEFERSTARTCOUNT 10
CREDWEIGHT 5
CLASSWEIGHT 8
QOSWEIGHT 2
QUEUETIMEWEIGHT 1
TARGETQUEUETIMEWEIGHT 1
CONSUMEDWEIGHT 3
QOSCFG[lopri] PRIORITY=10 QFLAGS=PREEMPTEE FLAGS=PREEMPTEE JOBFLAGS=PREEMPTEE
QOSCFG[hipri] PRIORITY=10000 QFLAGS=PREEMPTOR FLAGS=PREEMPTOR JOBFLAGS=PREEMPTOR
CLASSCFG[long-loprio] QDEF=lopri MAXMEM=1200 MAXJOBPERUSER=30
CLASSCFG[long] QDEF=lopri MAXMEM=1200
CLASSCFG[short] QDEF=lopri
CLASSCFG[interact] QDEF=hipri
CLASSCFG[swbuild] QDEF=hipri
NODECFG[DEFAULT] MAXJOB=5 MAXLOAD=3
USERCFG[DEFAULT] QTTARGET=0:00:01 QLIST=lopri,hipri