[Mauiusers] problems with qos'es and standing reservations

Marcin mar_mog at o2.pl
Tue Dec 20 19:20:06 MST 2005


Hi,

I have just managed to get the latest Maui snapshot working together with 
Torque. While trying to configure standing reservations (so that nodes can 
be borrowed from one class by another, which is not possible using node 
features), I discovered two problems. I have reproduced both of them.

1) Not clearly a bug, but rather an omission, and it breaks Maui's 
behaviour: the QOS is not updated after a qmove command. Example:

SYSCFG[base]     QLIST=
CLASSCFG[test] QLIST=test QDEF=test
QOSCFG[test] QFLAGS=IGNALL

#qsub ./job
41277.cluster
#checkjob 41277
Creds:  user:mar  group:staff  class:default  qos:DEFAULT
#qmove test 41277
#checkjob 41277
Creds:  user:mar  group:staff  class:test  qos:DEFAULT

In short, after a qmove the QOS is left untouched, whereas I would expect 
it to be set to the QDEF value of the new class. The same thing happens 
with one of my routing queues; qmove is just simpler to demonstrate.
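To make the expectation concrete, here is a toy model in Python of what I 
would expect to happen when a job changes class. It is not Maui code: the 
`ClassCfg` and `Job` types and the `move_job` helper are hypothetical and 
mirror only the QLIST/QDEF parameters from the config above. The fallback 
rule (reset to QDEF when the current QOS is not in the new class's QLIST) 
is my assumption about sensible behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassCfg:
    # Hypothetical mirror of CLASSCFG[...] QLIST=... QDEF=...
    qlist: list[str] = field(default_factory=list)
    qdef: str | None = None


@dataclass
class Job:
    job_id: str
    job_class: str
    qos: str


def move_job(job: Job, new_class: str, classes: dict[str, ClassCfg]) -> Job:
    """Assumed qmove semantics: change the class, then re-validate the QOS."""
    job.job_class = new_class
    cfg = classes.get(new_class, ClassCfg())
    # If the new class restricts QOS and the current one is not allowed,
    # fall back to the class default (QDEF), if one is configured.
    if cfg.qlist and job.qos not in cfg.qlist and cfg.qdef is not None:
        job.qos = cfg.qdef
    return job


classes = {"test": ClassCfg(qlist=["test"], qdef="test")}
job = move_job(Job("41277", "default", "DEFAULT"), "test", classes)
print(job.job_class, job.qos)
```

Under this model job 41277 would end up as `class:test qos:test`, which is 
what I expected checkjob to show after the qmove.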

2) This one is either a bug or my inability to read the documentation :) 
The standing reservation I configured depends on the QOS, although it 
should not.

SYSCFG[base]     QLIST=
SRCFG[test] PERIOD=INFINITY HOSTLIST=n96,n97 ACCESS=DEDICATED CLASSLIST=test
CLASSCFG[test] QLIST=test QDEF=test JOBFLAGS=ADVRES
QOSCFG[test] QFLAGS=IGNALL

My interpretation of this config: create a standing reservation on two 
hosts, accessible only to jobs in the Torque queue test.
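My reading of the SRCFG line can be written down as a toy access check in 
Python. This is a hypothetical sketch of the behaviour I expect, not Maui's 
actual algorithm: access to the reservation is decided by CLASSLIST alone 
and the job's QOS is deliberately ignored. It also assumes that the ADVRES 
job flag restricts a job to the hosts of reservations it may use. The names 
`StandingReservation`, `may_use_reservation` and `feasible_hosts` are made 
up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StandingReservation:
    # Hypothetical mirror of SRCFG[...] HOSTLIST=... CLASSLIST=...
    name: str
    hosts: frozenset[str]
    classlist: frozenset[str]


def may_use_reservation(job_class: str, job_qos: str, res: StandingReservation) -> bool:
    """Expected check: only the job's class matters; job_qos is not consulted."""
    return job_class in res.classlist


def feasible_hosts(job_class: str, job_qos: str, advres: bool,
                   idle_hosts: set[str],
                   reservations: list[StandingReservation]) -> set[str]:
    reserved = set().union(*(r.hosts for r in reservations))
    accessible = set().union(*(r.hosts for r in reservations
                               if may_use_reservation(job_class, job_qos, r)))
    if advres:
        # Assumed ADVRES semantics: run only inside an accessible reservation.
        return idle_hosts & accessible
    # Otherwise: unreserved hosts plus hosts of accessible reservations.
    return {h for h in idle_hosts if h not in reserved or h in accessible}


sr = StandingReservation("test", frozenset({"n96", "n97"}), frozenset({"test"}))
print(sorted(feasible_hosts("test", "DEFAULT", True, {"n95", "n96", "n97"}, [sr])))
```

Under this model job 41278 in class test would get n96 and n97 regardless 
of whether its QOS is DEFAULT or test.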

What actually happens:
#qsub ./job
41278.cluster
#checkjob 41278
Creds:  user:mar  group:staff  class:default  qos:DEFAULT
#qmove test 41278
Creds:  user:mar  group:staff  class:test  qos:DEFAULT
Flags:       ADVRES RESTARTABLE
PE:  2.00  StartPriority:  21725
job cannot run in partition DEFAULT (idle procs do not meet requirements 
: 0 of 2 procs found)
idle procs:  22  feasible procs:   0
Rejection Reasons: [CPU          :1][State        :104][ReserveTime  :4]
n96                      rejected : ReserveTime
n97                      rejected : ReserveTime

In my opinion, the state the job is in should be enough to get it into 
reservation test; the only restriction I set is that the job must be in 
queue test. As you can see, it is not: the ADVRES flag is set properly, 
but all nodes are considered for running the job and, moreover, the two 
nodes reserved for such jobs are unavailable precisely because they are 
reserved :). But, surprise surprise, there is a way of getting into that 
reservation:

#setqos test 41278
#checkjob 41278
Creds:  user:mar  group:staff  class:test  qos:test
Flags:       ADVRES RESTARTABLE
PE:  2.00  StartPriority:  21730
job can run in partition DEFAULT (4 procs available.  2 procs required)

And this job will start during the first Maui scheduling cycle. I have no 
idea at all why the QOS has any impact on it. Maui may be misled by the 
inconsistent job data (a QOS that is not allowed for the given class), but 
that does not justify it: the inconsistency was not created by a human, 
but by Maui itself.

Are these two really bugs or just my imagination, and if they are bugs, is 
there any chance of having them fixed?

	Marcin Mogielnicki, ICM, Poland

