[Mauiusers] problems with qos'es and standing reservations

Marcin mar_mog at o2.pl
Thu Jan 5 12:39:05 MST 2006


Hello,

I know that maui is open-source, free and so on, but what's the point of 
asking for bug reports if there is not even a one-word reply after 
receiving one? All I want is confirmation that these two issues really 
exist, with a short status for each: not a bug, won't be fixed, will be 
in the next release, in the next eight years?

I now have some confirmation that the qos issue is a bug. QOS is set 
only once - during job initialization, in MPBSJobSetAttr. During every 
iteration MPBSJobUpdate is called, which updates the class, but nothing 
else that is class-configurable. It means that if the class is changed 
and there are any class-specific configuration parameters, they are not 
updated. They are refreshed in two situations only - job submission and 
maui startup. In both situations MPBSJobSetAttr is called. When a job 
has parameters A, and after I restart maui the same job has parameters 
B, it looks like a bug.
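
To make the sequence concrete, here is a toy Python model of it. This is 
not Maui source code: Job, set_attr, update and the CLASS_QDEF table are 
made-up names that only stand in for MPBSJobSetAttr, MPBSJobUpdate and 
the CLASSCFG QDEF settings, and the model assumes nothing beyond the 
behaviour described above.

```python
# Toy model of the behaviour described above; NOT Maui source code.
# CLASS_QDEF, Job, set_attr and update are hypothetical names.

# class -> default qos, mirroring "CLASSCFG[test] QDEF=test" (assumed table)
CLASS_QDEF = {"default": "DEFAULT", "test": "test"}


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.cls = None
        self.qos = None


def set_attr(job, pbs_class):
    """Stands in for MPBSJobSetAttr: runs at submission and at maui start."""
    job.cls = pbs_class
    job.qos = CLASS_QDEF[pbs_class]


def update(job, pbs_class):
    """Stands in for MPBSJobUpdate: runs every iteration, refreshes class only."""
    job.cls = pbs_class


job = Job("41277")
set_attr(job, "default")     # qsub: class and qos are both initialised
update(job, "test")          # qmove seen on the next iteration: class only
stale = (job.cls, job.qos)   # class moved, qos left behind

set_attr(job, "test")        # maui restart rediscovers the job
fresh = (job.cls, job.qos)   # same job, different parameters

print("before restart:", stale)
print("after restart: ", fresh)
```

The same job ends up with two different qos values depending only on 
whether maui was restarted in between, which is the inconsistency I 
am complaining about.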

Couple this with standing reservations depending on the qos being set 
(no idea why, they shouldn't), and maui becomes half-unusable for anyone 
using standing reservations and moving jobs between classes 
(queues in PBS) from time to time. In short - a job can't get into the 
reservation without having the qos set. After maui discovers a job for 
the first time, its qos is not updated except by the setqos command 
or a restart of maui. Fixing it looks really simple for someone who 
knows in which part of the code the class-specific parameters should be 
rewritten.
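
As a rough idea of the kind of change I mean, here is a toy Python 
sketch, not a patch against Maui. The update_job function, the job 
dictionary and the CLASS_QDEF table are all hypothetical; the real code 
would have to do the equivalent wherever the per-iteration update sees 
that the class has changed.

```python
# Sketch of a possible fix as a toy model; NOT a patch against Maui.
# update_job, the job dict and CLASS_QDEF are hypothetical names.

# class -> default qos, mirroring "CLASSCFG[test] QDEF=test" (assumed table)
CLASS_QDEF = {"default": "DEFAULT", "test": "test"}


def update_job(job, pbs_class):
    """Per-iteration update that also refreshes the class-derived qos.

    The qos is re-derived only when the class actually changes, so a qos
    set by hand (setqos) is not overwritten on every iteration.
    """
    if job["class"] != pbs_class:
        job["class"] = pbs_class
        job["qos"] = CLASS_QDEF.get(pbs_class, "DEFAULT")
    return job


job = {"id": "41278", "class": "default", "qos": "DEFAULT"}

update_job(job, "test")      # first iteration after: qmove test 41278
moved = dict(job)            # qos now follows the new class

job["qos"] = "other"         # a manual setqos to some other (made-up) qos
update_job(job, "test")      # class unchanged on later iterations
kept = dict(job)             # manual choice is left alone

print(moved)
print(kept)
```

Whether a manually set qos should survive a later class change is a 
policy question I leave open; the sketch simply resets it to the new 
class default.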

	Marcin Mogielnicki, ICM, Poland

Marcin wrote:
> Hi,
> 
> I just managed to get the latest maui snapshot working together with 
> torque. Trying to configure standing reservations (to be able to borrow 
> nodes from one class to another, which is not possible using node 
> features) I discovered two things, both of them reproduced.
> 
> 1) Not clearly a bug, but rather an omission, which ruins maui 
> behaviour - qos is not updated after issuing the qmove command. Example:
> 
> SYSCFG[base]     QLIST=
> CLASSCFG[test] QLIST=test QDEF=test
> QOSCFG[test] QFLAGS=IGNALL
> 
> #qsub ./job
> 41277.cluster
> #checkjob 41277
> Creds:  user:mar  group:staff  class:default  qos:DEFAULT
> #qmove test 41277
> #checkjob 41277
> Creds:  user:mar  group:staff  class:test  qos:DEFAULT
> 
> In short - after issuing the qmove command the qos is not touched, 
> while I would expect it to be set to the QDEF value. The same thing 
> happens with one of my routing queues - qmove is just simpler to 
> demonstrate.
> 
> 2) This one is a bug or my inability to read documentation:) The 
> standing reservation I configured relies on qos while it should not.
> 
> SYSCFG[base]     QLIST=
> SRCFG[test] PERIOD=INFINITY HOSTLIST=n96,n97 ACCESS=DEDICATED CLASSLIST=test
> CLASSCFG[test] QLIST=test QDEF=test JOBFLAGS=ADVRES
> QOSCFG[test] QFLAGS=IGNALL
> 
> My interpretation of this config: create a standing reservation on two 
> hosts, available to the torque queue test only.
> 
> The way it works:
> #qsub ./job
> 41278.cluster
> #checkjob 41278
> Creds:  user:mar  group:staff  class:default  qos:DEFAULT
> #qmove test 41278
> #checkjob 41278
> Creds:  user:mar  group:staff  class:test  qos:DEFAULT
> Flags:       ADVRES RESTARTABLE
> PE:  2.00  StartPriority:  21725
> job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 2 procs found)
> idle procs:  22  feasible procs:   0
> Rejection Reasons: [CPU          :1][State        :104][ReserveTime  :4]
> n96                      rejected : ReserveTime
> n97                      rejected : ReserveTime
> 
> In my opinion the state the job is in should be enough to get into 
> reservation test - the only restriction I set is to be in the test 
> queue. As you can see it is not - the ADVRES flag is set properly, but 
> all the nodes are considered for running that job, and, moreover, the 
> two nodes reserved for such jobs are unavailable, because they are just 
> reserved:). But - surprise, surprise! - there is a way of getting into 
> that reservation:
> 
> #setqos test 41278
> #checkjob 41278
> Creds:  user:mar  group:staff  class:test  qos:test
> Flags:       ADVRES RESTARTABLE
> PE:  2.00  StartPriority:  21730
> job can run in partition DEFAULT (4 procs available.  2 procs required)
> 
> And this job will start during the first maui scheduling cycle. I have 
> absolutely no idea why QOS has any impact on it. It's possible that 
> maui is fooled by the job data inconsistency (qos not allowed 
> for given class), but that doesn't justify it - the inconsistency was 
> created not by a human, but by maui itself.
> 
> Are these two really bugs or only my imagination and, if they are, is 
> there any chance of having them fixed?
> 
>     Marcin Mogielnicki, ICM, Poland
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
> 


