[Mauiusers] Solved: Log analysis/Slipping Reservations

Ansgar Esztermann aeszter at gwdg.de
Thu May 7 08:10:51 MDT 2009


Hello everyone,

Having dug through Maui's sources, I think I've solved the problem.

In MResGetNRange(), Maui first builds the time ranges during which a
given node is available; these are called ARanges. It then performs
various operations on these ranges (e.g. removing those that are too
short and joining adjacent ones). If the node in question is down,
drained, or busy but expected to become idle, the ARanges are adjusted
so that none of them starts earlier than NODEDOWNSTATEDELAYTIME from
now. The log I am currently investigating does show that the node in
question is busy and expected to become idle:

05/05 09:51:41 INFO:     node node023 not considered for backfill (State: Busy/EState: Idle)
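To make the effect of this adjustment concrete, here is a minimal Python sketch of the range handling described above. It is not Maui's code: the `ARange` class, the `adjust_ranges` function, the state strings and the order of the steps (merge, then delay, then drop short ranges) are all hypothetical simplifications, with times given in seconds relative to "now".

```python
from dataclasses import dataclass


@dataclass
class ARange:
    """Hypothetical availability range; times are seconds relative to now."""
    start: int
    end: int


def adjust_ranges(ranges, state, expected_state, min_duration, delay):
    """Sketch (not Maui's implementation) of the ARange processing:
    merge touching/overlapping ranges, push start times at least `delay`
    seconds into the future for down, drained, or busy-but-expected-idle
    nodes, and drop ranges shorter than `min_duration`."""
    # Merge ranges that touch or overlap; copies keep the input unmodified.
    merged = []
    for r in sorted(ranges, key=lambda r: r.start):
        if merged and r.start <= merged[-1].end:
            merged[-1].end = max(merged[-1].end, r.end)
        else:
            merged.append(ARange(r.start, r.end))

    # Node states that trigger the NODEDOWNSTATEDELAYTIME-style delay.
    delayed = state in ("Down", "Drained") or (
        state == "Busy" and expected_state == "Idle"
    )

    result = []
    for r in merged:
        if delayed and r.start < delay:
            r = ARange(delay, r.end)  # start no earlier than `delay` from now
        if r.end - r.start >= min_duration:  # also drops ranges made empty
            result.append(r)
    return result
```

With a delay of 3600 a busy/expected-idle node is only offered one hour from now, while a delay of 30 brings it down to half a minute:

```python
adjust_ranges([ARange(0, 100000)], "Busy", "Idle", 60, 3600)  # [ARange(start=3600, end=100000)]
adjust_ranges([ARange(0, 100000)], "Busy", "Idle", 60, 30)    # [ARange(start=30, end=100000)]
```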

According to the docs[1], NODEDOWNSTATEDELAYTIME has a default value
of zero, so ARanges of such nodes should not be modified at all.
However, a quick grep through the sources seems to indicate a default
value of 3600 (seconds, I presume). This fits the one-hour delay I've
noticed ("at 1:00:00" in the following log line):

05/05 09:51:40 INFO:     node node023 supports 8 tasks of job 99468:0 for 23:04:08:22 at 1:00:00
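The match can be checked by converting the log's durations to seconds. A minimal Python sketch, assuming (as the log line suggests) that durations are colon-separated with seconds as the rightmost field, i.e. `H:MM:SS` or `D:HH:MM:SS`, and that "at 1:00:00" is presumably an offset from the current time. The helper name `duration_to_seconds` is hypothetical.

```python
def duration_to_seconds(text: str) -> int:
    """Convert a colon-separated duration such as '1:00:00' (H:MM:SS) or
    '23:04:08:22' (D:HH:MM:SS) to seconds. Assumes the rightmost field is
    seconds, as in the Maui log line quoted above."""
    multipliers = [1, 60, 3600, 86400]  # seconds, minutes, hours, days
    fields = [int(f) for f in text.split(":")]
    if len(fields) > len(multipliers):
        raise ValueError(f"unsupported duration: {text!r}")
    # Pair the rightmost field with seconds, the next with minutes, etc.
    return sum(v * m for v, m in zip(reversed(fields), multipliers))


start_offset = duration_to_seconds("1:00:00")
print(start_offset)                        # 3600, the suspected default delay
print(duration_to_seconds("23:04:08:22"))  # 2002102 seconds of availability
```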

After setting NODEDOWNSTATEDELAYTIME to 30 seconds via the changeparam
command, the problematic high-priority job started shortly afterwards.
This suggests that the problem is resolved, but since the delays have
always been somewhat stochastic (maybe some kind of race condition?),
one cannot be 100% sure.
Besides, the showconfig command does not list
NODEDOWNSTATEDELAYTIME, and changeparam tends to silently ignore
illegal parameters; therefore, I cannot be entirely sure that I have
changed a parameter that Maui actually recognizes.



A.

[1] http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml#nodedownstatedelaytime



-- 
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
