[Mauiusers] Solved: Log analysis/Slipping Reservations
Ansgar Esztermann
aeszter at gwdg.de
Thu May 7 08:10:51 MDT 2009
Hello everyone,
having dug through maui's sources, I think I've solved the problem.
In MResGetNRange(), maui will first create time ranges during which a
given node is available. These are called ARanges. It will then
perform various operations on these ranges (e.g. removing those that
are too short, joining adjacent ones, etc.). If the node in question is
either down, drained, or busy but expected to become idle, ARanges are
adjusted so that they start no sooner than NODEDOWNSTATEDELAYTIME from
now. The log I am currently investigating indeed shows that the
node in question is busy/expected idle:
05/05 09:51:41 INFO: node node023 not considered for backfill
(State: Busy/EState: Idle)
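
To illustrate the adjustment described above, here is a minimal sketch in
Python. It is not maui's code (maui is written in C); the names ARange,
delay_ranges and min_duration are hypothetical, and the order of the two
steps (delay first, then drop ranges that became too short) is an
assumption on my part.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ARange:
    """Availability range of one node, in seconds relative to now (hypothetical type)."""
    start: int
    end: int


def delay_ranges(ranges: List[ARange], delay: int, min_duration: int) -> List[ARange]:
    """Push every range start to at least `delay` seconds from now.

    Ranges that end up shorter than `min_duration` are dropped.
    Assumption: this mirrors the behaviour described in the text,
    not the actual logic of MResGetNRange().
    """
    adjusted = []
    for r in ranges:
        start = max(r.start, delay)          # never start sooner than the delay
        if r.end - start >= min_duration:    # keep only ranges that still fit the job
            adjusted.append(ARange(start, r.end))
    return adjusted


# A node that is free from now on, with a one-hour delay versus 30 seconds
free_now = [ARange(0, 100_000)]
print(delay_ranges(free_now, delay=3600, min_duration=600))
print(delay_ranges(free_now, delay=30, min_duration=600))
```

With a delay of 3600 the earliest possible start moves one hour into the
future; with 30 the range is left almost untouched.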
According to the docs[1], NODEDOWNSTATEDELAYTIME has a default value
of zero, so ARanges of such nodes should not be modified at all.
However, a quick grep through the sources seems to indicate a default
value of 3600 (seconds, I presume). This seems to fit the delay I've
noticed:
05/05 09:51:40 INFO: node node023 supports 8 tasks of job 99468:0
for 23:04:08:22 at 1:00:00
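
As a quick check of the arithmetic, the following sketch converts the
times from the log line into seconds. It assumes the log uses the
[[DD:]HH:]MM:SS notation; duration_to_seconds is a hypothetical helper,
not part of maui.

```python
def duration_to_seconds(text: str) -> int:
    """Convert a [[DD:]HH:]MM:SS string to seconds (assumed log format)."""
    units = [1, 60, 3600, 86400]  # seconds, minutes, hours, days
    fields = [int(f) for f in text.split(":")]
    if len(fields) > len(units):
        raise ValueError(f"too many fields in {text!r}")
    # The rightmost field is seconds, so pair the fields up from the right
    return sum(value * unit for value, unit in zip(reversed(fields), units))


print(duration_to_seconds("1:00:00"))      # start offset from the log line
print(duration_to_seconds("23:04:08:22"))  # duration from the log line
```

Under that assumption, the "at 1:00:00" offset is exactly 3600 seconds,
which matches the default found in the sources.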
After setting NODEDOWNSTATEDELAYTIME to 30 seconds via the changeparam
command, the problematic high-priority job quickly started. This seems
to indicate that the problem is resolved, but since the delays have
always been somewhat stochastic (maybe some kind of race condition?),
one cannot be 100% sure.
Besides, the showconfig command does not mention
NODEDOWNSTATEDELAYTIME, and changeparam tends to silently ignore
illegal parameters; therefore, I cannot be totally sure I have changed
any parameter that maui recognizes at all.
A.
[1] http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml#nodedownstatedelaytime
--
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105