[Mauiusers] Standing reservations and MOM restarts - Bug?
jay at nersc.gov
Wed Mar 28 16:19:28 MDT 2007
Garrick Staples wrote:
> On Wed, Mar 28, 2007 at 12:16:16AM -0700, Jay Srinivasan alleged:
>> In moab/MRes.c in the MNodeUpdateResExpression() routine (around line
>> 4075 in Maui-3.2.6p19), the check for MaxTasks and TaskCount, which is
>>     if ((R->MaxTasks > 0) && (R->TaskCount >= R->MaxTasks)) continue;
>> I think, will check to see if the task count for the SR is more than the
>> SRMAXTASKS parameter and then continue to the next SR and not update the
>> current SR with the node(s) in the RegExp under consideration.
>> But, in Maui at least, it does not seem that the SRMAXTASKS parameter is
>> even honored (nor do setres or MResCreate() even take it as a
>> parameter), and so it seems that MaxTasks is always zero in this case
>> for SRs.
>> Thus, every time a pbs_mom is recycled, this routine ends up adding the
>> node that just came up to the SR nodelist, whether the node was on the
>> list originally or not. This results in the SR gradually growing in size.
>> I think the fix for this is to simply check for a possible MaxTasks
>> value of 0 as well, i.e.
>>     if ((R->MaxTasks >= 0) && (R->TaskCount >= R->MaxTasks)) continue;
>> Could someone who has a better knowledge of Maui internals please
>> confirm that this is the case or let me know if I am not correct?
> I can't comment directly on the problem, but I can say that Maui doesn't
> talk to pbs_mom and I can't think of any reason why restarting pbs_mom
> could affect Maui.
Yes, perhaps not directly. But Maui has to know how many MOMs are
running and coordinate the node->SR mapping. So, when Maui does its
periodic scan and figures out that a node which was down has become
available again (either through Torque or PBSPro -- I have the problem
under both), it goes through the MNodeUpdateResExpression() code path
and tosses that node onto the SR nodelist always (whether or not the
node was on the SR nodelist to begin with).
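
To make the difference between the two checks concrete, here is a small
standalone sketch in C. It is not Maui code: sres_t, skip_current(),
skip_proposed() and simulate_restarts() are hypothetical names made up
for illustration, and the model assumes that MaxTasks stays 0 when
SRMAXTASKS is never applied and that each node recovery adds exactly one
task unless the check skips the SR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two reservation fields named in the post;
 * this is NOT the real Maui reservation structure. */
typedef struct {
    int MaxTasks;   /* assumed to stay 0 when SRMAXTASKS is never applied */
    int TaskCount;  /* tasks currently held by the reservation */
} sres_t;

/* Check as quoted from MRes.c: skip the SR only once a positive cap is reached. */
static bool skip_current(const sres_t *R)
{
    return (R->MaxTasks > 0) && (R->TaskCount >= R->MaxTasks);
}

/* Check as proposed in the post: a MaxTasks of 0 also causes a skip. */
static bool skip_proposed(const sres_t *R)
{
    return (R->MaxTasks >= 0) && (R->TaskCount >= R->MaxTasks);
}

/* Model `restarts` node recoveries. Each recovery adds one task to the
 * reservation unless the supplied check says to skip it (simplifying
 * assumption). Returns the final task count. */
static int simulate_restarts(sres_t R, int restarts,
                             bool (*skip)(const sres_t *))
{
    assert(restarts >= 0);
    for (int i = 0; i < restarts; i++) {
        if (skip(&R))
            continue;       /* SR left untouched for this recovery */
        R.TaskCount++;      /* recovered node is appended to the SR */
    }
    return R.TaskCount;
}
```

Under this model, an SR with MaxTasks = 0 and four tasks grows by one on
every recovery with the current check, and stays at four with the
proposed one. Note that with MaxTasks = 0 the proposed check skips
unconditionally, since TaskCount >= 0 always holds, so this code path
would no longer add any node to such an SR.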