[torqueusers] Node np parameter adjusted automatically since 2.1.x ?
Garrick Staples
garrick at clusterresources.com
Wed Jun 21 10:20:23 MDT 2006
On Wed, Jun 21, 2006 at 11:24:55AM -0400, Daniel Widyono alleged:
> Hi,
>
> > We're trying to get to an easier initial configuration.
>
> I understand and agree with your reasoning, just am trying to clean up file
> handling implementation (as I understand it from this conversation).
>
> > For the first time setup, I don't know if we can "assume" a proper setup.
>
> But for a first time setup there wouldn't be a nodes file so there is no need
> to alter anything. I don't have a problem with this logic:
>
> no nodes file exists? set it up for them, use your algorithm for determining
> values
But if there are no nodes, then there is nothing to do.
> nodes file exists? make a nodes.suggested file OR move nodes file to
> nodes.previous and make new nodes file
I'm not sure I understand this... if we make 1 change, we have a backup
copy; make a second change and we lose the backup?
Once pbs_server is running, you don't want to be manually messing with that
file anyways.
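
To make the backup concern concrete, here is a minimal sketch of the file-handling policy proposed in the quoted text (create the file if it is missing, otherwise move the old contents to nodes.previous first). The function name write_nodes_file, the return strings, and the sample host node01 are hypothetical and chosen only for illustration; this is not how pbs_server handles the file.

```python
import shutil
from pathlib import Path


def write_nodes_file(nodes_path: Path, new_contents: str) -> str:
    """Write new_contents to nodes_path under the proposed policy.

    Returns "created" if no nodes file existed, or "backed-up" if an
    existing file was copied to <name>.previous before being replaced.
    (Hypothetical helper; names are assumptions, not TORQUE code.)
    """
    if not nodes_path.exists():
        # First-time setup: nothing to preserve, just create the file.
        nodes_path.write_text(new_contents)
        return "created"

    # Keep a single backup next to the original, e.g. nodes.previous.
    # copy2 silently overwrites any backup left by an earlier change.
    backup = nodes_path.with_name(nodes_path.name + ".previous")
    shutil.copy2(nodes_path, backup)
    nodes_path.write_text(new_contents)
    return "backed-up"
```

With a single nodes.previous, calling this twice leaves only the result of the first change in the backup; the original hand-written file is gone after the second call, which is the problem raised above.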
> Changing an existing manually-created nodes file in situ is not kosher from a
> sysadmin perspective (mine). At the least, make a backup if a nodes file
> already exists. How to determine manually-created vs. auto? Easy, in your
> "create/modify the nodes file" add a header (does pbs_server allow comments
> in the nodes file? if not, that might be handy for sysadmin/queue mgr).
The nodes file could easily be a mix of manually and program-created
lines. Manually add a node and change an attribute in qmgr, is that
line manual or program-created?
And no, comments aren't supported because pbs_server skips over the
comments.
> Other than that, I'm completely fine with whatever algorithm you end up using
> to calculate np.
>
> > That particular
> > config seems redundant to me anyways. If MOM already reads the number of
> > CPUs, and advertises it to pbs_server, why shouldn't the config be automatic?
>
> I agree, if no manual configuration exists, then pbs_server should know what
> to do given the provided resource information. That's what we did in
> Clubmask; new node comes up? just push it into the database.
I don't think this is reasonable for TORQUE.
> > I'm thinking along the lines of a tri-state value:
> > Unset means "set np=ncpus if (np==1 && ncpus>np)"
> > True is more strict with "set np=ncpus if (np!=ncpus)"
> > False completely disables the feature.
> > The default would be unset.
> >
> > Does that sound reasonable?
>
> Sure. Again, I'm talking about the logic dealing with file handling, not the
> values contained therein, and I'm only concerned with previously existing
> nodes file configuration being altered without the user's consent, or worse,
> without their knowledge.
So we're talking about 2 different things here. I'm talking about a
node's np and you are talking about the nodes file.
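
For reference, the tri-state np rule quoted above can be sketched as follows. This is only an illustration of the three cases as written in the thread; the function name adjusted_np and the parameter name auto_np are hypothetical, and None stands in for "unset".

```python
from typing import Optional


def adjusted_np(np: int, ncpus: int, auto_np: Optional[bool] = None) -> int:
    """Return the np a node would end up with under the tri-state rule.

    auto_np is a hypothetical name for the setting:
      None  (unset, the default): set np=ncpus if (np==1 && ncpus>np)
      True  (strict):             set np=ncpus if (np!=ncpus)
      False:                      feature disabled, np is never touched
    """
    if auto_np is False:
        return np
    if auto_np is True:
        return ncpus if np != ncpus else np
    # Unset: only raise np when it still has the value 1 and the MOM
    # reports more CPUs than that.
    if np == 1 and ncpus > np:
        return ncpus
    return np
```

Under the unset default, a node with an explicit np of 2 on a 4-CPU machine keeps np=2, while a node still at np=1 is raised to 4. Only the strict setting overrides a deliberately chosen value.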
> If you agree with my logic surrounding file handling, I could take a stab at
> coding the patch -- but no guarantees on the quality. I'm a sysadmin, not a
> doctor! I mean, programmer. Not a programmer.
>
> On another tack, if we don't want pbs_server in the role of file management,
> then let's not use a file at all.
>
> Finally, if the file is only intended for caching purposes at pbs_server
> startup (as opposed to required initial configuration by sysadmin), then
> that should be clearly documented (and put in the CHANGES as a new
> purpose/way of thinking about the nodes file).
It is the stored config on disk. I don't see how that has changed.