[torqueusers] Node np parameter adjusted automatically since 2.1.x ?

Garrick Staples garrick at clusterresources.com
Wed Jun 21 10:20:23 MDT 2006


On Wed, Jun 21, 2006 at 11:24:55AM -0400, Daniel Widyono alleged:
> Hi,
> 
> > We're trying to get to an easier initial configuration.
> 
> I understand and agree with your reasoning, just am trying to clean up file
> handling implementation (as I understand it from this conversation).
> 
> > For the first time setup, I don't know if we can "assume" a proper setup.
> 
> But for a first time setup there wouldn't be a nodes file so there is no need
> to alter anything.  I don't have a problem with this logic:
> 
> no nodes file exists?  set it up for them, use your algorithm for determining
> values

But if there are no nodes, then there is nothing to do.


> nodes file exists?  make a nodes.suggested file  OR  move nodes file to
> nodes.previous and make new nodes file

I'm not sure I understand this... if we make 1 change, we have a backup
copy; make a second change and we lose the backup?

Once pbs_server is running, you don't want to be manually messing with that
file anyways.
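For illustration, the nodes-file handling Daniel proposes can be sketched roughly as follows. This is a hypothetical Python sketch, not TORQUE code. `write_nodes_file` and its `mode` argument are made up for this example; only the `.suggested` and `.previous` suffixes come from the proposal. It also shows the concern raised in the reply: with a single `.previous` copy, a second change overwrites the first backup.

```python
import os


def write_nodes_file(path, generated_lines, mode="suggest"):
    """Hypothetical sketch of the proposed nodes-file handling.

    mode="suggest": leave an existing file alone and write <path>.suggested.
    mode="backup":  move an existing file to <path>.previous, then write <path>.
    Returns the path that was written.
    """
    content = "".join(line + "\n" for line in generated_lines)
    if not os.path.exists(path):
        target = path  # no nodes file yet: create it directly
    elif mode == "suggest":
        target = path + ".suggested"  # never touch the admin's own file
    elif mode == "backup":
        # Only one backup generation is kept, so a second change
        # silently replaces the first backup.
        os.replace(path, path + ".previous")
        target = path
    else:
        raise ValueError("unknown mode: " + mode)
    with open(target, "w") as f:
        f.write(content)
    return target
```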

 
> Changing an existing manually-created nodes file in situ is not kosher from a
> sysadmin perspective (mine).  At the least, make a backup if a nodes file
> already exists.  How to determine manually-created vs. auto?  Easy, in your
> "create/modify the nodes file" add a header (does pbs_server allow comments
> in the nodes file? if not, that might be handy for sysadmin/queue mgr).

The nodes file could easily be a mix of manually and program-created
lines.  If you manually add a node and then change one of its attributes
in qmgr, is that line manual or program-created?

And no, comments aren't supported because pbs_server skips over the
comments.

 
> Other than that, I'm completely fine with whatever algorithm you end up using
> to calculate np.
> 
> > That particular
> > config seems redundant to me anyways.  If MOM already reads the number of
> > CPUs, and advertises it to pbs_server, why shouldn't the config be automatic?
> 
> I agree, if no manual configuration exists, then pbs_server should know what
> to do given the provided resource information.  That's what we did in
> Clubmask; new node comes up? just push it into the database.

I don't think this is reasonable for TORQUE.


> > I'm thinking along the lines of a tri-state value:
> >   Unset means "set np=ncpus if (np==1 && ncpus>np)"
> >   True is more strict with "set np=ncpus if (np!=ncpus)"
> >   False completely disables the feature.
> >   The default would be unset.
> > 
> > Does that sound reasonable?
> 
> Sure.  Again, I'm talking about the logic dealing with file handling, not the
> values contained therein, and I'm only concerned with previously existing
> nodes file configuration being altered without the user's consent, or worse,
> without their knowledge.

So we're talking about 2 different things here.  I'm talking about a
node's np and you are talking about the nodes file.
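The tri-state np rule quoted earlier can be sketched as below. This is a minimal illustration under assumed names: `adjust_np` and the `auto_np` setting are hypothetical, not actual TORQUE identifiers, and Python's `None` stands in for "unset". `ncpus` is the CPU count MOM reports to pbs_server.

```python
from typing import Optional


def adjust_np(np: int, ncpus: int, auto_np: Optional[bool] = None) -> int:
    """Return the np value a node should end up with.

    auto_np is a hypothetical tri-state setting:
      None (unset, the default): raise np only while it is still 1
           and MOM reports more CPUs than that.
      True (strict): always follow the CPU count MOM reports.
      False: never change np.
    """
    if auto_np is False:
        return np  # feature disabled
    if auto_np is True:
        if np != ncpus:
            return ncpus  # strict: np tracks ncpus exactly
        return np
    # Unset: only override the untouched default of np=1
    if np == 1 and ncpus > np:
        return ncpus
    return np
```

With the default (unset), a node still at np=1 that reports 4 CPUs becomes np=4, while a hand-set np=2 is left alone; only the strict setting overrides a hand-set value.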


> If you agree with my logic surrounding file handling, I could take a stab at
> coding the patch -- but no guarantees on the quality.  I'm a sysadmin, not a
> doctor!  I mean, programmer.  Not a programmer.
> 
> On another tack, if we don't want pbs_server in the role of file management,
> then let's not use a file at all.
> 
> Finally, if the file is only intended for caching purposes at pbs_server
> startup (as opposed to required initial configuration by sysadmin), then
> that should be clearly documented (and put in the CHANGES as a new
> purpose/way of thinking about the nodes file).

It is the stored config on disk.  I don't see how that has changed.


