[torqueusers] Server not talking to MOMs at all
Troy Baer
troy at osc.edu
Wed Sep 7 08:01:18 MDT 2005
On Sat, 2005-09-03 at 15:31 -0700, Garrick Staples wrote:
> On Sat, Sep 03, 2005 at 04:22:00PM -0600, Dave Jackson alleged:
> > Garrick,
> > > > > The first $clienthost listed identifies the "server" to the MOM.
> > > > > It is the only hostname that will receive status updates from the
> > > > > MOM.
> > > >
> > > > I would argue that this behavior is somewhere between counter-intuitive
> > > > and broken, even if it has been in PBS since the beginning of time. :)
> >
> > We would like to add a 'synonym' parameter to $clienthost called
> > '$headnode'. It would behave exactly like $clienthost except for the
> > 'confusing both old and new alike' part.
> >
> > Is this the best name? Thoughts?
>
> I just noticed that '$pbsserver' is already a synonym for '$clienthost'.
> I don't know how long it's been there, but it looks "post OpenPBS" to
> me. I suppose that is as suitable as '$headnode'; and someone,
> somewhere, already has '$pbsserver' in their config files.

The problem IMHO is that the pbs_mom code has a single array called
pbs_servername[] that appears to be an ACL of hosts allowed to talk to
the pbs_mom daemon, including both the server(s) *AND* all the other
moms without differentiating between the two (except that the very first
one is special). I would argue that the correct solution to this is to
add a second ACL called pbs_clientname[] that's the list of moms the
local mom daemon is allowed to talk to. That requires $clienthost and
$pbsserver to do different things when the mom config file is parsed,
but it makes multi-server easier insofar as it's now kosher for pbs_mom
to send utilization data to every host listed in pbs_servername[]. (I
would also argue that the contents of $PBS_HOME/server_name needs to be
pbs_servername[0].)
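
To make the two-ACL idea concrete, here is a rough sketch of how the
config parsing might split the hosts. It is Python for brevity, not
actual pbs_mom code. The function names and hostnames are made up for
illustration, and the behaviour shown is the proposal above, not what
pbs_mom does today.

```python
def parse_mom_config(lines):
    """Split mom config host directives into two separate ACLs (proposed)."""
    pbs_servername = []  # servers: allowed to talk to the mom, get status updates
    pbs_clientname = []  # other moms the local mom is allowed to talk to
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line or directive without a hostname
        directive, host = parts[0], parts[1]
        if directive == "$pbsserver":
            if host not in pbs_servername:
                pbs_servername.append(host)
        elif directive == "$clienthost":
            if host not in pbs_clientname:
                pbs_clientname.append(host)
    return pbs_servername, pbs_clientname


def is_allowed(host, pbs_servername, pbs_clientname):
    """A host may talk to the mom if it appears in either ACL."""
    return host in pbs_servername or host in pbs_clientname


def status_update_targets(pbs_servername):
    """Under the proposal, every listed server gets utilization data."""
    return list(pbs_servername)


# Hypothetical mom config; hostnames are placeholders.
example_config = """
$pbsserver   server1.example.org
$pbsserver   server2.example.org
$clienthost  node01.example.org
$clienthost  node02.example.org
""".splitlines()

servers, clients = parse_mom_config(example_config)
```

With the lists kept apart, the "first entry is special" rule goes away:
status updates go to everything in the server list, and the mom-to-mom
list is used only for access checks.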

BTW, the $headnode terminology IMHO doesn't make sense in an environment
where the system running pbs_server (and likely pbs_sched/maui/moab as
well) is *NOT* a login node, although I'm not sure if that describes
anyone's system structure other than ours. In any case, I would argue
that $pbsserver (or alternately $serverhost) is much more descriptive.

> Multi-server support still confuses me. I'm still really unsure what
> precise behaviour people want.

Same here. I don't see how you can make multi-server support work
without a shared filesystem (with working locks, i.e. not NFS) for
$PBS_HOME and the addition of a whole bunch of cluster membership and
voting code to pbs_server. It seems like overkill if what we're really
looking for is simply a high-availability pbs_server.

What exactly are people looking for WRT multi-server support in TORQUE,
anyway?

--Troy
--
Troy Baer troy at osc.edu
Science & Technology Support http://www.osc.edu/hpc/
Ohio Supercomputer Center 614-292-9701