[torqueusers] Server not talking to MOMs at all

Dave Jackson jacksond at clusterresources.com
Sat Sep 3 16:22:00 MDT 2005


Garrick,

> > > The first $clienthost listed identifies the "server" to the MOM.  It is the
> > > only hostname that will receive status updates from the MOM.
> > 
> > I would argue that this behavior is somewhere between counter-intuitive
> > and broken, even if it has been in PBS since the beginning of time. :)

  We would like to add a 'synonym' parameter for $clienthost called
'$headnode'.  It would behave exactly like $clienthost, minus the
'confusing both old and new alike' part.
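A minimal sketch of the proposed behavior follows. It assumes a MOM config read line by line, in which `$headnode` (the proposed name) is accepted anywhere `$clienthost` is, and the first host named by either keyword is the only one that receives status updates. The function name `parse_mom_config`, the `#` comment handling, and the idea of returning the remaining hosts as a separate "allowed" list are illustrative assumptions, not TORQUE's actual parser.

```python
# Hypothetical sketch: treat the proposed '$headnode' keyword as an exact
# synonym for '$clienthost'. Parsing rules are assumptions for illustration.

SERVER_KEYWORDS = {"$clienthost", "$headnode"}  # '$headnode' is the proposed synonym


def parse_mom_config(lines):
    """Return (status_host, listed_hosts) from MOM config lines.

    status_host is the first host named by either keyword; per the thread,
    it is the only host that receives status updates. listed_hosts holds
    every host named by either keyword, in file order.
    """
    listed_hosts = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        parts = line.split()
        if parts[0] in SERVER_KEYWORDS and len(parts) >= 2:
            listed_hosts.append(parts[1])
    status_host = listed_hosts[0] if listed_hosts else None
    return status_host, listed_hosts


config = [
    "# hypothetical mom_priv/config contents",
    "$headnode head1",
    "$clienthost head2",
]
print(parse_mom_config(config))  # ('head1', ['head1', 'head2'])
```

Because both keywords feed the same list, existing configs using only $clienthost would behave exactly as before.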

  Is this the best name?  Thoughts?

Dave
 

On Thu, 2005-09-01 at 14:16 -0700, Garrick Staples wrote:
> On Thu, Sep 01, 2005 at 04:50:23PM -0400, Troy Baer alleged:
> > On Mon, 2005-08-15 at 16:14 -0700, Garrick Staples wrote:
> > > The first $clienthost listed identifies the "server" to the MOM.  It is the
> > > only hostname that will receive status updates from the MOM.
> > 
> > I would argue that this behavior is somewhere between counter-intuitive
> > and broken, even if it has been in PBS since the beginning of time. :)
>  
> Agreed.  The word "client" in that parameter confused me in the
> beginning; I tend to think of pbs_server as the "server" and MOMs as
> "clients".
> 
> 
> > It seems to me that the most expeditious solution to this would be to
> > make pbs_mom behave in a manner symmetric with pbs_server and the client
> > programs, i.e. use $PBS_DEFAULT as the server host[:port] if it's set,
> > or the contents of $PBS_HOME/server_name if it's not.  Then you can use
> > your favorite failover or virtualization scheme to move that IP address
> > between hosts for high availability purposes.
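Troy's proposed lookup order can be sketched as below. This is a hypothetical illustration, not pbs_mom code: the name `resolve_server`, the default `pbs_home` path, and the fallback port 15001 are assumptions. It checks `$PBS_DEFAULT` first, falls back to the first line of `$PBS_HOME/server_name`, and splits an optional `:port` suffix.

```python
import os

DEFAULT_PORT = 15001  # assumed pbs_server default port


def resolve_server(env=None, pbs_home="/var/spool/torque"):
    """Return (host, port) for the server a MOM would report to.

    Lookup order per the proposal: $PBS_DEFAULT if set and non-empty,
    otherwise the first line of <pbs_home>/server_name. The default
    pbs_home path here is an assumption; pass the real one explicitly.
    """
    env = os.environ if env is None else env
    value = env.get("PBS_DEFAULT", "").strip()
    if not value:
        with open(os.path.join(pbs_home, "server_name")) as f:
            value = f.readline().strip()
    # Accept either 'host' or 'host:port'.
    host, sep, port = value.partition(":")
    return host, (int(port) if sep else DEFAULT_PORT)


print(resolve_server({"PBS_DEFAULT": "head1:15001"}))  # ('head1', 15001)
```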
>  
> But there is nothing preventing you from moving the server's IP right
> now.  
> 
> And some people want multiple servers at the same time, which your
> solution would prevent.  Perhaps that implies a $serverhost config (the
> basl scheduler has this).
> 
> 
> And there are other issues with multiple servers config'd in MOM:
> 
> If you intend to have 1 primary "hot" server, and 1 backup "cold"
> server, then MOM will waste a whole lot of time talking to a server that
> isn't running.
> 
> With a backup "hot" server, how does it get the primary's state?
> 
> I'd like to see pbs_server push more configs to MOMs, how is that
> handled with multiple servers?
> 
> If the idea is to have nodes in multiple clusters at the same time, how
> do you enforce policies like "jobs per node"?
> 
> 
> > I'm going to be out for the next few days, but I may try to crank out a
> > patch for this when I get back next week.
> 
> Personally, I've been avoiding this issue, because every time I think
> about multi-server support I get completely lost on the specifics.
> 
> I was thinking of talking to people at the SC05 BOF before any more code
> changes.  Perhaps the only sensible decisions are in the context of
> moab.
