[torquedev] [torqueusers] High Availability TORQUE re-compile does this affect pbs_mom?

David Beer dbeer at adaptivecomputing.com
Tue Jul 6 13:45:04 MDT 2010


Moving this discussion to the developers list.

----- Original Message -----
> On Tue, Jul 06, 2010 at 12:03:24PM -0600, David Beer alleged:
> >
> >
> > ----- Original Message -----
> > > On Tue, Jul 06, 2010 at 09:15:08AM -0600, David Beer alleged:
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > Hi,
> > > > >
> > > > >
> > > > > I have a customer with TORQUE already installed; they want to
> > > > > move to TORQUE and High Availability. One of the questions they
> > > > > asked is whether they need to re-deploy the pbs_moms after
> > > > > compiling the server with --enable-high-availability, or does
> > > > > that configure argument only affect the pbs_server component?
> > > > >
> > > > >
> > > >
> > > > They shouldn't need to do this.
> > >
> > > I was just looking at the code and it looks like
> > > --enable-high-availability doesn't do quite what it says it does.
> > >
> > > HA code is always in trunk and is always compiled in. The
> > > --enable-high-availability option is actually just changing how
> > > pbs_server does the sync locking. Instead of using file locks, it
> > > will use pthread mutexes.
> > >
> > >
> >
> > This is true, although the built-in high availability doesn't work
> > for the most common case that high availability is meant to cure - a
> > crash on the node where the server is running. The difference
> > between the two is documented:
> > http://www.clusterresources.com/torquedocs21/4.2high-availability.shtml
> >
> 
> Is there any case for using the file locking method? 

It works only in cases where the server going down also releases the file lock. That does happen in some cases (we have customers that use it, I believe), but it is inferior in every way to the other method.
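
To illustrate the general idea of lock-file failover, here is a minimal Python sketch. It is not TORQUE's actual code: the function name try_acquire and the lock file path are made up for illustration, and it assumes a POSIX system with flock(). The standby can only take over once the primary's lock has been released, which the sketch simulates by closing the primary's file descriptor. Whether the lock is released when the whole node crashes depends on the shared file system, which the sketch does not model.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path):
    """Try to take an exclusive advisory lock without blocking.

    Returns the open file descriptor on success, or None if another
    holder already has the lock. (Hypothetical helper, not a TORQUE API.)
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Lock is held elsewhere; give up on this attempt.
        os.close(fd)
        return None
    return fd


# Hypothetical lock file in a scratch directory.
lock_path = os.path.join(tempfile.mkdtemp(), "server.lock")

primary = try_acquire(lock_path)        # the active server takes the lock
standby = try_acquire(lock_path)        # the standby is refused while it is held

os.close(primary)                       # simulates the active server going down
standby_after = try_acquire(lock_path)  # the standby can now take over
```

In a real deployment the standby would retry in a loop instead of trying once.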

> Is there any reason to keep it? 

Basically only to support the old method. If we can get everyone moved over, I would be fully in favor of getting rid of it. The only other reason to keep it would be that it "works" in that it does what it claims to.

> Should we default to the pthreaded case?

I believe that TORQUE has the capability to compile without posix threads, and making this the default would remove that functionality. I'm not sure how important it is.

> 
> Can we change the name to --(enable|disable)-ha-threads or something
> like that?
> 

The main reason not to do that would be that it makes everyone change the way they configure TORQUE, but maybe we could change it for 3.0.

-- 
David Beer | Senior Software Engineer
Adaptive Computing

