[torqueusers] Questions about pbs_server --ha

Josh Butikofer josh at clusterresources.com
Mon Apr 13 12:05:51 MDT 2009


Victor,

The latest snapshot of TORQUE 2.3.x (not yet an officially released version) 
allows you to configure where the lock file is stored. You could then store the 
file in a non-NFS-mounted location, so that when the passive server becomes 
active it is not blocked by a stale server.lock file left on the NFS share.

The downside to this is you will be using a snapshot. We hope to release the 
next version of TORQUE in a few weeks. We are looking for users willing to kick 
the new TORQUE's tires, however, so if you're interested let us know and we'll 
cut you a new build.

Another option, although more of a workaround until the new TORQUE is released, 
is to have the CentOS Heartbeat service run a script that deletes server.lock 
when the passive server becomes active.
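
A minimal sketch of such a takeover script follows, written in Python. It is 
hypothetical: the script itself, its start/stop/status interface and the lock 
path (assumed from the default /var/spool/torque/server_priv location mentioned 
below) are assumptions, not something shipped with TORQUE or Heartbeat. Only 
use this if Heartbeat (ideally with fencing) guarantees the old primary is 
really down; otherwise deleting the lock lets two pbs_server instances run 
against the same server_priv directory.

```python
#!/usr/bin/env python3
"""Hypothetical Heartbeat resource script for pbs_server (sketch only).

Assumes the shared server_priv directory is already mounted on this node and
that Heartbeat calls "start" only after the old active node is known to be down.
"""
import os
import subprocess
import sys

# Assumption: default PBS_HOME of /var/spool/torque; adjust for your build.
LOCK_FILE = "/var/spool/torque/server_priv/server.lock"


def remove_stale_lock(lock_path: str = LOCK_FILE) -> bool:
    """Delete a leftover server.lock; return True if a file was removed."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        return False  # no lock left behind, nothing to clean up
    return True


def main(action: str) -> int:
    if action == "start":
        if remove_stale_lock():
            print(f"removed stale {LOCK_FILE}", file=sys.stderr)
        # pbs_server forks into the background, so call() returns promptly.
        return subprocess.call(["pbs_server"])
    if action == "stop":
        # Ask the local pbs_server to shut down; running jobs are left alone.
        return subprocess.call(["qterm", "-t", "quick"])
    if action == "status":
        running = subprocess.call(["pgrep", "-x", "pbs_server"],
                                  stdout=subprocess.DEVNULL) == 0
        print("running" if running else "stopped")
        return 0 if running else 3  # LSB: 3 means "program is not running"
    print("usage: start|stop|status", file=sys.stderr)
    return 2  # LSB: 2 means "invalid or excess argument(s)"


# Heartbeat passes exactly one argument (start, stop or status).
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Made executable and placed where Heartbeat looks for resource scripts, it could 
be listed in haresources in place of a plain pbs_server init script. The only 
TORQUE-specific step is removing the lock before pbs_server starts.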

Josh Butikofer
Cluster Resources, Inc.
#############################


Victor Gregorio wrote:
> Thanks Ken,
> 
> Taking your advice, I configured the two pbs_servers to run an
> active/passive HA configuration using CentOS's Heartbeat services.  I am
> no longer running pbs_server with --ha, since only one pbs_server
> instance will be running at a time.
> 
> Both primary and secondary pbs_servers still use a shared NFS partition
> (on a third machine) for /var/spool/torque/server_priv.
> 
> Unfortunately, the primary pbs_server still leaves a server.lock file
> behind when it starts up.  So, when the primary system critically
> fails, the secondary system cannot start pbs_server.
> 
> Thoughts?
> 

