[torqueusers] Torque in a high-availability setting
Prakash.Velayutham at cchmc.org
Fri Feb 22 09:23:20 MST 2008
I am trying to set up Torque (2.3.0) in a high-availability configuration (NOT
with the built-in HA feature that you enable with the --ha flag to
pbs_server, but with heartbeat and shared storage on OCFS2).
Here is the setup:
torqueserver1:
  NIC eth0 - a.a.a.a
  NIC eth1 - b.b.b.b
torqueserver2:
  NIC eth0 - c.c.c.c
  NIC eth1 - d.d.d.d
Both eth1 interfaces are connected to the cluster's private network,
and both eth0 interfaces are connected to the public network. I do not
yet have a separate heartbeat link between the servers, but will soon
add a serial link. For now, eth1 also carries the heartbeat traffic.
My HA resources that are being failed over are:
IP address - e.e.e.e (which will be in the public network)
IP address - f.f.f.f (which will be in the cluster private network)
I want the public IP e.e.e.e to have the DNS name torqueserver, and I
want that address to be the one recognized as the server_name.
In other words, when torqueserver1 goes down (planned or unplanned),
I would like e.e.e.e and f.f.f.f to fail over to torqueserver2 so that
the DNS entry remains valid (as with any heartbeat-managed IP
resource).
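To make the intended behaviour concrete, here is a toy Python model of the failover described above. It is not heartbeat and does not touch the network; it only shows why the DNS record for torqueserver never has to change: the record points at the floating address e.e.e.e, and whichever node is alive holds both floating addresses. The addresses are the placeholders from this message, and the assignment of a.a.a.a/b.b.b.b to torqueserver1 and c.c.c.c/d.d.d.d to torqueserver2 is an assumption. `Node`, `place_resources` and `resolve_owner` are hypothetical names.

```python
from dataclasses import dataclass, field

# Placeholder addresses from the message above (not real IPs).
PUBLIC_VIP = "e.e.e.e"    # floating public address, DNS name "torqueserver"
PRIVATE_VIP = "f.f.f.f"   # floating address on the cluster private network
DNS = {"torqueserver": PUBLIC_VIP}  # static record: never edited on failover


@dataclass
class Node:
    name: str
    fixed_ips: dict                      # interface -> address owned by this node
    alive: bool = True
    floating_ips: set = field(default_factory=set)


def place_resources(nodes, preferred):
    """Toy heartbeat: the whole resource group runs on exactly one live node."""
    # Try the preferred node first, then any other node, skipping dead ones.
    candidates = [preferred] + [n for n in nodes if n is not preferred]
    active = next((n for n in candidates if n.alive), None)
    for n in nodes:
        n.floating_ips.clear()           # release the VIPs everywhere first
    if active is not None:
        active.floating_ips.update({PUBLIC_VIP, PRIVATE_VIP})
    return active


def resolve_owner(nodes, hostname):
    """Return the node that currently answers for a DNS name, if any."""
    ip = DNS[hostname]
    return next((n for n in nodes if ip in n.floating_ips), None)


# Assumed mapping of the fixed NIC addresses to the two servers.
ts1 = Node("torqueserver1", {"eth0": "a.a.a.a", "eth1": "b.b.b.b"})
ts2 = Node("torqueserver2", {"eth0": "c.c.c.c", "eth1": "d.d.d.d"})
nodes = [ts1, ts2]

place_resources(nodes, preferred=ts1)
print(resolve_owner(nodes, "torqueserver").name)

ts1.alive = False                        # torqueserver1 goes down
place_resources(nodes, preferred=ts1)
print(resolve_owner(nodes, "torqueserver").name)
```

The first print shows torqueserver1 answering for torqueserver; after torqueserver1 is marked dead the same name resolves to torqueserver2, with the DNS table untouched.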
How should I set up the various configuration files for this case
(server_name on the server and the MOMs, mom_priv/config, etc.)? And does
anyone already have this setup working?
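As a starting point for discussion (an assumption, not a tested configuration), the simplest layout would have every node name the floating alias torqueserver rather than either physical host. The Python sketch below only renders and writes those two files into a scratch directory. `$pbsserver` is the mom_priv/config directive naming the server host; the helper names and the use of a temporary directory in place of the real TORQUE_HOME (often /var/spool/torque, but build-dependent) are hypothetical.

```python
import tempfile
from pathlib import Path

FLOATING_NAME = "torqueserver"   # DNS name of the floating public IP e.e.e.e


def torque_files(server_alias: str) -> dict:
    """Map paths relative to TORQUE_HOME to the contents each file would hold."""
    return {
        # The server_name file: holds the pbs_server host name.
        "server_name": f"{server_alias}\n",
        # MOM config: $pbsserver names the host this pbs_mom reports to.
        "mom_priv/config": f"$pbsserver {server_alias}\n",
    }


def write_files(torque_home: Path, files: dict) -> None:
    """Write each file under torque_home, creating parent directories."""
    for rel, text in files.items():
        path = torque_home / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


# Write into a scratch directory instead of a live TORQUE_HOME.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    write_files(home, torque_files(FLOATING_NAME))
    for rel in sorted(torque_files(FLOATING_NAME)):
        print(rel, "->", (home / rel).read_text().strip())
```

Pointing everything at the alias means nothing on the compute nodes has to change when heartbeat moves e.e.e.e and f.f.f.f; whether pbs_server itself is happy running under a name that is not the host's own is exactly the open question here.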
I found this page while searching, but its status section warns that
the setup does not work: http://www.gridpp.ac.uk/wiki/High_Availabilty_Torque
I am also planning to do the same with Moab, but that seems more
difficult than this.
Thanks a lot,