[torqueusers] Torque in HA setting
prakash.velayutham at cchmc.org
Mon Feb 25 07:11:31 MST 2008
I sent this email to the list over the weekend but have not seen any replies,
so I am resending it just in case.
Also, how are others here running Torque in an HA setting? The new --ha
flag is only available in the current snapshots, and I was wondering
whether that is the only option.
I am trying to set up Torque (2.3.0) in High Availability mode (NOT
with the built-in HA feature that is enabled by passing the --ha flag to
pbs_server, but with heartbeat and shared storage on OCFS2).
Here is the setup:
torqueserver1:
  NIC eth0 - a.a.a.a
  NIC eth1 - b.b.b.b
torqueserver2:
  NIC eth0 - c.c.c.c
  NIC eth1 - d.d.d.d
Both eth1 interfaces are connected to the cluster's private network, and
both eth0 interfaces are connected to the public network. I do not yet
have a dedicated heartbeat link between the servers, but will soon set
up a serial link. For now, eth1 also carries the heartbeat.
The HA resources being failed over are:
IP address e.e.e.e (on the public network)
IP address f.f.f.f (on the cluster private network)
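A minimal Python sketch of the active/passive behaviour described above, assuming heartbeat places both floating IPs together on the preferred node while it is alive and otherwise on the survivor. The placeholder addresses are the labels from this post, not real IPs. The haresources_line helper is an illustration only: the heartbeat v1 `IPaddr::ip/bits/iface` syntax is real, but the /24 netmasks and the `pbs_server` resource script name are assumptions.

```python
from dataclasses import dataclass, field

# Placeholder service IPs from the post: public (e.e.e.e) and private (f.f.f.f).
FLOATING_IPS = ("e.e.e.e", "f.f.f.f")


@dataclass
class Node:
    name: str
    alive: bool = True
    resources: list = field(default_factory=list)


def fail_over(nodes, preferred):
    """Put all floating IPs on the preferred node if it is alive, otherwise
    on the first surviving node. Returns the active node, or None."""
    for node in nodes:
        node.resources.clear()  # resources are released everywhere first
    candidates = [preferred] + [n for n in nodes if n is not preferred]
    for node in candidates:
        if node.alive:
            node.resources.extend(FLOATING_IPS)
            return node
    return None  # no node alive: the service is down


def haresources_line(primary, public_iface="eth0", private_iface="eth1"):
    """Build a heartbeat v1 haresources entry (assumed /24 netmasks and a
    hypothetical 'pbs_server' resource script)."""
    pub, priv = FLOATING_IPS
    return (f"{primary} IPaddr::{pub}/24/{public_iface} "
            f"IPaddr::{priv}/24/{private_iface} pbs_server")


if __name__ == "__main__":
    ts1, ts2 = Node("torqueserver1"), Node("torqueserver2")
    print(fail_over([ts1, ts2], preferred=ts1).name)  # torqueserver1
    ts1.alive = False  # simulate torqueserver1 going down
    print(fail_over([ts1, ts2], preferred=ts1).name)  # torqueserver2
    print(haresources_line("torqueserver1"))
```

Because OCFS2 is a cluster filesystem mounted on both nodes at once, the sketch treats it as always present rather than as a resource that moves.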
I want the DNS name torqueserver to point to e.e.e.e (the public IP),
and that is the address I want recognized as the server_name.
So essentially, when torqueserver1 goes down (scheduled or
unscheduled), I would like e.e.e.e and f.f.f.f to fail over to
torqueserver2 so that the DNS entry remains valid (as with any
heartbeat-managed IP resource).
How should I set up the various configuration files for this case
(server_name on the server and MOMs, mom_priv/config, etc.)? And does
anyone already have this setup working?
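One possible shape for the configuration, sketched in Python under the assumption that every daemon should refer to the floating name rather than a physical host: the `server_name` file and the MOM `$pbsserver` directive both name torqueserver. The default TORQUE_HOME of /var/spool/torque is an assumption about the install, and this is not a confirmed working setup.

```python
# Hypothetical sketch: Torque config contents pointing at the floating
# service name instead of torqueserver1/torqueserver2. The TORQUE_HOME
# default below is an assumption; adjust it to the actual install.
SERVICE_NAME = "torqueserver"  # DNS name intended to resolve to e.e.e.e


def torque_config_files(service_name=SERVICE_NAME,
                        torque_home="/var/spool/torque"):
    """Return a {path: contents} mapping for files that name the server."""
    return {
        # Used by client commands (qsub, qstat, ...) to find the default server.
        f"{torque_home}/server_name": f"{service_name}\n",
        # MOM directive naming the server the MOM should report to.
        f"{torque_home}/mom_priv/config": f"$pbsserver {service_name}\n",
    }


if __name__ == "__main__":
    for path, contents in torque_config_files().items():
        print(f"{path}:\n  {contents}", end="")
```

If the compute nodes only see the private network, torqueserver would presumably need to resolve to f.f.f.f on those nodes (for example through /etc/hosts); that split-name arrangement is an assumption about this network layout.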
I came across this page while googling, but its status section warns
that the setup is not working: http://www.gridpp.ac.uk/wiki/High_Availabilty_Torque
I am also planning to do the same with Moab, but that looks more
difficult than this.
Thanks a lot,