[torqueusers] Torque in HA setting

Prakash Velayutham prakash.velayutham at cchmc.org
Mon Feb 25 07:11:31 MST 2008


Hi,

I sent this email to the list over the weekend but have not seen a
response, so I am resending it just in case.

Also, how are others here using Torque in an HA setting? The new --ha
flag is only available in the current snapshots, and I was wondering
whether that is the only option.

Thanks,
Prakash


Hello All,

I am trying to set up Torque (2.3.0) in a High Availability mode (NOT
with the built-in HA feature that is enabled by passing the --ha flag
to pbs_server, but with heartbeat and shared storage using OCFS2).

Here is the setup:

torqueserver1:
	NIC eth0 - a.a.a.a
	NIC eth1 - b.b.b.b

torqueserver2:
	NIC eth0 - c.c.c.c
	NIC eth1 - d.d.d.d

Both eth1 interfaces are connected to the cluster's private network,
and both eth0 interfaces are connected to the public network. I do not
yet have a separate heartbeat link between the servers, but will soon
set up a serial link. For now I am using eth1 for heartbeat as well.

My HA resources that are being failed over are:

IP address - e.e.e.e (which will be in the public network)
IP address - f.f.f.f (which will be in the cluster private network)

I want a DNS entry, torqueserver, for e.e.e.e (the public IP), and I
want that address to be recognized as the server_name.

So essentially, when torqueserver1 goes down (scheduled or
unscheduled), I would like e.e.e.e and f.f.f.f to fail over to
torqueserver2, with the DNS entry remaining valid (as with any
heartbeat-managed IP resource).
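
To make the intent concrete, here is a minimal, untested sketch in
Python that builds the kind of heartbeat resource line the failover
described above would imply. It assumes the older v1-style haresources
file format (preferred node first, then resources). The /24 prefix
lengths and the "pbs_server" init script name are assumptions, not
details of this setup, and haresources_line is a hypothetical helper
name used only for illustration.

```python
# Sketch only: build one v1-style heartbeat haresources entry for the
# two floating IPs described above. The prefix lengths (24) and the
# "pbs_server" init script name are assumptions.

def haresources_line(preferred_node, addresses, services):
    """Return 'node IPaddr::ip/bits/nic ... service ...' as one string."""
    ip_resources = [
        "IPaddr::%s/%d/%s" % (ip, prefix_bits, nic)
        for ip, prefix_bits, nic in addresses
    ]
    return " ".join([preferred_node] + ip_resources + list(services))


line = haresources_line(
    "torqueserver1",
    [
        ("e.e.e.e", 24, "eth0"),  # public floating IP on the public NIC
        ("f.f.f.f", 24, "eth1"),  # private floating IP on the cluster NIC
    ],
    ["pbs_server"],
)
print(line)
```

In this format heartbeat brings resources up from left to right, so
both addresses would be configured before the service is started, and
the same line would be placed on both servers.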

How should the various configuration files be set up for this case
(server_name on the server and the MOMs, mom_priv/config, etc.)? And
does anyone already have this setup working?

I stumbled across this site while googling, but its status area warns
that it is not working:
http://www.gridpp.ac.uk/wiki/High_Availabilty_Torque

I am also planning on doing the same with Moab, but that seems to be
more difficult than this.

Thanks a lot,
Prakash