[torqueusers] Has anyone tested HA features in a VM environment

Prakash Velayutham prakash.velayutham at cchmc.org
Fri Dec 12 14:37:53 MST 2008


Hello All,

Has anyone here tested Torque with "--ha" in a VM (VMware-based)
environment?

I tried the following:

2 VMs as Torque server nodes, running OpenSUSE 10.3 and Torque-2.3.5

pbs_mom systems (physical hosts, not VMs) running Torque-2.3.5.

With this setup everything seems to run fine until I submit a large
batch of jobs; then I start getting errors like:

pbs_iff: cannot read reply from pbs_server
Cannot connect to specified server host 'bmiclustersvc2-int'.
qsub: cannot connect to server bmiclustersvc2-int (errno=111) Connection refused
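errno=111 is ECONNREFUSED on Linux: the TCP connection attempt to the
server host was actively refused, which usually means nothing was
accepting connections on the server port at that moment (for example,
pbs_server not running on that host). One way to see which server host
is accepting connections while the job burst is being submitted is a
plain TCP probe. This is a minimal, hypothetical sketch: the probe()
helper is illustrative and not part of Torque, and it assumes pbs_server
listens on port 15001 (the usual Torque default); adjust PBS_PORT for
your site.

```python
import errno
import socket
import sys

# Assumption: pbs_server listens on TCP port 15001 (common Torque default).
PBS_PORT = 15001


def probe(host: str, port: int = PBS_PORT, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # ECONNREFUSED (errno 111 on Linux): the peer answered with a reset,
        # i.e. nothing was accepting connections on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout (packets dropped or host unreachable).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name-resolution failure; report the errno name.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"


if __name__ == "__main__":
    # Hosts to check can be passed on the command line; the default is the
    # server name from the error message above.
    hosts = sys.argv[1:] or ["bmiclustersvc2-int"]
    for h in hosts:
        print(f"{h}:{PBS_PORT} -> {probe(h)}")
```

Running it in a loop during a bulk qsub (once per server host) would
show whether the refusals line up with one server being unavailable.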

Anyone seen this before? Any ideas what could be going wrong?

Thanks,
Prakash

On Dec 12, 2008, at 4:12 PM, Josh Butikofer wrote:

> This website may be helpful for you:
>
> http://www.clusterresources.com/torquedocs21/4.3high-availability.shtml
>
> It explains how to set up high availability and will probably do
> what you want.
>
> Josh Butikofer
> Cluster Resources, Inc.
> #############################
>
>
> Yang Wang wrote:
>> Dear friends,
>> Is it possible to run two pbs_server daemons for the same cluster
>> for failover purposes? Has anyone done this? Is there a brief doc
>> showing how to set up such a system?
>> Thanks and happy holidays!
>> Yang
>> ------------------------------------------------------------------------
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
