[torqueusers] Torque HA in a virtual environment

Prakash Velayutham prakash.velayutham at cchmc.org
Sat Dec 13 07:07:32 MST 2008


Hello All,

I posted this yesterday, but for some reason it got attached to a
different thread, so here it is again.

Has anyone here tested Torque with "--ha" in a VM (VMware based)  
environment?

I tried the following:

2 VM Torque nodes running OpenSUSE 10.3, Torque-2.3.5

PBS Mom systems (physical hosts, not VMs) running Torque-2.3.5.

In this case, everything seems to run OK until I submit a large batch
of jobs, and then I start getting errors like:

pbs_iff: cannot read reply from pbs_server
Cannot connect to specified server host 'bmiclustersvc2-int'.
qsub: cannot connect to server bmiclustersvc2-int (errno=111) Connection refused
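
For anyone trying to reproduce this: errno=111 is ECONNREFUSED on Linux,
meaning nothing accepted the TCP connection at that moment. Below is a
minimal sketch for watching whether the server port keeps accepting
connections while a batch of jobs is being submitted. The port 15001 (the
usual pbs_server default) and the helper name probe() are assumptions for
illustration, not details taken from the setup above; adjust them if the
server listens elsewhere.

```python
import errno
import socket
import time


def probe(host, port=15001, timeout=2.0):
    """Try one TCP connection. Return 0 on success, else the errno (-1 if unknown).

    port=15001 is assumed to be the pbs_server port; change it if needed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError as exc:
        # Timeouts carry no errno, so report them as -1.
        return exc.errno if exc.errno is not None else -1


if __name__ == "__main__":
    host = "bmiclustersvc2-int"  # server name from the error messages above
    # Probe once per second for a minute and log every failure.
    for _ in range(60):
        rc = probe(host)
        if rc != 0:
            name = errno.errorcode.get(rc, "unknown")
            print(time.strftime("%H:%M:%S"), "connect failed:", rc, name)
        time.sleep(1)
```

Running this alongside the bulk submission would show whether the refusals
are continuous (the server is not listening at all) or only appear in bursts
under load.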

Anyone seen this before? Any ideas what could be going wrong?

Thanks in advance,
Prakash

