[torqueusers] Questions about pbs_server --ha

Ken Nielson knielson at clusterresources.com
Mon Apr 13 09:57:41 MDT 2009


Tell us about your NFS setup. Where does the physical disk reside, and is it set up to fail over to another system if the primary NFS server fails?
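
While gathering those details, it may also help to check how advisory file locks behave on the shared mount. What follows is only a minimal Python sketch and is not part of TORQUE. The helper name `probe_lock` is hypothetical, and the sketch assumes (this thread does not confirm it) that fail-over depends on a lock held on server.lock rather than on the file merely existing.

```python
import errno
import fcntl
import os
import sys


def probe_lock(path):
    """Report whether an exclusive advisory lock can be taken on `path`.

    Returns "missing" if the file does not exist, "held" if another
    process (possibly on another NFS client) holds a conflicting lock,
    and "free" if the lock was acquired and then released again.
    """
    try:
        # A write lock via lockf() needs a descriptor opened for writing.
        fd = os.open(path, os.O_RDWR)
    except FileNotFoundError:
        return "missing"
    try:
        try:
            # Non-blocking: fail immediately instead of waiting.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return "held"
            raise
        # We only wanted to probe, so give the lock back right away.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "free"
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 2:
    print(probe_lock(sys.argv[1]))
```

One possible use: run it on the secondary host against /var/spool/torque/server_priv/server.lock while the primary is powered off and no pbs_server is running locally. A result of "held" would suggest that the NFS server may still be honoring a lock from the dead client, which is worth mentioning when you describe the setup.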

Ken Nielson
Cluster Resources
knielson at clusterresources.com

----- Original Message -----
From: "Victor Gregorio" <vgregorio at penguincomputing.com>
To: torqueusers at supercluster.org
Sent: Friday, April 10, 2009 2:54:56 PM GMT -07:00 US/Canada Mountain
Subject: [torqueusers] Questions about pbs_server --ha

Hey folks :)

I've been lurking about for a bit and finally had a question to post.

So, I am using two systems with pbs_server --ha and a shared NFS mount
for /var/spool/torque/server_priv.  In my testing, I bring down the
primary server by pulling the power plug.  Unfortunately, the secondary
server does not pick up and become the primary pbs_server.

Is this because /var/spool/torque/server_priv/server.lock is not removed
when the primary server has a critical failure?

So, I tried removing the server.lock file, but the secondary pbs_server
--ha instance never picks up and becomes primary.  What is the trigger
to activate a passive pbs_server --ha?

Any advice is appreciated.


Victor Gregorio
Penguin Computing

