[torqueusers] Questions about pbs_server --ha
Victor Gregorio
vgregorio at penguincomputing.com
Mon Apr 13 10:14:14 MDT 2009
Hello Ken,
Thanks for the reply. I have a third system which exports NFS storage
for both pbs_servers' /var/spool/torque/server_priv. For now, there is
no NFS redundancy.
* export options: *(rw,sync,no_root_squash)
* mount options on both pbs_servers: bg,intr,soft,rw
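
Since the open question is whether a stale lock on server.lock over this
NFS mount keeps the secondary passive, here is a minimal sketch of a probe
that could be run on the secondary. It assumes, and this is only an
assumption, not something confirmed for pbs_server --ha, that the active
server holds a POSIX advisory lock on the file. The helper name
lock_is_free is hypothetical and is not part of TORQUE.

```python
import errno
import fcntl
import os
import sys


def lock_is_free(path):
    """Return True if an exclusive POSIX advisory lock on `path` can be
    taken right now, False if another process (or NFS client) holds one.

    Assumption: the lock of interest is a POSIX (fcntl-style) advisory
    lock. This is an illustration, not pbs_server's actual logic.
    """
    # An exclusive lock needs a descriptor opened for writing.
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: fails immediately if the lock is held.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        # We only wanted to probe, so release the lock straight away.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    print("free" if lock_is_free(sys.argv[1]) else "held")
```

If the probe reports the lock as held after the primary has lost power,
that would point at the NFS lock state rather than at the lock file merely
existing. Note that the probe briefly takes the lock itself, so it should
not be run against a live production server_priv.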
--
Victor Gregorio
Penguin Computing
On Mon, Apr 13, 2009 at 09:57:41AM -0600, Ken Nielson wrote:
> Victor,
>
> Tell us about your NFS setup. Where does the physical disk reside, and is it set up to fail over to another system if the primary NFS server fails?
>
> Ken Nielson
> --------------------
> Cluster Resources
> knielson at clusterresources.com
>
>
> ----- Original Message -----
> From: "Victor Gregorio" <vgregorio at penguincomputing.com>
> To: torqueusers at supercluster.org
> Sent: Friday, April 10, 2009 2:54:56 PM GMT -07:00 US/Canada Mountain
> Subject: [torqueusers] Questions about pbs_server --ha
>
> Hey folks :)
>
> I've been lurking about for a bit and finally had a question to post.
>
> So, I am using two systems with pbs_server --ha and a shared NFS mount
> for /var/spool/torque/server_priv. In my testing, I bring down the
> primary server by pulling the power plug. Unfortunately, the secondary
> server does not pick up and become the primary pbs_server.
>
> Is this because /var/spool/torque/server_priv/server.lock is not removed
> when the primary server has a critical failure?
>
> So, I tried removing the server.lock file, but the secondary pbs_server
> --ha instance never picks up and becomes primary. What is the trigger
> to activate a passive pbs_server --ha?
>
> Any advice is appreciated.
>
> Regards,
>
> --
> Victor Gregorio
> Penguin Computing
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers