[torqueusers] Questions about pbs_server --ha

Victor Gregorio vgregorio at penguincomputing.com
Fri Apr 10 14:54:56 MDT 2009


Hey folks :)

I've been lurking about for a bit and finally had a question to post.

I am running two systems with pbs_server --ha and a shared NFS mount
for /var/spool/torque/server_priv.  In my testing, I bring down the
primary server by pulling the power plug.  Unfortunately, the secondary
server does not take over and become the primary pbs_server.

Is this because /var/spool/torque/server_priv/server.lock is not removed
when the primary server has a critical failure?

I tried removing the server.lock file, but the secondary pbs_server
--ha instance still never takes over as primary.  What is the trigger
that activates a passive pbs_server --ha?
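
To make the question concrete, here is a minimal Python sketch of how I
assume the standby behaves.  This is only my mental model, not TORQUE's
actual code: the function names try_acquire and wait_until_primary are
hypothetical, and the use of an flock-style advisory lock on the lock
file is an assumption on my part.

```python
import fcntl
import os
import time


def try_acquire(lock_path):
    """Try to take an exclusive, non-blocking advisory lock on lock_path.

    Returns the open file descriptor on success, or None if some other
    holder already has the lock. (Hypothetical helper, not a TORQUE API.)
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Lock is held elsewhere; stay passive.
        os.close(fd)
        return None
    return fd


def wait_until_primary(lock_path, poll_seconds=1.0, max_attempts=None):
    """Poll until the lock can be taken, then return its descriptor.

    Returns None if max_attempts is reached without getting the lock.
    With max_attempts=None this loops until the lock becomes available.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        fd = try_acquire(lock_path)
        if fd is not None:
            return fd  # assumed point where the standby becomes primary
        attempts += 1
        time.sleep(poll_seconds)
    return None
```

If pbs_server --ha works roughly like this, the trigger would be the
lock becoming available to the standby, and my question is why that
never seems to happen after a hard power loss on the primary.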

Any advice is appreciated.

Regards,

-- 
Victor Gregorio
Penguin Computing


