[torqueusers] Stopping passive pbs_server will stop active pbs_server

Clotho Tsang wytsang at clustertech.com
Mon Apr 22 02:30:11 MDT 2013


We are setting up Torque 4.1.4 + Moab 7.2.1 in HA mode. Job submission and
dispatching work fine so far.

However, we found that stopping the passive pbs_server with
"/etc/init.d/pbs_server stop"
stops the active pbs_server instead. Here is how to reproduce this:

master1# ps -ef |grep pbs
root      67328      1  1 15:54 ?        00:00:16 /usr/sbin/pbs_server -d /var/spool/torque --ha -l master1:42559 -l master2:42559

master2# ps -ef |grep pbs
root      24491      1  0 16:05 ?        00:00:00 /usr/sbin/pbs_server -d /var/spool/torque --ha -l master1:42559 -l master2:42559

Now the active pbs_server is running on master1:
master1# qstat -a | head -2

master1:

Now I stop pbs_server on master2 (switching off the master2 machine gives the
same result):

master2# /etc/init.d/pbs_server stop

On master1, pbs_server is shut down (the shutdown request comes from master2):

master1# tail -f /var/spool/torque/server_logs/20130422
04/22/2013 16:14:57;0086;PBS_Server.73628;Svr;PBS_Server;Shutdown request from root@master2
04/22/2013 16:14:57;0086;PBS_Server.73628;Svr;PBS_Server;Starting to shutdown the server, type is Quick
04/22/2013 16:14:57;0002;PBS_Server.67328;Svr;PBS_Server;Server shutdown completed
04/22/2013 16:14:57;0002;PBS_Server.67328;Svr;Log;Log closed
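
To see which host issued a shutdown, the server log can be filtered for
"Shutdown request" records. The following is a minimal sketch, assuming the
on-disk log keeps each record on one line with ';'-separated fields and names
the requester in user@host form. shutdown_requests is a hypothetical helper
name, not part of Torque.

```shell
#!/bin/bash
# Hypothetical helper: list the time and requester of every shutdown request
# recorded in a pbs_server log file.
# Assumed record layout: timestamp;code;daemon.pid;class;name;message
shutdown_requests() {
    awk -F';' '$6 ~ /^Shutdown request from / {
        # Keep only the requester part of the message field
        sub(/^Shutdown request from /, "", $6)
        print $1 " " $6
    }' "$1"
}
```

For example, "shutdown_requests /var/spool/torque/server_logs/20130422" would
print one line per request, with the timestamp followed by the requester.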

I found that the shutdown is triggered by the qterm call in the stop()
function of /etc/init.d/pbs_server:

stop() {
    status pbs_server >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "pbs_server is not running."
        exit 0
    fi
    echo -n "Shutting down TORQUE Server: "
    $BIN_PATH/qterm
    RET=$?
    if [[ $RET -ne 0 ]]; then
      killproc pbs_server -TERM
      RET=$?
    fi

    rm -f /var/lock/subsys/pbs_server
    echo
}

I noticed that the init script in earlier Torque versions does not call
"qterm". Why does qterm shut down the neighbor's pbs_server rather than the
local one?
Is this pbs_server init script not suitable for an HA setup?
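
One possible workaround is to signal only the local process instead of
calling qterm, so the request can never reach the peer server. The sketch
below is untested against a real HA pair and rests on assumptions: pgrep is
available, the daemon's process name is pbs_server, and pbs_server shuts down
cleanly on SIGTERM (the killproc fallback in the original script suggests it
does). term_local_pids, stop_local_pbs_server and TIMEOUT are hypothetical
names.

```shell
#!/bin/bash
# Sketch only: stop the pbs_server running on THIS host without qterm.

# Send SIGTERM to each given PID, then wait up to $TIMEOUT seconds
# (default 10) for all of them to exit.
# Returns 0 once every process is gone, 1 if any is still alive.
term_local_pids() {
    local timeout="${TIMEOUT:-10}" waited=0 alive pid

    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || true
    done

    while [ "$waited" -lt "$timeout" ]; do
        alive=0
        for pid in "$@"; do
            # kill -0 only checks that the process still exists
            if kill -0 "$pid" 2>/dev/null; then
                alive=1
            fi
        done
        if [ "$alive" -eq 0 ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Hypothetical stand-in for the qterm call in stop(): only PIDs found on the
# local host are signalled, so the active server on the peer is not contacted.
stop_local_pbs_server() {
    local pids ret
    pids=$(pgrep -x pbs_server)
    if [ -z "$pids" ]; then
        echo "pbs_server is not running."
        return 0
    fi
    echo -n "Shutting down local TORQUE Server: "
    # Word splitting of the PID list is intended here
    term_local_pids $pids
    ret=$?
    echo
    return $ret
}
```

In the init script, stop() would call stop_local_pbs_server in place of
$BIN_PATH/qterm and still remove /var/lock/subsys/pbs_server afterwards.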

Thanks.



-- 
Clotho Tsang
Senior Software Engineer
Cluster Technology Limited
Email: clotho at clustertech.com
Tel: (852) 2655-6129
Fax: (852) 2994-2101
Website: www.clustertech.com