Bugzilla – Bug 35
pbs_server writes the wrong PID to $pbs_home/server_priv/server.lock
Last modified: 2009-12-04 16:40:22 MST
When pbs_server starts, it writes the wrong PID to the $pbs_home/server_priv/server.lock file. The PID written to this file is the pbs_server PID minus 1. This prevents the /etc/init.d/pbs script from properly stopping the server; only the scheduler is stopped.

[root@fn1 ~]# ps -ef | grep pbs_server
root 18669 1 0 12:02 ? 00:00:00 /usr/torque/sbin/pbs_server
root 19016 744 0 16:47 pts/1 00:00:00 grep pbs_server
[root@fn1 ~]# cat /var/spool/torque/server_priv/server.lock
18668
[root@fn1 ~]#
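For anyone who wants to check for the same mismatch on their own host, here is a minimal Python sketch. It assumes only that the lock file holds a single decimal PID, as in the output above. The helper names `read_lock_pid` and `pid_is_running` are made up for illustration and are not part of TORQUE.

```python
import os


def read_lock_pid(lock_path):
    """Return the integer PID stored in a lock file (assumed: one decimal number)."""
    with open(lock_path) as f:
        return int(f.read().strip())


def pid_is_running(pid):
    """Return True if a process with this PID exists.

    Signal 0 performs the existence and permission checks without
    actually delivering a signal.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

On the host in this report, `read_lock_pid("/var/spool/torque/server_priv/server.lock")` would return 18668 while ps shows pbs_server running as 18669, so any stop logic that signals the recorded PID would miss the server.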
It appears that pbs_server writes out this lock file before it forks to put itself into the background, and the bug appears to go back at least as far as the later 2.3.x versions. Some of this code was modified for "high availability" mode, in which multiple pbs_servers can monitor the same lock file. I am going to propose a solution on the TORQUE developers mailing list for comments, and we should get this fixed in the 2.3 and 2.4 branches (as well as Subversion trunk).
Actually, I take my comment back. The bug is not in the 2.3.x branch; it appeared in 2.4.x.
As far as I can tell, at least when not running in HA mode, the code looks like it should do the right thing: fork, create a new session, and write the session ID (which should be the same as the PID of the newly forked process) to the lock file. I'll probably add some debugging output to my local build to see if I can track this down.
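The claim that the new session ID equals the PID of the forked process can be checked outside TORQUE. The Python sketch below is not the pbs_server code; it only exercises the same fork-then-setsid sequence, and the function name `fork_and_setsid` is hypothetical. The child reports its session ID back through a pipe so that the parent can compare it with the PID returned by fork.

```python
import os


def fork_and_setsid():
    """Fork, start a new session in the child, and return (child_pid, child_sid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: become a session leader and report the new session ID.
        try:
            os.close(read_fd)
            os.setsid()
            sid = os.getsid(0)  # session ID of the calling process
            os.write(write_fd, str(sid).encode())
        finally:
            os._exit(0)  # never fall back into the caller's code
    # Parent: read the child's report and reap it.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, int(data)
```

Calling `child_pid, sid = fork_and_setsid()` should give two equal values, since a process that calls setsid() becomes the leader of a session whose ID is its own PID.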
I looked into this and I have fixed it. In normal mode, the problem was that the PID for the server wasn't updated after the last fork, hence the off-by-one error. In high availability mode (configured with --enable-high-availability), the problem was that it didn't write anything to the lock file at all. Both of these problems have been corrected in a patch I created, which is being reviewed for check-in. Cheers, David Beer
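To illustrate the normal-mode failure described here, the following Python sketch reproduces a stale PID surviving a fork. It is not the actual patch (that is the attachment on this bug) and it does not cover the high availability case. The function name `write_lock_after_fork` and its `refresh_pid` flag are invented for the example.

```python
import os


def write_lock_after_fork(lock_path, refresh_pid):
    """Fork once and let the child write a PID to lock_path.

    With refresh_pid=False the child writes the PID cached before the
    fork (the stale value); with refresh_pid=True it calls getpid()
    again first. Returns the child's real PID to the parent.
    """
    cached_pid = os.getpid()  # recorded before the fork
    child = os.fork()
    if child == 0:
        try:
            if refresh_pid:
                cached_pid = os.getpid()  # update after the fork
            with open(lock_path, "w") as f:
                f.write(f"{cached_pid}\n")
        finally:
            os._exit(0)  # the child must not return into the caller
    os.waitpid(child, 0)  # the lock file is complete once the child exits
    return child
```

With `refresh_pid=False` the file ends up holding the parent's PID rather than the child's. The kernel often hands out consecutive PIDs, which would explain why the recorded value was exactly one less than the running server's PID, although that gap is not guaranteed.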
David, could you post the patch here after it has been reviewed for check-in?
Sure, once we clear the patch I will post it here.
Created an attachment (id=21): Fix
This is the patch to fix this bug.