Bugzilla – Bug 60
Linux code to read /proc/self/exe breaks mom auto-update if directory symlinks are involved
Last modified: 2013-02-11 17:41:21 MST
You need to log in before you can comment on or make changes to this bug.
In src/resmom/mom_main.c there is a function MOMFindMyExe() which attempts to locate the path to the executable of pbs_mom. On Linux this returns the contents of /proc/self/exe which breaks on systems which do versioning of Torque installs. For instance, at VPAC and here at VLSCI we install Torque into (for instance): /usr/local/torque/2.4.6 We then have a symlink from /usr/local/torque/latest to the current version in use, and we start /usr/local/torque/latest/sbin/pbs_mom in our scripts. What want is that path to be checked for a newer version, not the dereferenced one which is in /proc/self/exe. For a while I've been using a patch which just #if 0's out the readlink() of /proc/self/exe to make it fall through to the usual code which looks at argv[0] and that works fine. Ideally this would be a runtime option specified in the pbs_mom config file, defaulting to the current behaviour, but switchable when the admin realises it's not working for the above reason. :-)
why keep the old behavior at all? why not just check argv[0] all the time?
(In reply to comment #1) > why keep the old behavior at all? why not just check argv[0] all the time? Because argv[0] might not be in PATH. To avoid another option in torque, just have your script do: V=$(readlink /blah/default) /blah/$V/bin/pbs_mom
I think we'd need to use readlink -f as readlink doesn't return anything (because pbs_mom isn't a symlink, it's a directory higher up in the tree which is). But yes, that's an interesting idea..
(In reply to comment #3) > I think we'd need to use readlink -f as readlink doesn't return anything > (because pbs_mom isn't a symlink, it's a directory higher up in the tree > which is). > > But yes, that's an interesting idea.. Actually, isn't there an option to set the exec path?
(In reply to comment #2) > (In reply to comment #1) > > why keep the old behavior at all? why not just check argv[0] all the time? > > Because argv[0] might not be in PATH. > > To avoid another option in torque, just have your script do: > V=$(readlink /blah/default) > /blah/$V/bin/pbs_mom If you look at OriginalPath in the pbs_server code (main function) you may be able to use that to fix the problem. I had to save Original Path to get the TORQUE high availability feature working properly.
This is still an issue in Torque 4.1.x Our local fix is still to do what is done in the first post.
Created an attachment (id=127) [details] Proposed fix for the bug Proposed fix. I've generalized David's OriginalPath technique, cleaned up and streamlined some of the code, and applied it to both pbs_server and pbs_mom. Please let me know if this resolves the situation. If you need help backporting the fix to whatever version you're currently using, let me know.
FYI, the same fix backported to 4.1.x: https://github.com/kainx/torque/commit/578b0d8ed02777972e128816e2423cde4a35784d
(In reply to comment #8) > FYI, the same fix backported to 4.1.x: > https://github.com/kainx/torque/commit/578b0d8ed02777972e128816e2423cde4a35784d Tried to apply this against released 4.1.4 and it failed: patching file src/server/pbsd_main.c Hunk #4 succeeded at 257 (offset 1 line). Hunk #5 succeeded at 1651 (offset 20 lines). Hunk #6 succeeded at 1674 (offset 20 lines). Hunk #7 FAILED at 2936. Hunk #8 FAILED at 3127. Hunk #9 succeeded at 3190 (offset 15 lines). 2 out of 9 hunks FAILED -- saving rejects to file src/server/pbsd_main.c.rej
Created an attachment (id=128) [details] Patch against 4.1.4 tarball (In reply to comment #9) > Tried to apply this against released 4.1.4 and it failed: > > patching file src/server/pbsd_main.c > Hunk #4 succeeded at 257 (offset 1 line). > Hunk #5 succeeded at 1651 (offset 20 lines). > Hunk #6 succeeded at 1674 (offset 20 lines). > Hunk #7 FAILED at 2936. > Hunk #8 FAILED at 3127. > Hunk #9 succeeded at 3190 (offset 15 lines). > 2 out of 9 hunks FAILED -- saving rejects to file src/server/pbsd_main.c.rej Not surprising; large chunks of code had to be removed, and if patch sees any differences at all, it will barf. Here's a patch against the 4.1.4 tarball.
(In reply to comment #10) > Here's a patch against the 4.1.4 tarball. Chris? Craig? Any happiness? :-)
(In reply to comment #11) > (In reply to comment #10) > > Here's a patch against the 4.1.4 tarball. > > Chris? Craig? Any happiness? :-) Michael, I've been able to get it built. There is a problem in that the Makefile.am file is patched, but the Makefile.in is not regenerated as part of the autogen.sh script. The issue seems related to automake rather than Michael's patch. I have a patch for the Makefile.in that allows it to build, which I will attach. I don't consider my patch necessary for upstream. Now I have it built it seems to be doing what it should, at least in my use cases. It detects the symlinks being altered and restarts pbs_mom as required. Touching the pbs_mom exec works as well. Craig.
Created an attachment (id=129) [details] Patches libutil Makefile.in Generated for testing only, shouldn't be needed in a release as the Makefile.am should generate it properly.
(In reply to comment #11) > (In reply to comment #10) > > Here's a patch against the 4.1.4 tarball. > > Chris? Craig? Any happiness? :-) Sorry, haven't been keeping up. We're still on 2.4.x so not had a chance to look if it's feasible to backport (and I suspect nobody would be interested). cheers! Chris
(In reply to comment #14) > We're still on 2.4.x so not had a chance to look if it's feasible to backport > (and I suspect nobody would be interested). It shouldn't be too difficult. The changes are fairly straight-forward. The patches won't apply cleanly, but it's pretty clear which blocks of code need to be nuked. Unfortunately I'm swamped right now, but if that changes I can attempt a backport.