[torqueusers] epilogue script runs twice
jenos at ncsa.uiuc.edu
Wed Feb 3 02:59:48 MST 2010
Thanks Axel, that's roughly what I've done already: I have the second
epilogue exit if another is already running for that job. My complaint
wasn't that I couldn't create a workaround, though; it was that I
shouldn't have to. Unless, of course, this behavior is by design and not
an oversight, in which case I'd be curious to know why.
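
For anyone who wants a starting point, a guard of that kind might look
roughly like the sketch below. It is not my exact script. It assumes the
job id is passed to the epilogue as $1, and LOCK_ROOT, run_once_per_job
and the exit code 75 are names and values made up for illustration.

```shell
#!/bin/sh
# Sketch: let only one epilogue per job proceed; later copies back off.
# Assumption: LOCK_ROOT is a hypothetical node-local, writable directory.
LOCK_ROOT="${LOCK_ROOT:-/tmp}"

# run_once_per_job JOBID COMMAND [ARGS...]
# Runs COMMAND and returns its status, or returns 75 (an arbitrary
# non-zero value) if another epilogue already holds the lock for JOBID.
run_once_per_job() {
    jobid="$1"; shift
    lockdir="$LOCK_ROOT/epilogue.$jobid.lock"
    # mkdir either creates the directory or fails, so exactly one
    # caller can win the lock even if two epilogues start together.
    if ! mkdir "$lockdir" 2>/dev/null; then
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return $status
}

# Hypothetical use inside an epilogue (cleanup_gpus is a placeholder):
#   run_once_per_job "$1" cleanup_gpus || exit 0
```

If the first epilogue is killed before it reaches rmdir, the lock
directory stays behind, so a real script would also need some cleanup
for that case.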
On 2/2/2010 10:49 PM, Axel Kohlmeyer wrote:
> On Tue, Feb 2, 2010 at 10:45 PM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>> I too have been extraordinarily aggravated by this inconsistent behavior.
>> How can this not be a bug? There are any number of reasons multiple
>> epilogue calls can cause problems. If there are any legitimate reasons that
>> multiple epilogues should be called, then those instances should create the
>> workaround, not the other way around.
>> In my case, I not only make database entries within epilogue but also do
>> operations on hardware devices (GPUs) that fail if run over the top of one
>> another. This ends up causing a cascading failure when it occurs (as it
>> should). I need a way to prevent multiple epilogue scripts from running, or
>> a bug fix. Can there be consensus that this is a bug, or am I missing
>> something? (perfectly possible)
> How about using a .pid lockfile? Before you do a "sensitive" operation,
> create a .pid lock file into which you echo the value of $$, then do the
> critical stuff, and delete the file. Then wrap this code in a function
> that, if the file already exists, tests whether the PID recorded in it
> still exists. If not, ignore/delete the lockfile; if yes, wait with
> sleep 1 until the file is gone or some large number of seconds has passed.
> As was said before, the epilogue script should assume nothing
> about what has been done before and whether it has completed,
> but rather make sure that the system gets back to a defined state.
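
The lockfile scheme described in the quoted message could be sketched
along these lines. This is only an illustration under stated assumptions:
acquire_pid_lock and release_pid_lock are hypothetical names, the 60
second default is arbitrary, and the script is assumed to run as a user
allowed to signal the lock holder (otherwise kill -0 fails and a live
holder looks stale).

```shell
#!/bin/sh
# acquire_pid_lock LOCKFILE [MAX_WAIT_SECONDS]
# Returns 0 once the lock is held, 1 if the wait limit is reached.
acquire_pid_lock() {
    lockfile="$1"
    max_wait="${2:-60}"
    waited=0
    while :; do
        # With noclobber (set -C) the redirect fails if the file already
        # exists, so only one process can create it and record its PID.
        if (set -C; echo "$$" > "$lockfile") 2>/dev/null; then
            return 0
        fi
        holder=$(cat "$lockfile" 2>/dev/null)
        # kill -0 sends no signal; it only checks that the PID exists.
        if [ -n "$holder" ] && ! kill -0 "$holder" 2>/dev/null; then
            # Holder is gone: treat the lock as stale and try again.
            rm -f "$lockfile" && continue
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# release_pid_lock LOCKFILE
release_pid_lock() {
    rm -f "$1"
}

# Hypothetical use around the critical section:
#   acquire_pid_lock /var/run/epilogue.pid 120 || exit 1
#   ...sensitive operations...
#   release_pid_lock /var/run/epilogue.pid
```

Two waiters can still race when they both decide a lock is stale, so
this narrows the window without closing it completely. It also does not
replace the advice about returning the system to a defined state.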