[torqueusers] epilogue script runs twice
Jeremy Enos
jenos at ncsa.uiuc.edu
Tue Feb 2 20:45:32 MST 2010
I too have been extraordinarily aggravated by this inconsistent
behavior. How can this not be a bug? There are any number of reasons
multiple epilogue calls can cause problems. If there are any legitimate
reasons that multiple epilogues /should/ be called, then those instances
should create the workaround, not the other way around.
In my case, I not only make database entries within epilogue but also do
operations on hardware devices (GPUs) that fail if run on top of
one another. This ends up causing a cascading failure when it occurs (as
it should). I need a way to prevent multiple epilogue scripts from
running, or a bug fix. Can there be consensus that this is a bug, or am
I missing something? (perfectly possible)
thanks-
Jeremy
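
One possible way to keep once-only work (such as the GPU operations described above) from overlapping is to have each epilogue instance try to claim the job first. The sketch below is a minimal illustration, not a TORQUE feature: LOCK_ROOT and claim_job are hypothetical names, and it assumes the job ID arrives as the first argument to the epilogue script.

```shell
#!/bin/sh
# Sketch only: LOCK_ROOT and claim_job are hypothetical names, not part of TORQUE.
LOCK_ROOT="${LOCK_ROOT:-/tmp/epilogue-locks}"

# Succeeds for exactly one caller per job ID: mkdir either creates the
# directory or fails because it already exists.
claim_job() {
    mkdir -p "$LOCK_ROOT" && mkdir "$LOCK_ROOT/$1" 2>/dev/null
}

# Assumption: the job ID is the first argument passed to the epilogue.
JOBID="${1:-example.job}"

if claim_job "$JOBID"; then
    echo "running once-only tasks for $JOBID"
else
    echo "epilogue already claimed $JOBID, skipping" >&2
fi
```

Note that a claim only records that some instance started. It says nothing about whether that instance finished, so a run that dies halfway leaves the job claimed and the work undone.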
On 1/11/2010 4:29 PM, Garrick Staples wrote:
> On Mon, Jan 11, 2010 at 04:58:19PM -0500, Kevin Van Workum alleged:
>
>>> If your script knew that epilogue was run before, what would it change?
>>>
>>>
>> There are cases when the epilogue script should do some task(s) only once.
>> If it knew that a particular job had already been (or is currently being)
>> processed by a previous instantiation, it could just skip the do-only-once
>> tasks.
>>
>>
>>
>>> You still wouldn't know if it completed, or if it completed successfully. Your
>>> script wouldn't know if *it* had been run before, or if it completed. What if
>>> your script had been terminated halfway through its critical stage?
>>>
>>> Everything in epilogue just needs to be written to be idempotent.
>>>
>>>
>> Sure, but it would be much easier in some cases if you knew a priori that
>> epilogue had already been called to process a job. E.g. 'echo "your job
>> completed on $(date)" >> some.log' would be difficult to make idempotent.
>>
> My point is that you still wouldn't know if the particular actions completed or
> not.
>
> Say, for example, that epilogue ran a script that did some sort of database
> operation. Imagine that the initial database connection hung and epilogue timed
> out. Would you want to skip that database operation on subsequent epilogue
> runs?
>
> It doesn't matter how many times epilogue runs, you still need to check each of
> the tasks you wish to perform and make sure they have been done.
>
> grep -q "$JOBID completed" some.log || echo "$JOBID completed on $(date)" >> some.log
>
> (personally, I log "epilogue started at $date" at the beginning of the script,
> and "epilogue completed at $date" at the end)
>
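
The per-task check and the start/completed logging described in the quoted message could be combined roughly as follows. This is a sketch under assumptions: run_once and LOG are hypothetical names, the job ID is taken from the first argument, and the bracketed "[jobid]" marker is an illustrative choice so that one job ID cannot match as a substring of another.

```shell
#!/bin/sh
# Sketch only: LOG and run_once are hypothetical names.
LOG="${LOG:-some.log}"

# Run a task only if the log has no completion record for it, and write
# the completion record only when the task command succeeds.
run_once() {
    jobid=$1
    task=$2
    shift 2
    # -F: match the marker literally; -s: stay quiet if the log is missing
    grep -qsF "[$jobid] $task completed" "$LOG" && return 0
    "$@" && echo "[$jobid] $task completed on $(date)" >> "$LOG"
}

# Assumption: the job ID is the first argument passed to the epilogue.
JOBID="${1:-example.job}"

echo "[$JOBID] epilogue started at $(date)" >> "$LOG"
run_once "$JOBID" notify echo "your job completed"
echo "[$JOBID] epilogue completed at $(date)" >> "$LOG"
```

Because the completion line is written only after the task succeeds, a task that hung or was killed on an earlier run is retried on the next one, while a finished task is skipped. Two instances running at the same moment can still both pass the check, so this does not replace a lock for work that must never overlap.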