[torqueusers] epilogue script runs twice

Jeremy Enos jenos at ncsa.uiuc.edu
Tue Feb 2 20:45:32 MST 2010


I too have been extraordinarily aggravated by this inconsistent
behavior.  How can this not be a bug?  There are any number of reasons
multiple epilogue calls can cause problems.  If there are any legitimate
reasons that multiple epilogues /should/ be called, then those cases
should be the ones that need a workaround, not the other way around.
In my case, I not only make database entries within epilogue but also
run operations on hardware devices (GPUs) that fail if they run on top
of one another; when that happens it causes a cascading failure (as it
should).  I need a way to prevent multiple epilogue scripts from
running, or a bug fix.  Can there be consensus that this is a bug, or am
I missing something?  (perfectly possible)
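
As a stopgap, a guard along these lines might keep a second epilogue
from running on top of the first.  This is only a sketch: it assumes the
job id arrives as the first epilogue argument, and LOCKROOT, claim_job,
run_once_tasks and epilogue_main are made-up names, not anything TORQUE
provides.  Note that the lock is never released, so a run that dies
halfway is not retried, which is exactly the gap Garrick describes
below.

```shell
#!/bin/sh
# Sketch: let only one epilogue instance per job do the real work.
# LOCKROOT and all function names here are hypothetical, not part of TORQUE.

LOCKROOT="${LOCKROOT:-/tmp/epilogue-locks}"

# Try to become the single owner for this job id.
# mkdir either creates the directory or fails, atomically, so two
# concurrent callers cannot both succeed.
claim_job() {
    mkdir -p "$LOCKROOT" || return 2
    mkdir "$LOCKROOT/$1" 2>/dev/null
}

# Placeholder for the do-only-once work (database entry, GPU cleanup).
run_once_tasks() {
    echo "cleanup for job $1"
}

epilogue_main() {
    jobid="$1"
    if claim_job "$jobid"; then
        run_once_tasks "$jobid"
    else
        echo "epilogue for $jobid already claimed, skipping" >&2
    fi
}

# Assumption: the job id is passed as the first argument to epilogue.
if [ -n "${1:-}" ]; then
    epilogue_main "$1"
fi
```

The lock directories would still need to be cleaned up somewhere (a
cron job, say), which this sketch does not attempt.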
thanks-

     Jeremy

On 1/11/2010 4:29 PM, Garrick Staples wrote:
> On Mon, Jan 11, 2010 at 04:58:19PM -0500, Kevin Van Workum alleged:
>    
>>> If your script knew that epilogue was run before, what would it change?
>>>
>>>        
>> There are cases when the epilogue script should do some task(s) only once.
>> If it knew that a particular job had already been (or is currently being)
>> processed by a previous instantiation, it could just skip the do-only-once
>> tasks.
>>
>>
>>      
>>> You still wouldn't know if it completed; if it completed successfully. Your
>>> script wouldn't know if *it* had been run before; if it completed. What if
>>> your script had been terminated halfway through its critical stage?
>>>
>>> Everything in epilogue just needs to be written to be idempotent.
>>>
>>>        
>> Sure, but it would be much easier in some cases if you knew a priori that
>> epilogue had already been called to process a job. E.g. 'echo "your job
>> completed on $(date)" >> some.log' would be difficult to make idempotent.
>>      
> My point is that you still wouldn't know if the particular actions completed or
> not.
>
> Say, for example, that epilogue ran a script that did some sort of database
> operation. Imagine that the initial database connection hung and epilogue timed
> out. Would you want to skip that database operation on subsequent epilogue
> runs?
>
> It doesn't matter how many times epilogue runs, you still need to check each of
> the tasks you wish to perform and make sure they have been done.
>
>    grep -q "$JOBID completed" some.log || echo "$JOBID completed on $(date)" >> some.log
>
> (personally, I log "epilogue started at $date" at the beginning of the script,
> and "epilogue completed at $date" at the end)
>
>
>    
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>    
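
The per-task check Garrick describes in the quoted reply could be
generalized roughly as follows.  This is a sketch under assumptions:
STATE_LOG, COMPLETION_LOG, run_task and log_completion are hypothetical
names, and the job id is assumed to arrive as the first epilogue
argument.  A task is marked done only after its command succeeds, so one
that hung or was killed gets retried on the next epilogue run.

```shell
#!/bin/sh
# Sketch: record each finished task and skip it on later epilogue runs.
# STATE_LOG, COMPLETION_LOG and the function names are hypothetical.

STATE_LOG="${STATE_LOG:-/tmp/epilogue-state.log}"

# run_task JOBID TASKNAME COMMAND [ARGS...]
# Runs COMMAND unless STATE_LOG already shows TASKNAME done for JOBID.
# The marker line is appended only if COMMAND exits successfully.
run_task() {
    jobid="$1"
    task="$2"
    shift 2
    marker="$jobid $task done"
    # -x: match the whole line, -F: fixed string, -q: no output
    if grep -qxF -e "$marker" "$STATE_LOG" 2>/dev/null; then
        return 0
    fi
    "$@" && echo "$marker" >> "$STATE_LOG"
}

# Example task: the completion message from the thread.
log_completion() {
    echo "$1 completed on $(date)" >> "${COMPLETION_LOG:-some.log}"
}

# Assumption: the job id is passed as the first argument to epilogue.
if [ -n "${1:-}" ]; then
    run_task "$1" log_completion log_completion "$1"
fi
```

This only covers repeated runs that happen one after another.  Two
epilogues running at the same moment could both pass the grep check, so
it does not replace some form of locking.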