[gold-users] gcharge issued twice via Torque's epilogue script

Kevin Van Workum vanw at sabalcore.com
Wed Dec 23 08:16:47 MST 2009


On Tue, Dec 22, 2009 at 6:02 PM, Scott Jackson <
scottmo at adaptivecomputing.com> wrote:

> Kevin,
>
> Sounds good. Unfortunately, that may be about the best you can do until
> Torque gets fixed.
>
> Hmmm.... Actually...
>
> Now that I think about it, there would be one other thing to do. If you did
> something in your epilog that flipped a per-jobid semaphore or something,
> you would only call the charge if the thing had not already been flipped.
> Maybe the presence of a file, or an entry in a database (protected by
> locking the row).
>
> So for example, Begin Work; Insert into Jobs set Jobid=$JobId; Commit. If
> that succeeds, you do the charge. If it fails due to an existing entry, log
> the dup invocation to a log.
>
> I think that is a tad better than refunding after the fact.
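
For the file-presence variant, a minimal Perl sketch might look like the
following. It assumes a marker directory on a local filesystem (the path below
and the CHARGE_MARKER_DIR variable are made up for illustration) and relies on
sysopen with O_CREAT|O_EXCL, which fails with EEXIST if the file already
exists, so only one invocation per job id can succeed. The gcharge call is left
commented out because $args is built elsewhere in the epilogue.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

# Hypothetical marker directory; it must already exist and be writable
# by whatever user runs the epilogue.
my $marker_dir = $ENV{CHARGE_MARKER_DIR} || "/var/spool/torque/charged";

# Returns 1 if this call created the marker (first invocation for the job),
# 0 if the marker already existed, and dies on any other error.
sub claim_job {
    my ($dir, $jobid) = @_;
    my $path = "$dir/$jobid";
    if (sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0644)) {
        print $fh scalar(localtime), "\n";   # record when the job was claimed
        close $fh;
        return 1;
    }
    return 0 if $!{EEXIST};
    die "cannot create $path: $!\n";
}

# Torque passes the job id as the first argument to the epilogue.
if (@ARGV) {
    my $jobid = $ARGV[0];
    if (claim_job($marker_dir, $jobid)) {
        # system("gcharge $args");   # $args as built in the existing epilogue
        print "charging $jobid\n";
    } else {
        print STDERR "$jobid: duplicate epilogue invocation, not charging\n";
    }
}
```

Markers accumulate over time, so something would need to prune old ones
periodically.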


Yes, that is a better solution, and it seems to work. What I'm doing now is
creating a per-job symlink at the beginning of my epilogue script. If the
symlink call fails, then I know that the job has already been charged or is
in the process of being charged. I'm using a symlink because creating one is
an atomic operation (I think). Or should I use a system call to create the link?
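
A minimal sketch of the symlink guard follows. Perl's built-in symlink calls
symlink(2) directly, so there is no need to shell out to ln; it returns false
and sets $! to EEXIST when the link name already exists. The function name and
the link path here are illustrative only.

```perl
use strict;
use warnings;

# Try to create a per-job symlink that is used purely as a flag.
# Returns 1 if this call created it, 0 if it already existed,
# and dies on any other error.
sub claim_job_symlink {
    my ($link) = @_;
    # The target string is arbitrary and need not exist (a dangling link).
    return 1 if symlink("charged", $link);
    return 0 if $!{EEXIST};
    die "symlink $link failed: $!\n";
}

# Example use at the top of the epilogue (the directory is hypothetical):
# my $jobid = $ARGV[0];
# exit 0 unless claim_job_symlink("/var/spool/torque/charged/$jobid.lnk");
```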

Thanks,

Kevin


>
> Scott
>
>
> Kevin Van Workum wrote:
>
>>
>>
>> On Tue, Dec 22, 2009 at 2:53 PM, Kevin Van Workum <vanw at sabalcore.com> wrote:
>>
>>    On Tue, Dec 22, 2009 at 2:51 PM, Kevin Van Workum <vanw at sabalcore.com> wrote:
>>    > On Tue, Dec 22, 2009 at 12:46 PM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>>    >>
>>    >> What about glstxn -J <job id> ? You could use this command in
>>    your epilogue script to check if charge transaction was made for
>>    the particular jobid.
>>    >
>>    > Yes, I tried that, but surprisingly that doesn't always work. It
>>    > appears that gcharge is implemented asynchronously, at least wrt
>>    > glstxn. Here's a simplified snippet of my epilogue script (perl) and
>>    > glstxn output for a job that still got charged twice.
>>    >
>>    > #!/usr/bin/perl
>>    >
>>    > open LG, "glstxn -J $jobid|";
>>
>>    Before you mention it, I'm actually using "glstxn --quiet -J
>>    $jobid|" here.
>>
>>    > @buf = <LG>;
>>    > close LG;
>>    >
>>    > if(@buf == 0) {
>>    >    system("gcharge $args");
>>    > } else {
>>    >    print STDERR "$jobid has already been charged ", @buf+0, " times\n";
>>    > }
>>    >
>>    >
>>    > # glstxn -J 230465.jman --show JobId,Id,CreationTime
>>    > JobId       Id     CreationTime
>>    > ----------- ------ -------------------
>>    > 230465.jman 847102 2009-12-22 14:35:32
>>    > 230465.jman 847107 2009-12-22 14:35:32
>>    >
>>    >
>>
>>
>> FYI, I decided to just run a cronjob every night that searches for
>> duplicated charges and refunds the extra charges.
>>
>> Kevin
>>
>>    >>
>>    >> Cheers
>>    >>
>>    >> Wojciech
>>    >>
>>    >> 2009/12/22 Scott Jackson <scottmo at adaptivecomputing.com>
>>
>>    >>>
>>    >>> Kevin,
>>    >>>
>>    >>> No, I'm sorry. There is not. Gold will charge for a job as
>>    many times as
>>    >>> it is called. There are provisions for incremental charging
>>    where it all
>>    >>> goes against the same job instance, and if not, it considers them
>>    >>> separate jobs with the same jobid. All I can think of is that
>>    you could
>>    >>> write a wrapper script that looks up the jobid and if it has
>>    already
>>    >>> been charged that same day, ignores the second charge.
>>    >>>
>>    >>> I assume you have a ticket open with the Torque support queue
>>    on this.
>>    >>>
>>    >>> Scott
>>    >>>
>>    >>>
>>    >>> Kevin Van Workum wrote:
>>    >>> > I use Torque's epilogue script to issue the gcharge command
>>    after a
>>    >>> > job completes. However, it occasionally happens that the
>>    epilogue
>>    >>> > script runs twice for a given job. This happens when Torque
>>    sends a
>>    >>> > sigkill a few seconds after the initial sigterm is sent.
>>    Though I'd
>>    >>> > like to prevent the script from running twice, I haven't had
>>    much
>>    >>> > success. So, I'm now searching for a solution through Gold.
>>    >>> >
>>    >>> > Is there a way to have gold ignore duplicate charges for the
>>    same JobId?
>>    >>> >
>>    >>> > --
>>    >>> > Kevin Van Workum, PhD
>>    >>> > Sabalcore Computing Inc.
>>    >>> > Run your code on 500 processors.
>>    >>> > Sign up for a free trial account.
>>    >>> > www.sabalcore.com
>>    >>> > 877-492-8027 ext. 11
>>    >>> >
>>
>>  ------------------------------------------------------------------------
>>    >>> >
>>    >>> > _______________________________________________
>>    >>> > gold-users mailing list
>>    >>> > gold-users at supercluster.org
>>
>>    >>> > http://www.supercluster.org/mailman/listinfo/gold-users
>>    >>> >
>>    >>>
>>    >>
>>    >>
>>    >>
>>    >> --
>>    >> --
>>    >> Wojciech Turek
>>    >>
>>    >> Assistant System Manager
>>    >>
>>    >> High Performance Computing Service
>>    >> University of Cambridge
>>    >> Email: wjt27 at cam.ac.uk <mailto:wjt27 at cam.ac.uk>
>>
>>    >> Tel: (+)44 1223 763517
>>    >
>>    >
>>    >
>>    >
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>
>


-- 
Kevin Van Workum, PhD
Sabalcore Computing Inc.
Run your code on 500 processors.
Sign up for a free trial account.
www.sabalcore.com
877-492-8027 ext. 11