[gold-users] gcharge issued twice via Torque's epilogue script
Kevin Van Workum
vanw at sabalcore.com
Wed Dec 23 08:16:47 MST 2009
On Tue, Dec 22, 2009 at 6:02 PM, Scott Jackson <scottmo at adaptivecomputing.com> wrote:
> Kevin,
>
> Sounds good. Unfortunately, that may be about the best you can do until
> Torque gets fixed.
>
> Hmmm.... Actually...
>
> Now that I think about it, there would be one other thing to do. If you did
> something in your epilog that flipped a per-jobid semaphore or something,
> you would only call the charge if the thing had not already been flipped.
> Maybe the presence of a file, or an entry in a database (protected by
> locking the row).
>
> So for example, Begin Work; Insert into Jobs set Jobid=$JobId; Commit. If
> that succeeds, you do the charge. If it fails due to an existing entry, log
> the dup invocation to a log.
>
> I think that is a tad better than refunding after the fact.
Yes, that is a better solution and it seems to work. What I'm doing now is
creating a per-job symlink at the beginning of my epilogue script. If the
symlink call fails, then I know that the job has already been charged or is
in the process of being charged. I'm using a symlink because creating one is
an atomic operation (I think). Or should I use a system call to create the link?
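
A minimal sketch of the guard described above, in Python rather than the Perl of the actual epilogue script. The names claim_job and charge_once, the ".charged" suffix and the lock directory are hypothetical and not part of Gold or Torque. It relies on symlink(2) either creating the link or failing with EEXIST, which should hold on a local filesystem; behaviour on a shared filesystem is an assumption worth checking.

```python
import os
import subprocess
import sys


def claim_job(jobid, lock_dir):
    """Return True only for the first caller that claims this job id."""
    marker = os.path.join(lock_dir, jobid + ".charged")
    try:
        # os.symlink calls symlink(2), which creates the link or fails
        # with EEXIST, so two epilogue runs cannot both succeed here.
        os.symlink(jobid, marker)
    except FileExistsError:
        return False
    return True


def charge_once(jobid, charge_cmd, lock_dir):
    """Run charge_cmd (e.g. a gcharge command line) only if the claim was won."""
    if not claim_job(jobid, lock_dir):
        print(f"{jobid}: duplicate epilogue invocation, not charging",
              file=sys.stderr)
        return False
    subprocess.run(charge_cmd, check=True)
    return True
```

Perl's built-in symlink and Python's os.symlink both invoke the symlink system call directly, so no separate wrapper should be needed. Note that if the charge command itself fails, the marker stays behind and would have to be removed before a retry.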
Thanks,
Kevin
>
> Scott
>
>
> Kevin Van Workum wrote:
>
>>
>>
>> On Tue, Dec 22, 2009 at 2:53 PM, Kevin Van Workum <vanw at sabalcore.com> wrote:
>>
>> On Tue, Dec 22, 2009 at 2:51 PM, Kevin Van Workum <vanw at sabalcore.com> wrote:
>> > On Tue, Dec 22, 2009 at 12:46 PM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>> >>
>> >> What about glstxn -J <job id>? You could use this command in your
>> >> epilogue script to check whether a charge transaction was made for
>> >> the particular jobid.
>> >
>> > Yes, I tried that, but surprisingly that doesn't always work. It
>> > appears that gcharge is implemented asynchronously, at least wrt
>> > glstxn. Here's a simplified snippet of my epilogue script (perl) and
>> > glstxn output for a job that still got charged twice.
>> >
>> > #!/usr/bin/perl
>> >
>> > open LG, "glstxn -J $jobid|";
>>
>> Before you mention it, I'm actually using "glstxn --quiet -J
>> $jobid|" here.
>>
>> > @buf = <LG>;
>> > close LG;
>> >
>> > if (@buf == 0) {
>> >     system("gcharge $args");
>> > } else {
>> >     print STDERR "$jobid has already been charged ", @buf+0, " times\n";
>> > }
>> >
>> >
>> > # glstxn -J 230465.jman --show JobId,Id,CreationTime
>> > JobId Id CreationTime
>> > ----------- ------ -------------------
>> > 230465.jman 847102 2009-12-22 14:35:32
>> > 230465.jman 847107 2009-12-22 14:35:32
>> >
>> >
>>
>>
>> FYI, I decided to just run a cronjob every night that searches for
>> duplicated charges and refunds the extra charges.
>>
>> Kevin
>>
>> >>
>> >> Cheers
>> >>
>> >> Wojciech
>> >>
>> >> 2009/12/22 Scott Jackson <scottmo at adaptivecomputing.com>
>>
>> >>>
>> >>> Kevin,
>> >>>
>> >>> No, I'm sorry. There is not. Gold will charge for a job as many times as
>> >>> it is called. There are provisions for incremental charging where it all
>> >>> goes against the same job instance, and if not, it considers them
>> >>> separate jobs with the same jobid. All I can think of is that you could
>> >>> write a wrapper script that looks up the jobid and if it has already
>> >>> been charged that same day, ignores the second charge.
>> >>>
>> >>> I assume you have a ticket open with the Torque support queue on this.
>> >>>
>> >>> Scott
>> >>>
>> >>>
>> >>> Kevin Van Workum wrote:
>> >>> > I use Torque's epilogue script to issue the gcharge command after a
>> >>> > job completes. However, it occasionally happens that the epilogue
>> >>> > script runs twice for a given job. This happens when Torque sends a
>> >>> > SIGKILL a few seconds after the initial SIGTERM is sent. Though I'd
>> >>> > like to prevent the script from running twice, I haven't had much
>> >>> > success. So, I'm now searching for a solution through gold.
>> >>> >
>> >>> > Is there a way to have gold ignore duplicate charges for the same JobId?
>> >>> >
>> >>> > --
>> >>> > Kevin Van Workum, PhD
>> >>> > Sabalcore Computing Inc.
>> >>> > Run your code on 500 processors.
>> >>> > Sign up for a free trial account.
>> >>> > www.sabalcore.com
>> >>> > 877-492-8027 ext. 11
>> >>> >
>> >>> > ------------------------------------------------------------------------
>> >>> >
>> >>> > _______________________________________________
>> >>> > gold-users mailing list
>> >>> > gold-users at supercluster.org
>> >>> > http://www.supercluster.org/mailman/listinfo/gold-users
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Wojciech Turek
>> >>
>> >> Assistant System Manager
>> >>
>> >> High Performance Computing Service
>> >> University of Cambridge
>> >> Email: wjt27 at cam.ac.uk
>> >> Tel: (+)44 1223 763517
>> >
>> >
>> >
>
>
--
Kevin Van Workum, PhD
Sabalcore Computing Inc.
Run your code on 500 processors.
Sign up for a free trial account.
www.sabalcore.com
877-492-8027 ext. 11