[gold-users] Is it necessary to do both gquote and greserve at job submit time when integrating resource manage system?

Scott Jackson scottmo at adaptivecomputing.com
Wed Mar 23 10:44:34 MDT 2011


I do not believe there is a perfect answer to this. There are non-optimal effects either way you do it. I think Michael has shown a very rational and common approach to this. I agree that his (and the recommended) way to do this leaves open the possibility of the job not having enough funds when it is time to actually run the job. But at least it will just place the job on hold and not get it started with insufficient funds.

The problem with doing the reservation at job submission time is that it immediately counts against the user's balance in a fairly dramatic way, well before the job even starts. Say a user's actual run time is historically about 20% of his requested wallclock limit (which is typical). He has 1000 credits, and each of his jobs will run for 200 seconds on 1 proc for a charge of 200, but the wallclock limit he requests is 5 times his actual run time, so each reservation will be for 1000. Further aggravating the problem, perhaps there is a job backlog, so it will take half a day for his job to run. So he submits one job, gets the reservation for 1000, and the job is queued. He then tries to submit the next 4 jobs, but they are rejected. It will be about half a day before the queueing system performs the actual charge and returns the 800 unused credits to availability. He was severely blocked because we tried so hard to absolutely prevent any possible lack-of-funds situation. If we had just done a quote and balance check, the user would instead have received a warning, but all 5 jobs could have been submitted and queued, and could have actually run, one at a time. Maybe a few of them would have been deferred and then automatically retried later. Which is better? Whichever the site prefers.
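
To make the arithmetic concrete, here is a minimal Python sketch of the submit-time accounting only. It is a toy model, not Gold's API: the function names and the job list are made up for illustration, and it assumes that both the quote and the reservation are priced from the requested wallclock limit (1000 per job) while the final charge is priced from the actual run time (200 per job).

```python
# Toy model of the example above. Nothing here calls Gold; all names are
# hypothetical and exist only to illustrate the bookkeeping.

def submit_with_reservation(balance, jobs):
    """Reserve each job's full wallclock-limit cost at submit time."""
    available = balance
    accepted, rejected = [], []
    for name, reserve_cost in jobs:
        if reserve_cost <= available:
            available -= reserve_cost   # hold is placed immediately
            accepted.append(name)
        else:
            rejected.append(name)       # not enough unreserved credits
    return accepted, rejected, available


def submit_with_quote_only(balance, jobs):
    """Only compare each quote against the balance; place no hold."""
    accepted, rejected = [], []
    for name, quote in jobs:
        if quote <= balance:
            accepted.append(name)       # balance is left untouched
        else:
            rejected.append(name)
    return accepted, rejected, balance


def available_after_charge(available, reserved, actual_charge):
    """Release a reservation and debit the actual usage instead."""
    return available + reserved - actual_charge


# Five jobs, each requesting a wallclock limit worth 1000 credits.
jobs = [(f"job{i}", 1000) for i in range(1, 6)]

if __name__ == "__main__":
    print(submit_with_reservation(1000, jobs))
    # (['job1'], ['job2', 'job3', 'job4', 'job5'], 0)
    print(available_after_charge(0, reserved=1000, actual_charge=200))
    # 800
    print(submit_with_quote_only(1000, jobs))
    # (['job1', 'job2', 'job3', 'job4', 'job5'], [], 1000)
```

The sketch deliberately stops at submit time. What happens to the queued jobs afterwards depends on how the scheduler quotes and reserves at job start, which this model does not attempt to reproduce.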

One final thought. If you want to do the reservation at submit time, you should try to set a reasonable expiration time for the reservation since it is not going to start right away and we really don't know when it will start.

Scott


----- Original Message -----
> From: "Michael Sternberg" <sternberg at anl.gov>
> To: "Gold Users Mailing List" <gold-users at supercluster.org>, "Wei Lin" <weilin at platform.com>
> Sent: Monday, March 21, 2011 9:56:16 PM
> Subject: Re: [gold-users] Is it necessary to do both gquote and greserve at job submit time when integrating resource
> manage system?
> I use Torque and deployed a job submission filter that simply checks
> the balance. If the job requests more, the submission is rejected by
> this submit filter (communicated to qsub via exit code). Such a job
> never reaches the resource manager queue. If, however, the submitted
> job merely amounts to a certain percentage of the available balance,
> the script warns the user of the low balance but succeeds. The user
> can then arrange for a new allocation. Works very well here in
> practice.
> 
> Once ingested by the RM, each job will be passed to the scheduler
> which will do its own Gold interaction. This will process jobs
> sequentially, requesting a quote and then a reservation. This means
> job 2 in your example will fail to get past the quote stage and would
> be marked ineligible for execution (blocked/held), rather than fail.
> 
> 
> Michael.
> 
> 
> On Mar 21, 2011, at 22:32, "Wei Lin" <weilin at platform.com> wrote:
> 
> > The proposed integration will do a job quote at job submission time,
> > a job reservation at job start time, and a job charge when the job
> > completes.
> > After reading the Gold User Guide, I think this would leave open the
> > possibility of a user submitting more jobs than he/she can afford,
> > and then having the job fail at job start time.
> >
> > My scenario.
> >
> > A user has 1000 credits (whatever they are called). The user submits
> > 2 jobs immediately one after the other. Each job gets a quote saying
> > they will spend 600 credits. The next scheduling cycle LSF
> > dispatches the two jobs. The 2nd one will fail because there aren't
> > enough credits in the bank.
> >
> > I am just going by what the Gold user guide says which in places
> > isn't a lot. I don't think obtaining a quote changes one's balance,
> > but a reservation does. If I'm right getting a quote and reserving
> > may need to be done together at submission time.

