[gold-users] moab-gold binding performance issue

Hu, Zongjun zhu at med.miami.edu
Thu Jun 4 14:30:18 MDT 2009


Hi, Scott,

We are running Moab 5.2 with Gold 2.1.6. I did notice the
'Replace=True' option when Moab talked back to Gold. I will keep
watching the Gold balance report for the next few weeks to see how it
works. The inconsistency I saw was probably because the old invalid
reservations were still within their proposed wallclock durations and
had not been inactivated yet.

Thanks for your advice on the database issue. No, I had not run
'vacuumdb' in the months since Gold was installed. I just did it, and
I will double-check whether things are faster now. Maybe I should
benchmark Gold job transactions to find out the exact time spent on
each request.
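
For a first rough benchmark, I could simply time individual Gold
queries from the shell, for example (gbalance/glsjob usage as I
understand it from the Gold documentation; the user name is just a
placeholder):

    time gbalance -u testuser    # wall time of a single balance request
    time glsjob                  # wall time of a single job query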

I have submitted a request to clusterresources to find out whether the
Moab-Gold binding settings can be reconfigured. If it is not possible
from the Moab configuration, I will try your suggestion of using an
epilog.

Thanks so much for your help.

Zongjun Hu
---
University of Miami
Center for Computational Science


-----Original Message-----
From: Scott Jackson [mailto:scottmo at clusterresources.com] 
Sent: Thursday, June 04, 2009 3:47 PM
To: Hu, Zongjun
Cc: 'gold-users at supercluster.org'
Subject: Re: [gold-users] moab-gold binding performance issue

Hu, Zongjun wrote:
>
> Hi,
>
> We use Gold-2.1.6 as allocation manager for Moab-5.2.0. We are trying
> to move this configuration to production. We got several issues and
> need help.
>
> 1. The binding works as a job reservation at job start. Most of the
> time, this mode works perfectly: if a job is accepted to start, a new
> job is created in Gold and the account balance is deducted according
> to the requested CPU hours. If the job runs and finishes without
> problems, it is finally charged according to real usage, and the
> earlier reservation in Gold is returned to the account balance.
> However, we hit an interesting problem. Some jobs are accepted to
> start and moved to compute nodes, but for some reason they do not
> start successfully there. They are then rejected and moved back to
> the blocked/waiting list. After a while, these jobs are evaluated
> again and the steps repeat. In this situation, multiple jobs are
> created in Gold and the account balance is deducted multiple times
> (to make things worse, users usually request much more than they
> need). When these jobs finally finish or are canceled, Gold only
> changes the last job created to the 'Charge' stage and puts only the
> last deduction back into the account. All of the previously created
> jobs/deductions stay in Gold, and their reserved balance is never
> restored. After a while, Gold holds a huge amount of 'Reserved
> balance' even though all jobs are completed. Can you give us
> instructions to fix this problem and release all of that unnecessary
> reserved balance?
>
I believe that Moab should be releasing these Gold reservations when it
discovers that they have been rejected (failed to start successfully).
That would be the correct fix. As it is, the reservations only remain
active within Gold for the wallclock duration of the job, after which
they automatically become inactive and no longer affect the balance.
Any time you need to, you can remove a reservation within Gold with the
grmres command. Also, grmres -I can be used to get rid of all of the
stale reservations that are no longer affecting the balance. At any
given time, if you run glsres -A, the list of reservations returned
should pertain only to currently running jobs. Can you tell me what
version of Moab you are running? I ask because I am surprised that the
multiple reservations are creating multiple jobs within Gold; Moab
should have started using the Replace=True option in its reservation
requests quite a long time ago, precisely to avoid creating new job
instances in Gold. Please also tell me the Gold version. As for the
reservations not being removed, I would say this is a Moab bug and
would recommend you submit a ticket to
moab-support at clusterresources.com explaining the problem and
providing what evidence you can collect (run support.diag.pl and send
the resulting tarball along with goldd.log, any pertinent torque logs,
etc.).
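
For example (using only the commands mentioned above; the positional
reservation id syntax for grmres is my assumption, so please check
grmres --help):

    glsres -A      # list active reservations; should match running jobs
    grmres 1234    # remove one stale reservation by id (syntax assumed)
    grmres -I      # purge all inactive reservations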


> 2. Sometimes we have lots of small jobs (finishing within a minute).
> For each job, Moab has to contact the Gold server to make a
> reservation at start and then charge the job when it finishes. These
> small jobs make Moab repeat those steps frequently, so the Moab
> server is very busy. This slows down responses to user requests
> considerably, and user requests sometimes time out. Is there a way to
> speed up Gold job processing? If not, can we configure the Moab-Gold
> binding to charge only at job end time? That would save at least half
> of the processing time. We did not find guidance for this
> configuration in the Moab or Gold documents.
>
I don't know if that is configurable within Moab (I think it probably
should be, but I doubt that it is). I believe that CRI would accept
this change request and provide a Moab parameter to avoid doing the
reservation. You would have to submit a ticket to
moab-support at clusterresources.com to see if Moab can accommodate
this change request. It would also be possible to take it out of the
hands of Moab entirely by writing your own epilog to call gcharge (and
taking the applicable AMCFG parameters out of moab.cfg); a sketch
follows below.
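
As a very rough sketch (TORQUE-style epilog arguments; the gcharge
options shown are from memory of Gold 2.x, so verify them against
gcharge --help before relying on this):

    #!/bin/sh
    # TORQUE epilog arguments: $1=jobid $2=user $7=resources_used $9=account
    JOBID="$1"
    JOBUSER="$2"
    USED="$7"
    ACCOUNT="$9"

    # Extract walltime (HH:MM:SS) from the resources_used string and
    # convert it to seconds for the charge duration.
    WALL=`echo "$USED" | sed -n 's/.*walltime=\([0-9:]*\).*/\1/p'`
    SECS=`echo "$WALL" | awk -F: '{ print $1*3600 + $2*60 + $3 }'`

    # Charge the job directly in Gold, skipping the reservation step.
    gcharge -J "$JOBID" -u "$JOBUSER" -p "$ACCOUNT" -t "$SECS"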

How much time are your reservations and charges taking? If these are
larger than a fraction of a second (0.2 or 0.3 sec), then I would ask
whether you are VACUUMing your database frequently and whether you
have the indexes set up in your database. Please let me know about
this.
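
For example, a periodic maintenance pass on a PostgreSQL backend could
look like this (the database name 'gold' is an assumption; substitute
whatever your goldd is configured to use):

    vacuumdb --analyze gold    # reclaim dead rows, refresh planner stats
    psql gold -c '\di'         # list the indexes that currently exist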

I hope this helps,

Scott



> Thanks.
>
> Zongjun Hu
> -----
> University of Miami
> Center for Computational Science
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> gold-users mailing list
> gold-users at supercluster.org
> http://www.supercluster.org/mailman/listinfo/gold-users
>   


