[gold-users] Gold performance issues

Brock Palen brockp at umich.edu
Tue Aug 15 08:19:54 MDT 2006


We have gold managing stats for 4 clusters.  Three are running maui  
and one is running moab.  We currently have a large default account  
and we build our statistics from gold.  In the future we plan to use  
gold to enforce actual allocations.

For the last few weeks we have been seeing postgres peg the  
CPU on the system it runs on.  The postgres install is used only for  
gold.  The CPU is not pegged all the time, but each postmaster process  
racks up a good few seconds of CPU time.  I am no postgres  
expert, but the database is held on a RAID and I/O wait is almost  
nonexistent.  Moab ends up waiting on gold and we get messages like:

08/15 09:45:30 ERROR:    cannot receive response from allocation- 
manager server 'cac-admin02.engin.umich.edu':7112
08/15 09:45:30 ALERT:    cannot reserve allocation for job 8990 -  
cannot read message header

These appear many times in the moab logs.  Is this because we have a  
single large default account, so that gold is asking postgres to go  
through all the transactions ever made against that account (all jobs  
for the last year)?
Any insight or postgres tuning pointers would be appreciated.  We hope to  
add many more nodes (and jobs) to the same gold install in the future  
and were disappointed to see it slow down so much.
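
To put a number on how often Moab gives up on gold, something like the
following Python sketch could tally the failed reservations per hour.
It assumes the ALERT line format quoted above; the function name
tally_reserve_failures is made up for illustration and is not part of
Moab or gold.

```python
import re
from collections import Counter

# Assumed format, copied from the ALERT line quoted above:
#   "MM/DD HH:MM:SS ALERT:    cannot reserve allocation for job <id> - ..."
ALERT_RE = re.compile(
    r"^(\d{2}/\d{2}) (\d{2}):\d{2}:\d{2} ALERT:\s+"
    r"cannot reserve allocation for job (\d+)"
)


def tally_reserve_failures(lines):
    """Count failed reservations per (date, hour) and collect the job ids hit.

    `lines` is any iterable of log lines, for example an open file object.
    Lines that do not match the assumed ALERT format are ignored.
    """
    per_hour = Counter()
    jobs = set()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            date, hour, job_id = match.groups()
            per_hour[(date, hour)] += 1
            jobs.add(job_id)
    return per_hour, jobs
```

Passing an open Moab log file to tally_reserve_failures returns a
Counter keyed by (date, hour) plus the set of affected job ids, which
should make it easier to see whether the failures line up with the
times postgres is busy.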

Side note: we have vacuumed the DB a few times over its life.


Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



