[gold-users] Reply: Re: Maui integration problems - AMCFG parameter not working?

Scott Jackson scottmo at adaptivecomputing.com
Tue Jul 6 10:29:30 MDT 2010


Hi Richard,

RNothdurft at spirit21.de wrote:
> Hi Scott,
>
> First of all, thank you for your fast reply and advice.
My pleasure. I was not so fast this time because of the holidays :)
>
> gold-users-bounces at supercluster.org wrote on 02.07.2010 19:27:22:
> >
> > Hi Richard,
> >
> > RNothdurft at spirit21.de wrote:
> > > Hi,
> > >
> > > I'm evaluating GOLD in a test environment, but there are some
> > > problems with the Maui integration.
> > >
> > > I get errors when a user submits a job without enough credits
> > > in his GOLD account, but the job runs anyway, without a
> > > reservation in GOLD...
> > So, are you saying that Maui is running the job even when an
> > insufficient funds error is returned? If so, this would be an issue with
> > Maui -- unless you can see a behavior issue in Gold.
>
> It may be an issue with Maui, but I thought my integration parameters
> were wrong and that this would rather belong to Gold... Also, maui.log
> reports a bank failure:
> 07/02 11:14:32 MSysRegEvent(FAILURE:  cannot receive response from allocation-manager server master:7112 (cmd: '<XML>')
> ...
> 07/02 11:14:32 WARNING:  cannot reserve allocation for job '49', reason: BankFailure
If you believe there is a problem with Gold, please provide the extracts
from goldd.log that highlight the error. If you like, you can scp the
entire file to guest at adaptivecomputing.com (password: guest).
Please review it yourself first, to verify that you do believe Gold is
behaving incorrectly.
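When goldd logs at TRACE level, as in the goldd.log excerpt later in this thread, each failure response it builds shows up as a `Gold::Response::failure  invoked with arguments: (CODE, MESSAGE)` line, so a quick filter is one way to pull out the relevant extracts. A minimal Python sketch, assuming that single-line layout and nothing else about the log; `find_failures` is a hypothetical helper, not part of Gold:

```python
import re

# Matches goldd.log TRACE lines such as:
#   ... TRACE Gold::Response::failure  invoked with arguments: (782, Insufficient ...)
# Assumption: each log entry sits on a single line in the real goldd.log.
FAILURE_RE = re.compile(
    r"Gold::Response::failure\s+invoked with\s+arguments:\s+\((\d+),\s*(.*)\)\s*$"
)


def find_failures(lines):
    """Return (status_code, message) for every failure response in the lines."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            failures.append((int(match.group(1)), match.group(2)))
    return failures


if __name__ == "__main__":
    sample = [
        "2010-07-05 10:00:03.266 TRACE Gold::Response::failure  invoked with "
        "arguments: (782, Insufficient balance to reserve job (JobId 55))",
        "2010-07-05 10:00:03.267 TRACE Gold::Reply::new  invoked with "
        "arguments: (connection => IO::Socket::INET=GLOB(0x8656d2c))",
    ]
    for code, message in find_failures(sample):
        print(code, message)  # 782 Insufficient balance to reserve job (JobId 55)
```

An open file object can be passed directly (for example `with open(path) as log: find_failures(log)`, where `path` points at your goldd.log), since iterating a file yields its lines.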

>
> > > If there are enough credits on the account, there are no problems:
> > > reservation and charging of jobs work.
> > > I checked the parameter JOBFAILUREACTION, but I think the default
> > > setting is correct. I tried some other values (HOLD,HOLD; RETRY), and
> > > also replaced the TYPE, HOST and PORT parameters with SERVER, as described here:
> > > http://www.clusterresources.com/products/mwm/docs/6.4allocationmanagement.shtml#gold
> > >
> >
> > This documentation is for Moab, not Maui. The HOLD,HOLD syntax will not
> > work in Maui, neither will the SERVER syntax.
> >
> > > but without effect.
> > > I changed the WIREPROTOCOL parameter from XML to HTML and also to
> > > SSS2, just to see some change in the log files, but the messages
> > > shown are still in XML format.
> > >
> > > So, the questions:
> > > 1) what's wrong with my configuration?
> > What do the Maui docs say to use here? Is JOBFAILUREACTION the right
> > parameter to use with Maui? (I know this parameter name has changed a
> > few times over the years.)
>
> I'm sorry, I didn't consider that the AMCFG parameter options could differ
> between Maui and Moab.
> So I looked up the corresponding Maui documentation and found it here:
> http://www.clusterresources.com/products/maui/docs/6.4allocationmanagement.shtml#config
>
> As you can see there, the SERVER parameter should also work since Maui
> 3.2.7, and the parameter I'm looking for should be DEFERJOBONFAILURE in
> Maui. I tried again with that parameter, but also without success.
> Jobs still start without a reservation in GOLD if the credit balance is
> below the amount needed for the reservation...
> Here again is the corresponding part of maui.log:
> 07/05 10:00:03 WARNING:  request failed
> 07/05 10:00:03 ALERT:    request failed with status code 782 (Insufficient balance to reserve job (JobId 55))
> 07/05 10:00:03 MSUDisconnect(S)
> 07/05 10:00:03 ERROR:    cannot receive response from allocation-manager server 'master':7112
> 07/05 10:00:03 MSysRegEvent(FAILURE:  cannot receive response from allocation-manager server master:7112 (cmd: '<XML>')
> ,0,0,1)
> 07/05 10:00:03 MSysLaunchAction(ASList,1)
> 07/05 10:00:03 INFO:     command response 'NULL'
> 07/05 10:00:03 ALERT:    no job data available
> 07/05 10:00:03 ALERT:    cannot extract status
> 07/05 10:00:03 ALERT:    cannot reserve allocation for job
> 07/05 10:00:03 WARNING:  cannot reserve allocation for job '55', reason: BankFailure
>
> My new configuration in maui.cfg:
> AMCFG[bank]  TYPE=GOLD HOST=master PORT=7112 SOCKETPROTOCOL=SSS-CHALLENGE WIREPROTOCOL=SSS2 CHARGEPOLICY=DEBITALLWC DEFERJOBONFAILURE=TRUE TIMEOUT=15
>
> But if DEFERJOBONFAILURE is the correct parameter, why does Maui's
> --with-gold configure option insert JOBFAILUREACTION=IGNORE into
> maui.cfg by default?
>
I'm sorry, but I really don't have the cycles to go looking through Maui 
source code. If Maui is still starting a job after receiving a failure 
back from the reservation, then logic tells me this is a problem with Maui.
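Since the --with-gold default writes JOBFAILUREACTION=IGNORE while the Maui documentation names DEFERJOBONFAILURE, it may help to confirm which of the two actually ends up in maui.cfg. A minimal Python sketch, assuming the AMCFG entry is a single whitespace-separated line of KEY=VALUE attributes like the configuration quoted above; `parse_amcfg` and `failure_settings` are hypothetical helpers, and the sketch only reports what is written in the file, not how Maui interprets it:

```python
import shlex


def parse_amcfg(line):
    """Split 'AMCFG[name] KEY=VALUE ...' into (name, {KEY: VALUE})."""
    head, *attrs = shlex.split(line)
    if not (head.startswith("AMCFG[") and head.endswith("]")):
        raise ValueError("not an AMCFG line: " + head)
    name = head[len("AMCFG["):-1]
    settings = {}
    for attr in attrs:
        key, sep, value = attr.partition("=")
        if not sep:
            raise ValueError("attribute without '=': " + attr)
        settings[key.upper()] = value
    return name, settings


def failure_settings(settings):
    """Report which of the two failure-handling names from this thread are set."""
    return {
        key: settings[key]
        for key in ("DEFERJOBONFAILURE", "JOBFAILUREACTION")
        if key in settings
    }


if __name__ == "__main__":
    cfg = (
        "AMCFG[bank]  TYPE=GOLD HOST=master PORT=7112 "
        "SOCKETPROTOCOL=SSS-CHALLENGE WIREPROTOCOL=SSS2 CHARGEPOLICY=DEBITALLWC "
        "DEFERJOBONFAILURE=TRUE TIMEOUT=15"
    )
    name, settings = parse_amcfg(cfg)
    print(name, failure_settings(settings))  # bank {'DEFERJOBONFAILURE': 'TRUE'}
```

If both names show up (for example because the configure-generated JOBFAILUREACTION=IGNORE was left in place next to a hand-added DEFERJOBONFAILURE), that is worth cleaning up first; which one Maui actually honors remains a question for the Maui documentation or source.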

>
> >
> > > 2) why is the logging still in XML format?
> > >
> > I'm afraid I don't know what you mean by this. Can you show me an
> > example of the logging format that you think is wrong?
>
> I concluded that from the '<XML>' in the INFO, FAILURE and DEBUG messages
> below. I expected that after changing the WIREPROTOCOL parameter, the log
> would show something like "SSS2" or whatever value I had set...
>
> maui.log:
> 07/05 10:00:03 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> 07/05 10:00:03 MSUSendData(S,15000000,FALSE,FALSE)
> 07/05 10:00:03 INFO:     packet sent (705 bytes of 705)
> 07/05 10:00:03 INFO:     command sent to server
> 07/05 10:00:03 INFO:     message sent: '<XML>'
> ...
> 07/05 10:00:03 MSysRegEvent(FAILURE:  cannot receive response from allocation-manager server master:7112 (cmd: '<XML>')
>
This is just an unfortunate shortcut in Maui's logging: instead of
rendering the XML message as a string and displaying it, Maui simply
prints the literal '<XML>', which is only a placeholder for whatever XML
was actually sent. The SSS message format is XML-based and is described
in the documents under Project Documentation at adaptive.computing.com/gold.
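Because the messages on the wire are XML whatever Maui prints in its log, the status Gold returns can be read straight from the reply payload recorded in goldd.log. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming only the Envelope/Body/Response/Status layout visible in the payload Richard quotes; `parse_sss_status` is a hypothetical helper, not part of Gold or Maui:

```python
import xml.etree.ElementTree as ET


def parse_sss_status(payload):
    """Return (value, code, message) from an SSS <Envelope> response payload."""
    if isinstance(payload, str):
        # Parse bytes so the UTF-8 encoding declaration is handled consistently.
        payload = payload.encode("utf-8")
    root = ET.fromstring(payload)  # root element is <Envelope>
    status = root.find("./Body/Response/Status")
    if status is None:
        raise ValueError("no Body/Response/Status element in payload")
    return (
        status.findtext("Value"),
        int(status.findtext("Code")),
        status.findtext("Message"),
    )


if __name__ == "__main__":
    reply = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<Envelope><Body><Response actor="golduser"><Status>'
        "<Value>Failure</Value><Code>782</Code>"
        "<Message>Insufficient balance to reserve job (JobId 55)</Message>"
        "</Status></Response></Body></Envelope>"
    )
    print(parse_sss_status(reply))
    # ('Failure', 782, 'Insufficient balance to reserve job (JobId 55)')
```

Any other elements a real Gold response might carry are simply ignored by this sketch.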

> goldd.log:
> 2010-07-05 10:00:03.266 TRACE Gold::Response::failure  invoked with arguments: (782, Insufficient balance to reserve job (JobId 55))
> 2010-07-05 10:00:03.267 TRACE Gold::Reply::new  invoked with arguments: (connection => IO::Socket::INET=GLOB(0x8656d2c))
> 2010-07-05 10:00:03.267 TRACE Gold::Reply::sendChunk  invoked with arguments: (Gold::Chunk=HASH(0x91fee44))
> 2010-07-05 10:00:03.268 TRACE Gold::Reply::marshallChunk  invoked with arguments: (Gold::Chunk=HASH(0x91fee44))
> 2010-07-05 10:00:03.268 DEBUG Gold::Reply::sendChunk  Writing reply header (HTTP/1.1 200 OK^M
> Content-Type: text/xml; charset="utf-8"^M
> Transfer-Encoding: chunked).
> 2010-07-05 10:00:03.269 INFO  Gold::Reply::sendChunk  Writing reply payload (232, <?xml version="1.0" encoding="UTF-8"?>
> <Envelope><Body><Response actor="golduser"><Status><Value>Failure</Value><Code>782</Code><Message>Insufficient balance to reserve job (JobId 55)</Message></Status></Response></Body></Envelope>
>
OK, so Gold is reporting an insufficient balance. This is the response XML.
Please provide the request XML so we can see who is asking and how much is
being asked for. I would say either something is missing (like Machine,
Project, ...) or a very long wallclock limit is being requested (100
days or something -- resulting in more than the account has as a
balance). Or perhaps the accounts are not set up quite the way you think
they should be. Or maybe reservations are not going away. Or maybe they
are just plain out of funds. This will become clear once we see the request
XML and run a few commands to examine the accounts.
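To make the "very long wallclock limit" case concrete, here is a rough, hypothetical model (not Gold's actual charge logic): assume the amount reserved is processors × requested wallclock seconds × a charge rate, with an illustrative rate of 1 credit per processor-second. Under that assumption a single-processor job asking for 100 days already needs 100 × 86,400 = 8,640,000 credits. `reservation_amount`, `fits` and the 1,000,000-credit balance are all invented for illustration:

```python
SECONDS_PER_DAY = 86_400


def reservation_amount(processors, wallclock_seconds, rate_per_proc_second=1.0):
    """Hypothetical model: credits reserved = processors x wallclock x rate.

    This is an illustrative assumption, not Gold's actual charge-rate logic.
    """
    return processors * wallclock_seconds * rate_per_proc_second


def fits(balance, amount):
    """True if the (hypothetical) balance covers the reservation amount."""
    return balance >= amount


if __name__ == "__main__":
    # The "100 days" case from the reply above, on a single processor.
    amount = reservation_amount(processors=1, wallclock_seconds=100 * SECONDS_PER_DAY)
    print(amount)                   # 8640000.0
    print(fits(1_000_000, amount))  # False (1,000,000 is an invented balance)
```

The point is only that the requested wallclock limit scales the reservation linearly, so an oversized limit can exceed a balance that would easily cover the job's real run time.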
> Thanks again for any suggestions and for your patience with a newbie like
> me ;)
>
> Kind regards,
> Richard
>
Thanks,

Scott

> >
> > Thanks,
> >
> > Scott
> >
> > > Thanks for any suggestions.
> > >
> > > Kind regards,
> > > Richard
> > >
>
>
> Richard Nothdurft
> DHBW student at SPIRIT/21 AG
> Otto-Lilienthal-Str. 36, 71034 Böblingen
> Mobile: 0177/7427024
> E-Mail: rnothdurft at spirit21.de
> Internet: http://www.spirit21.de
>
>
> SPIRIT/21 AG
>
> Registered office: Böblingen
> Executive Board: Dietmar Wendt (Chairman), Philipp Steffen, Joachim Gutheil
> Chairman of the Supervisory Board: Siegfried J. Althaus
> Registry court: Amtsgericht Stuttgart, registration number: HRB 244681
> VAT identification number: DE 198412560
> ------------------------------------------------------------------------
>
> _______________________________________________
> gold-users mailing list
> gold-users at supercluster.org
> http://www.supercluster.org/mailman/listinfo/gold-users
>   

