[gold-users] Antwort: Re: Antwort: Re: Maui integration problems - AMCFG-Parameter not working?

Scott Jackson scottmo at adaptivecomputing.com
Wed Jul 7 09:50:08 MDT 2010


Hi Richard,

I'm happy you were able to get that problem resolved. Thank you for the 
update. I should now be able to pass this along to anybody else who runs 
into this problem.

As far as the Gold bugs, you can submit them to 
gold-support at adaptivecomputing.com. That way they will have a ticket 
backing and tracking them. Alternatively, if you would rather, you could 
post them here or email me directly, but the gold-support mechanism is 
preferred if it is not a hassle for you.

Thanks,

Scott


RNothdurft at spirit21.de wrote:
> Hi Scott and all who are watching this topic,
>
> I found the solution to the problem.
> As we both suspected, it wasn't a problem within GOLD; it's the AMCFG 
> attribute in maui.cfg.
> JOBFAILUREACTION and DEFERJOBONFAILURE are both non-functional in Maui 
> 3.3. The parameter I was looking for is undocumented and named JFACTION.
>
> For all who are using Gold with Maui and may get into a situation 
> where the balance of an account isn't enough for reservation and the 
> job must not be started, please use JFACTION instead of 
> JOBFAILUREACTION and DEFERJOBONFAILURE in the following way:
>
> JFACTION=<value> with <value>=NONE(default) || DEFER || CANCEL
>
> For example my new AM configuration in maui.cfg:
> AMCFG[bank] JFACTION=DEFER TYPE=GOLD HOST=master PORT=7112 
> CHARGEPOLICY=DEBITALLWC TIMEOUT=15
>
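A minimal sketch of a sanity check for such an AMCFG line, based only on what is reported above (JFACTION accepts NONE, DEFER, or CANCEL; JOBFAILUREACTION and DEFERJOBONFAILURE are reported as non-functional in Maui 3.3). The helper name check_amcfg is hypothetical, the line is assumed to be unwrapped onto a single line, and treating names and values as case-insensitive is an assumption:

```python
import re

# Values listed for JFACTION in the message above.
VALID_JFACTION = {"NONE", "DEFER", "CANCEL"}
# Parameter names reported above as non-functional in Maui 3.3.
NON_FUNCTIONAL = {"JOBFAILUREACTION", "DEFERJOBONFAILURE"}


def check_amcfg(line):
    """Return a list of warnings for one (already unwrapped) AMCFG line."""
    match = re.match(r"\s*AMCFG\[(\w+)\]\s+(.*)", line)
    if not match:
        return ["not an AMCFG line"]

    # Split KEY=VALUE tokens; upper-casing the keys is an assumption.
    params = {}
    for token in match.group(2).split():
        if "=" in token:
            key, value = token.split("=", 1)
            params[key.upper()] = value

    warnings = []
    for name in sorted(n for n in NON_FUNCTIONAL if n in params):
        warnings.append(
            f"{name} is reported as non-functional in Maui 3.3; use JFACTION"
        )

    action = params.get("JFACTION")
    if action is None:
        warnings.append("JFACTION not set; default is NONE")
    elif action.upper() not in VALID_JFACTION:
        warnings.append(
            f"JFACTION={action} is not one of {sorted(VALID_JFACTION)}"
        )
    return warnings
```

Run against the configuration quoted above, the check returns no warnings; a line that still uses DEFERJOBONFAILURE=TRUE and has no JFACTION yields two.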
> I found the decisive pointer in the Maui mailing archive (see 
> http://supercluster.org/pipermail/mauiusers/2008-August/003479.html) 
> and checked for the parameter and its possible values in maui source 
> code (see maui-3.3/src/moab/MConst.c, "MJFActionType").
>
> @ Scott: I think I found some small bugs in GOLD (not affecting 
> correct functioning) during my evaluation. Should they be posted to 
> this mailing list, or is there a different address to send them to?
>
> Kind regards,
> Richard
>
>
> gold-users-bounces at supercluster.org wrote on 06.07.2010 18:29:30:
>
> > Scott Jackson <scottmo at adaptivecomputing.com>
> > Sent by: gold-users-bounces at supercluster.org
> >
> > 06.07.2010 18:29
> >
> > Please reply to
> > Gold Users Mailing List <gold-users at supercluster.org>
> >
> > To
> >
> > Gold Users Mailing List <gold-users at supercluster.org>
> >
> > Cc
> >
> > Subject
> >
> > Re: [gold-users] Antwort: Re: Maui integration problems -
> > AMCFG-Parameter not working?
> >
> > Hi Richard,
> >
> > RNothdurft at spirit21.de wrote:
> > > Hi Scott,
> > >
> > > First of all, thank you for your fast reply and advice.
> > My pleasure. I was not so fast this time because of the holidays:)
> > >
> > > gold-users-bounces at supercluster.org wrote on 02.07.2010 19:27:22:
> > > >
> > > > Hi Richard,
> > > >
> > > > RNothdurft at spirit21.de wrote:
> > > > > Hi,
> > > > >
> > > > > I'm evaluating GOLD in a test environment, but there are some
> > > > > problems with the Maui integration.
> > > > >
> > > > > I'm getting some errors if a user submits a job without enough
> > > > > credits on his GOLD account, but the job runs anyway and without
> > > > > a reservation in GOLD...
> > > > So, are you saying that Maui is running the job, even when an
> > > > insufficient funds error is returned? If so, this would be an issue
> > > > with Maui -- unless you can see a behavior issue in Gold.
> > >
> > > It may be an issue with Maui, but I thought my integration parameters
> > > were wrong and that this would rather belong to Gold... Also, the
> > > maui.log says there is a bank failure:
> > > 07/02 11:14:32 MSysRegEvent(FAILURE:  cannot receive response from
> > > allocation-manager server master:7112 (cmd: '<XML>')
> > > ...
> > > 07/02 11:14:32 WARNING:  cannot reserve allocation for job '49',
> > > reason: BankFailure
> > If you believe there is a problem with Gold, please provide the extracts
> > from the goldd.log that highlight the error. You can scp the entire file
> > to guest at adaptivecomputing.com: password guest, if you would like.
> > Please review it yourself first, to verify that you do believe there is
> > an error in Gold behavior.
> >
> > >
> > > > > If there are enough credits on the account there are no problems;
> > > > > reservation and charging of jobs are working.
> > > > > I checked the parameter JOBFAILUREACTION, but I think the default
> > > > > setting is correct. I tried some other values (HOLD,HOLD; RETRY), and
> > > > > also changed the parameters TYPE,HOST,PORT to SERVER as mentioned here:
> > > > > http://www.clusterresources.com/products/mwm/docs/6.4allocationmanagement.shtml#gold
> > > > >
> > > >
> > > > This documentation is for Moab, not Maui. The HOLD,HOLD syntax will
> > > > not work in Maui, and neither will the SERVER syntax.
> > > >
> > > > > but without effect.
> > > > > I changed the WIREPROTOCOL parameter from XML to HTML and also to
> > > > > SSS2, just to see some changes in the logfiles, but the messages
> > > > > shown are still in XML format.
> > > > >
> > > > > So, the questions:
> > > > > 1) what's wrong with my configuration?
> > > > What do the Maui docs say to use here? Is JOBFAILUREACTION the right
> > > > parameter to use with Maui? (I know this parameter name has changed a
> > > > few times over the years.)
> > >
> > > I'm sorry, I didn't consider that there could be differences in
> > > AMCFG parameter options between Maui and Moab.
> > > So I checked the corresponding Maui documentation and found it here:
> > > http://www.clusterresources.com/products/maui/docs/6.4allocationmanagement.shtml#config
> > >
> > > As you can see there, the SERVER parameter should also work since Maui
> > > 3.2.7, and the parameter I'm looking for should be DEFERJOBONFAILURE in
> > > Maui. I tried it again with that parameter, but also without success.
> > > Jobs still start without a reservation in GOLD if the credit value is
> > > below the value needed for reservation...
> > > Here again the corresponding part of maui.log:
> > > 07/05 10:00:03 WARNING:  request failed
> > > 07/05 10:00:03 ALERT:    request failed with status code 782
> > > (Insufficient balance to reserve job (JobId 55))
> > > 07/05 10:00:03 MSUDisconnect(S)
> > > 07/05 10:00:03 ERROR:    cannot receive response from
> > > allocation-manager server 'master':7112
> > > 07/05 10:00:03 MSysRegEvent(FAILURE:  cannot receive response from
> > > allocation-manager server master:7112 (cmd: '<XML>')
> > > ,0,0,1)
> > > 07/05 10:00:03 MSysLaunchAction(ASList,1)
> > > 07/05 10:00:03 INFO:     command response 'NULL'
> > > 07/05 10:00:03 ALERT:    no job data available
> > > 07/05 10:00:03 ALERT:    cannot extract status
> > > 07/05 10:00:03 ALERT:    cannot reserve allocation for job
> > > 07/05 10:00:03 WARNING:  cannot reserve allocation for job '55',
> > > reason: BankFailure
> > >
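A small sketch for pulling these two kinds of entries out of a maui.log, assuming the raw log file (without the mail quoting) and the message wording shown in the excerpt above. The function names are hypothetical, and the patterns are derived only from the lines quoted here, not from Maui's source:

```python
import re

# Matches e.g.: cannot reserve allocation for job '55', reason: BankFailure
# \s+ also tolerates a line wrap between the job id and the reason.
RESERVE_FAIL = re.compile(
    r"cannot reserve allocation for job '(?P<job>[^']+)',\s+reason:\s*(?P<reason>\w+)"
)

# Matches e.g.: request failed with status code 782 (Insufficient balance ...)
STATUS = re.compile(
    r"request failed with status code (?P<code>\d+)\s*\((?P<msg>.*)\)"
)


def find_reservation_failures(log_text):
    """Return (job_id, reason) pairs for failed allocation reservations."""
    return [(m.group("job"), m.group("reason"))
            for m in RESERVE_FAIL.finditer(log_text)]


def find_status_codes(log_text):
    """Return (status_code, message) pairs reported by the allocation manager."""
    return [(int(m.group("code")), m.group("msg"))
            for m in STATUS.finditer(log_text)]
```

Pairing a status code such as 782 with a subsequent BankFailure for the same job makes it easier to see that the allocation manager did answer and that the job was started regardless.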
> > > My new configuration in maui.cfg:
> > > AMCFG[bank]  TYPE=GOLD HOST=master PORT=7112
> > > SOCKETPROTOCOL=SSS-CHALLENGE WIREPROTOCOL=SSS2 CHARGEPOLICY=DEBITALLWC
> > > DEFERJOBONFAILURE=TRUE TIMEOUT=15
> > >
> > > But if the DEFERJOBONFAILURE parameter is the correct one, why does the
> > > --with-gold configure option of Maui insert the parameter
> > > JOBFAILUREACTION=IGNORE into maui.cfg by default?
> > >
> > I'm sorry, but I really don't have the cycles to go looking through Maui
> > source code. If Maui is still starting a job after receiving a failure
> > back from the reservation, then logic tells me this is a problem with Maui.
> >
> > >
> > > >
> > > > > 2) why is the logging still in XML-format?
> > > > >
> > > > I'm afraid I don't know what you mean by this. Can you show me an
> > > > example of the logging format that you think is wrong?
> > >
> > > I thought that because of the '<XML>' INFO, FAILURE and DEBUG messages
> > > below. I thought if I changed the WIREPROTOCOL parameter, the logging
> > > would show something like "SSS2" or whatever I had set it to before...
> > >
> > > maui.log:
> > > 07/05 10:00:03 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> > > 07/05 10:00:03 MSUSendData(S,15000000,FALSE,FALSE)
> > > 07/05 10:00:03 INFO:     packet sent (705 bytes of 705)
> > > 07/05 10:00:03 INFO:     command sent to server
> > > 07/05 10:00:03 INFO:     message sent: '<XML>'
> > > ...
> > > 07/05 10:00:03 MSysRegEvent(FAILURE:  cannot receive response from
> > > allocation-manager server master:7112 (cmd: '<XML>')
> > >
> > This is just an unfortunate shortcut in Maui logging, where, instead of
> > parsing the XML into a string and displaying it, Maui simply shows the
> > literal '<XML>', which has nothing to do with what Gold actually sent
> > except as a placeholder for whatever XML was actually sent. The SSS
> > Message format is in XML, which is described in the documents under the
> > Project Documentation at adaptive.computing.com/gold.
> >
> > > goldd.log:
> > > 2010-07-05 10:00:03.266 TRACE Gold::Response::failure  invoked with
> > > arguments: (782, Insufficient balance to reserve job (JobId 55))
> > > 2010-07-05 10:00:03.267 TRACE Gold::Reply::new  invoked with
> > > arguments: (connection => IO::Socket::INET=GLOB(0x8656d2c))
> > > 2010-07-05 10:00:03.267 TRACE Gold::Reply::sendChunk  invoked with
> > > arguments: (Gold::Chunk=HASH(0x91fee44))
> > > 2010-07-05 10:00:03.268 TRACE Gold::Reply::marshallChunk  invoked with
> > > arguments: (Gold::Chunk=HASH(0x91fee44))
> > > 2010-07-05 10:00:03.268 DEBUG Gold::Reply::sendChunk  Writing reply
> > > header (HTTP/1.1 200 OK^M
> > > Content-Type: text/xml; charset="utf-8"^M
> > > Transfer-Encoding: chunked).
> > > 2010-07-05 10:00:03.269 INFO  Gold::Reply::sendChunk  Writing reply
> > > payload (232, <?xml version="1.0" encoding="UTF-8"?>
> > > <Envelope><Body><Response
> > > actor="golduser"><Status><Value>Failure</Value><Code>782</Code><Message>Insufficient
> > > balance to reserve job (JobId
> > > 55)</Message></Status></Response></Body></Envelope>
> > >
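A minimal sketch for reading the status out of a reply payload like the one logged above, using Python's standard xml.etree.ElementTree. The element names (Envelope, Body, Response, Status, Value, Code, Message) are taken from the logged payload; the function name parse_gold_status is hypothetical, and the payload is assumed to be available as one complete XML string:

```python
import xml.etree.ElementTree as ET


def parse_gold_status(xml_text):
    """Extract Value, Code and Message from a Gold-style reply envelope."""
    # Encode to bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    status = root.find("./Body/Response/Status")
    if status is None:
        raise ValueError("no Status element found in reply")

    code_text = status.findtext("Code")
    return {
        "value": status.findtext("Value"),
        "code": int(code_text) if code_text else None,
        "message": status.findtext("Message"),
    }
```

For the payload in the log, this gives the value "Failure", the code 782, and the insufficient-balance message, which is the same status code that appears in maui.log.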
> > OK, so Gold is saying insufficient balance. This is the response XML.
> > Please provide the request XML so we can see who and how much is being
> > asked for. I would say either something is missing (like Machine,
> > Project, ...) or a very long wallclock limit is being requested (100
> > days or something -- that results in more than the account has for a
> > balance). Or perhaps the accounts are not set up quite the way you think
> > they should be. Or maybe reservations are not going away. Or maybe they
> > are just plain out of funds. This will fall out once we see the request
> > XML and run a few commands to examine the accounts.
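As a rough illustration of how a very long wallclock limit alone can exceed a balance: the sketch below assumes the reserved amount is simply processors times wallclock seconds times a flat rate. That formula is an assumed simplification for illustration, not Gold's actual charge calculation (which depends on the configured charge rates), and the function names are hypothetical:

```python
def estimated_reservation(processors, wallclock_seconds, rate_per_proc_second=1.0):
    """Assumed simplification: credits = processors * seconds * flat rate."""
    return processors * wallclock_seconds * rate_per_proc_second


def would_exceed(balance, processors, wallclock_seconds, rate_per_proc_second=1.0):
    """True if the estimated reservation is larger than the available balance."""
    needed = estimated_reservation(processors, wallclock_seconds, rate_per_proc_second)
    return needed > balance


# A 100-day wallclock limit on 8 processors, at 1 credit per processor-second.
hundred_days = 100 * 24 * 60 * 60
print(estimated_reservation(8, hundred_days))
print(would_exceed(1_000_000, 8, hundred_days))
```

Under this assumption the 100-day request needs about 69 million credits, so an account with one million credits would be rejected even though a one-hour job on the same account would pass.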
> > > Thanks again for any suggestions and your patience with a newbie like
> > > me ;)
> > >
> > > Kind regards,
> > > Richard
> > >
> > Thanks,
> >
> > Scott
> >
> > > >
> > > > Thanks,
> > > >
> > > > Scott
> > > >
> > > > > Thanks for any suggestions.
> > > > >
> > > > > Kind regards,
> > > > > Richard
>
>
> Richard Nothdurft
> DHBW student at SPIRIT/21 AG
> Otto-Lilienthal-Str. 36, 71034 Böblingen
> Mobile: 0177/7427024
> Email: rnothdurft at spirit21.de
> Internet: http://www.spirit21.de <http://www.spirit21.de/>

