[gold-users] Antwort: Re: Antwort: Re: Maui integration problems - AMCFG-Parameter not working?
RNothdurft at spirit21.de
Wed Jul 7 03:51:08 MDT 2010
Hi Scott and everyone following this topic,
I found the solution to the problem.
As we both suspected, it wasn't a problem within GOLD; the cause is the AMCFG
attribute in maui.cfg.
JOBFAILUREACTION and DEFERJOBONFAILURE are both non-functional in Maui
3.3. The parameter I was looking for is undocumented and named JFACTION.
If you use Gold with Maui and run into a situation where the balance of
an account is not enough for the reservation and the job must not be
started, please use JFACTION instead of JOBFAILUREACTION and
DEFERJOBONFAILURE in the following way:
JFACTION=<value> with <value>=NONE(default) || DEFER || CANCEL
For example, my new AM configuration in maui.cfg:
AMCFG[bank] JFACTION=DEFER TYPE=GOLD HOST=master PORT=7112
CHARGEPOLICY=DEBITALLWC TIMEOUT=15
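
For anyone who wants to check an existing maui.cfg for the two non-functional parameters, here is a minimal Python sketch. It is not part of Maui or Gold. The function name check_amcfg and the parsing rules (whitespace-separated KEY=VALUE tokens after AMCFG[...], '#' as comment marker) are assumptions made for illustration. The parameter names and JFACTION values are the ones listed above.

```python
import re

# Parameter names reported in this thread as having no effect in Maui 3.3.
INEFFECTIVE = {"JOBFAILUREACTION", "DEFERJOBONFAILURE"}
# JFACTION values named in this message.
JFACTION_VALUES = {"NONE", "DEFER", "CANCEL"}

AMCFG_HEADER = re.compile(r"AMCFG\[[^\]]*\]")


def check_amcfg(cfg_text):
    """Return warnings for AMCFG attributes found in maui.cfg text.

    Assumption (hypothetical parsing rule): every KEY=VALUE token that
    follows an AMCFG[...] token belongs to it until a token without '='
    appears. This also covers entries wrapped over several lines.
    """
    tokens = []
    for line in cfg_text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        tokens.extend(line.split("#", 1)[0].split())

    warnings = []
    current = None  # AMCFG[...] entry being read, if any
    for tok in tokens:
        if AMCFG_HEADER.fullmatch(tok):
            current = tok
        elif "=" not in tok:
            current = None  # some other parameter starts here
        elif current is not None:
            key, value = tok.split("=", 1)
            key = key.upper()
            if key in INEFFECTIVE:
                warnings.append(f"{current}: {key} has no effect, use JFACTION")
            elif key == "JFACTION" and value.upper() not in JFACTION_VALUES:
                warnings.append(f"{current}: unexpected JFACTION value {value!r}")
    return warnings


if __name__ == "__main__":
    old_cfg = """AMCFG[bank] TYPE=GOLD HOST=master PORT=7112
SOCKETPROTOCOL=SSS-CHALLENGE WIREPROTOCOL=SSS2 CHARGEPOLICY=DEBITALLWC
DEFERJOBONFAILURE=TRUE TIMEOUT=15"""
    for w in check_amcfg(old_cfg):
        print(w)
```

Run against my earlier configuration it flags DEFERJOBONFAILURE; the JFACTION=DEFER configuration above should produce no warnings.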
I found the decisive pointer in the Maui mailing list archive (see
http://supercluster.org/pipermail/mauiusers/2008-August/003479.html) and
checked the parameter and its possible values in the Maui source code (see
maui-3.3/src/moab/MConst.c, "MJFActionType").
@Scott: I think I found some small bugs in GOLD (not affecting correct
functioning) during my evaluation. Should they be posted to this mailing
list, or is there a different address to send them to?
Kind regards,
Richard
gold-users-bounces at supercluster.org wrote on 06.07.2010 18:29:30:
> Scott Jackson <scottmo at adaptivecomputing.com>
> Sent by: gold-users-bounces at supercluster.org
>
> 06.07.2010 18:29
>
> Please reply to
> Gold Users Mailing List <gold-users at supercluster.org>
>
> To
>
> Gold Users Mailing List <gold-users at supercluster.org>
>
> Cc
>
> Subject
>
> Re: [gold-users] Antwort: Re: Maui integration problems - AMCFG-Parameter not working?
>
> Hi Richard,
>
> RNothdurft at spirit21.de wrote:
> > Hi Scott,
> >
> > first of all thank you for your fast reply and advice.
> My pleasure. I was not so fast this time because of the holidays:)
> >
> > gold-users-bounces at supercluster.org wrote on 02.07.2010 19:27:22:
> > >
> > > Hi Richard,
> > >
> > > RNothdurft at spirit21.de wrote:
> > > > Hi,
> > > >
> > > > I'm evaluating GOLD in a test environment, but there are some
> > > > problems with the Maui integration.
> > > >
> > > > I'm getting some errors if a user submits a job without enough
> > > > credits on his GOLD account, but the job runs anyway and without
> > > > a reservation in GOLD...
> > > So, are you saying that Maui is running the job, even when an
> > > insufficient funds error is returned? If so, this would be an issue
> > > with Maui -- unless you can see a behavior issue in Gold.
> >
> > It may be an issue with Maui, but I thought my integration parameters
> > were wrong and that this would rather belong to Gold... also, the maui.log
> > says there is a bank failure:
> > 07/02 11:14:32 MSysRegEvent(FAILURE: cannot receive response from
> > allocation-manager server master:7112 (cmd: '<XML>')
> > ...
> > 07/02 11:14:32 WARNING: cannot reserve allocation for job '49',
> > reason: BankFailure
> If you believe there is a problem with Gold, please provide the extracts
> from the goldd.log that highlight the error. You can scp the entire file
> to guest at adaptivecomputing.com: password guest, if you would like.
> Please review it yourself first, to verify that you do believe there is
> an error in Gold behavior.
>
> >
> > > > If there are enough credits on the account there are no problems,
> > > > reservation and charging of jobs are working.
> > > > I checked the parameter JOBFAILUREACTION but I think the default
> > > > setting is correct. I tried some other values (HOLD,HOLD; RETRY),
> > > > and also tried replacing the parameters TYPE, HOST, PORT with SERVER,
> > > > as mentioned here:
> > > > http://www.clusterresources.com/products/mwm/docs/6.4allocationmanagement.shtml#gold
> > > >
> > >
> > > This documentation is for Moab, not Maui. The HOLD,HOLD syntax will
> > > not work in Maui, neither will the SERVER syntax.
> > >
> > > > but without effect.
> > > > I changed the WIREPROTOCOL-parameter from XML to HTML and also to
> > > > SSS2, just to see some changes in the logfiles, but the shown
> > > > messages are still in XML-format.
> > > >
> > > > So, the questions:
> > > > 1) what's wrong with my configuration?
> > > What do the Maui docs say to use here? Is JOBFAILUREACTION the right
> > > parameter to use with Maui? (I know this parameter name has changed a
> > > few times over the years.)
> >
> > I'm sorry, I didn't consider that there could be differences in
> > AMCFG parameter options between Maui and Moab.
> > So I checked the corresponding Maui documentation and found it here:
> > http://www.clusterresources.com/products/maui/docs/6.4allocationmanagement.shtml#config
> >
> > As you can see there, the SERVER parameter should also work since Maui
> > 3.2.7, and the parameter I'm looking for should be DEFERJOBONFAILURE in
> > Maui. I tried again with that parameter, but also without success.
> > Jobs still start without a reservation in GOLD if the credit value is
> > below the value needed for reservation...
> > Here again the corresponding part of maui.log:
> > 07/05 10:00:03 WARNING: request failed
> > 07/05 10:00:03 ALERT: request failed with status code 782
> > (Insufficient balance to reserve job (JobId 55))
> > 07/05 10:00:03 MSUDisconnect(S)
> > 07/05 10:00:03 ERROR: cannot receive response from
> > allocation-manager server 'master':7112
> > 07/05 10:00:03 MSysRegEvent(FAILURE: cannot receive response from
> > allocation-manager server master:7112 (cmd: '<XML>')
> > ,0,0,1)
> > 07/05 10:00:03 MSysLaunchAction(ASList,1)
> > 07/05 10:00:03 INFO: command response 'NULL'
> > 07/05 10:00:03 ALERT: no job data available
> > 07/05 10:00:03 ALERT: cannot extract status
> > 07/05 10:00:03 ALERT: cannot reserve allocation for job
> > 07/05 10:00:03 WARNING: cannot reserve allocation for job '55',
> > reason: BankFailure
> >
> > My new configuration in maui.cfg:
> > AMCFG[bank] TYPE=GOLD HOST=master PORT=7112
> > SOCKETPROTOCOL=SSS-CHALLENGE WIREPROTOCOL=SSS2 CHARGEPOLICY=DEBITALLWC
> > DEFERJOBONFAILURE=TRUE TIMEOUT=15
> >
> > But if the DEFERJOBONFAILURE parameter is the correct one, why does the
> > --with-gold configure parameter of Maui insert the parameter
> > JOBFAILUREACTION=IGNORE into maui.cfg by default?
> >
> I'm sorry, but I really don't have the cycles to go looking through Maui
> source code. If Maui is still starting a job after receiving a failure
> back from the reservation, then logic tells me this is a problem with
> Maui.
>
> >
> > >
> > > > 2) why is the logging still in XML-format?
> > > >
> > > I'm afraid I don't know what you mean by this. Can you show me an
> > > example of the logging format that you think is wrong?
> >
> > I thought that because of the '<XML>' in the INFO, FAILURE and DEBUG
> > messages below. I thought that if I changed the WIREPROTOCOL parameter,
> > the logging would show something like "SSS2" or whatever I had set it to...
> >
> > maui.log:
> > 07/05 10:00:03 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> > 07/05 10:00:03 MSUSendData(S,15000000,FALSE,FALSE)
> > 07/05 10:00:03 INFO: packet sent (705 bytes of 705)
> > 07/05 10:00:03 INFO: command sent to server
> > 07/05 10:00:03 INFO: message sent: '<XML>'
> > ...
> > 07/05 10:00:03 MSysRegEvent(FAILURE: cannot receive response from
> > allocation-manager server master:7112 (cmd: '<XML>')
> >
> This is just an unfortunate shortcut in Maui logging, where, instead of
> parsing the XML into a string and displaying it, Maui simply shows the
> literal '<XML>', which has nothing to do with what Gold actually sent
> except as a placeholder for whatever XML was actually sent. The SSS
> message format is XML, and it is described in the documents under
> Project Documentation at adaptive.computing.com/gold.
>
> > goldd.log:
> > 2010-07-05 10:00:03.266 TRACE Gold::Response::failure invoked with
> > arguments: (782, Insufficient balance to reserve job (JobId 55))
> > 2010-07-05 10:00:03.267 TRACE Gold::Reply::new invoked with
> > arguments: (connection => IO::Socket::INET=GLOB(0x8656d2c))
> > 2010-07-05 10:00:03.267 TRACE Gold::Reply::sendChunk invoked with
> > arguments: (Gold::Chunk=HASH(0x91fee44))
> > 2010-07-05 10:00:03.268 TRACE Gold::Reply::marshallChunk invoked with
> > arguments: (Gold::Chunk=HASH(0x91fee44))
> > 2010-07-05 10:00:03.268 DEBUG Gold::Reply::sendChunk Writing reply
> > header (HTTP/1.1 200 OK^M
> > Content-Type: text/xml; charset="utf-8"^M
> > Transfer-Encoding: chunked).
> > 2010-07-05 10:00:03.269 INFO Gold::Reply::sendChunk Writing reply
> > payload (232, <?xml version="1.0" encoding="UTF-8"?>
> > <Envelope><Body><Response
> > actor="golduser"><Status><Value>Failure</Value><Code>782</Code><Message>Insufficient
> > balance to reserve job (JobId
> > 55)</Message></Status></Response></Body></Envelope>
> >
> OK, so Gold is saying insufficient balance. This is the response XML.
> Please provide the request XML so we can see who and how much is being
> asked for. I would say either something is missing (like Machine,
> Project, ...) or a very long wallclock limit is being requested (100
> days or something -- that results in more than the account has for a
> balance). Or perhaps the accounts are not set up quite the way you think
> they should be. Or maybe reservations are not going away. Or maybe, they
> are just plain out of funds. This will fall out once we see the request
> XML and run a few commands to examine the accounts.
> > Thanks again for any suggestions and your patience with a newbie like
> > me ;)
> >
> > Kind regards,
> > Richard
> >
> Thanks,
>
> Scott
>
> > >
> > > Thanks,
> > >
> > > Scott
> > >
> > > > Thanks for any suggestions.
> > > >
> > > > Kind regards,
> > > > Richard
Richard Nothdurft
DHBW student at SPIRIT/21 AG
Otto-Lilienthal-Str. 36, 71034 Böblingen
Mobile: 0177/7427024
Email: rnothdurft at spirit21.de
Web: http://www.spirit21.de