[gold-users] Antwort: Re: Antwort: Re: Maui integration problems - AMCFG-Parameter not working?

RNothdurft at spirit21.de RNothdurft at spirit21.de
Wed Jul 7 03:51:08 MDT 2010


Hi Scott and all who are watching this topic,

I found the solution to the problem.
As we both suspected, the problem was not in GOLD but in the AMCFG
attributes in maui.cfg.
JOBFAILUREACTION and DEFERJOBONFAILURE are both non-functional in Maui
3.3; the parameter I was looking for is undocumented and is named JFACTION.

If you use Gold with Maui and need to keep a job from starting when the
account balance is too low for the reservation, use JFACTION instead of
JOBFAILUREACTION and DEFERJOBONFAILURE, as follows:

JFACTION=<value>, where <value> is NONE (default), DEFER or CANCEL

For example, my new allocation manager (AM) configuration in maui.cfg:
AMCFG[bank] JFACTION=DEFER TYPE=GOLD HOST=master PORT=7112 CHARGEPOLICY=DEBITALLWC TIMEOUT=15
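
To make it easier to spot the old parameters in an existing maui.cfg, here is a minimal Python sketch. It is not part of Maui or Gold: check_amcfg is a hypothetical helper, the parameter names and the three JFACTION values are the ones named in this message, and it assumes that '#' starts a comment in maui.cfg and that each AMCFG entry sits on a single line.

```python
import re

# Names below come from this thread; check_amcfg itself is a hypothetical helper.
NONFUNCTIONAL = {"JOBFAILUREACTION", "DEFERJOBONFAILURE"}  # reported broken in Maui 3.3
JFACTION_VALUES = {"NONE", "DEFER", "CANCEL"}              # values listed above


def check_amcfg(cfg_text):
    """Return a list of warning strings for the AMCFG lines in maui.cfg text."""
    warnings = []
    for lineno, raw in enumerate(cfg_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line.upper().startswith("AMCFG"):
            continue
        # Collect the KEY=VALUE attributes that follow the AMCFG[...] token.
        attrs = {k.upper(): v for k, v in re.findall(r"(\w+)=(\S+)", line)}
        for name in sorted(NONFUNCTIONAL & attrs.keys()):
            warnings.append(f"line {lineno}: {name} had no effect in Maui 3.3, use JFACTION")
        value = attrs.get("JFACTION")
        if value is None:
            warnings.append(f"line {lineno}: JFACTION not set, default is NONE")
        elif value.upper() not in JFACTION_VALUES:
            warnings.append(f"line {lineno}: unknown JFACTION value {value}")
    return warnings


if __name__ == "__main__":
    sample = "AMCFG[bank] TYPE=GOLD HOST=master PORT=7112 DEFERJOBONFAILURE=TRUE TIMEOUT=15"
    for warning in check_amcfg(sample):
        print(warning)
```

The check only looks at attribute names and values in the file; it does not ask Maui which parameters it actually parsed, so treat its output as a hint and confirm the behaviour in maui.log.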

I found the decisive pointer in the mauiusers mailing list archive (see
http://supercluster.org/pipermail/mauiusers/2008-August/003479.html) and
checked the parameter and its possible values in the Maui source code (see
maui-3.3/src/moab/MConst.c, "MJFActionType").

@ Scott: I think I found some small bugs in GOLD during my evaluation (none
affecting correct functioning). Should they be posted to this mailing
list, or is there a different address to send them to?

Kind regards,
Richard


gold-users-bounces at supercluster.org wrote on 06.07.2010 18:29:30:

> Scott Jackson <scottmo at adaptivecomputing.com>
> Sent by: gold-users-bounces at supercluster.org
>
> 06.07.2010 18:29
>
> Please reply to: Gold Users Mailing List <gold-users at supercluster.org>
> To: Gold Users Mailing List <gold-users at supercluster.org>
> Cc:
> Subject: Re: [gold-users] Antwort: Re: Maui integration problems - AMCFG-Parameter not working?
> 
> Hi Richard,
> 
> RNothdurft at spirit21.de wrote:
> > Hi Scott,
> >
> > first of all thank you for your fast reply and advice.
> My pleasure. I was not so fast this time because of the holidays:)
> >
> > gold-users-bounces at supercluster.org wrote on 02.07.2010 19:27:22:
> > >
> > > Hi Richard,
> > >
> > > RNothdurft at spirit21.de wrote:
> > > > Hi,
> > > >
> > > > I'm evaluating GOLD in a test environment, but there are some problems
> > > > with the Maui integration.
> > > >
> > > > I get errors when a user submits a job without enough credits in
> > > > his GOLD account, but the job runs anyway, without a reservation
> > > > in GOLD...
> > > So, are you saying that Maui is running the job, even when an
> > > insufficient funds error is returned? If so, this would be an issue with
> > > Maui -- unless you can see a behavior issue in Gold.
> >
> > It may be an issue with Maui, but I thought my integration parameters
> > were wrong, which would make it more of a Gold question... Also, maui.log
> > says there is a bank failure:
> > 07/02 11:14:32 MSysRegEvent(FAILURE:  cannot receive response from
> > allocation-manager server master:7112 (cmd: '<XML>')
> > ...
> > 07/02 11:14:32 WARNING:  cannot reserve allocation for job '49',
> > reason: BankFailure
> If you believe there is a problem with Gold, please provide the extracts
> from the goldd.log that highlight the error. You can scp the entire file
> to guest at adaptivecomputing.com: password guest, if you would like.
> Please review it yourself first, to verify that you do believe there is
> an error in Gold behavior.
> 
> >
> > > > If there are enough credits in the account there are no problems;
> > > > reservation and charging of jobs both work.
> > > > I checked the parameter JOBFAILUREACTION, but I think the default
> > > > setting is correct. I tried some other values (HOLD,HOLD; RETRY) and
> > > > also tried changing the parameters TYPE, HOST, PORT to SERVER, as
> > > > mentioned here:
> > > > http://www.clusterresources.com/products/mwm/docs/6.4allocationmanagement.shtml#gold
> > > >
> > >
> > > This documentation is for Moab, not Maui. The HOLD,HOLD syntax will not
> > > work in Maui, neither will the SERVER syntax.
> > >
> > > > but without effect.
> > > > I changed the WIREPROTOCOL parameter from XML to HTML and also to
> > > > SSS2, just to see some change in the log files, but the messages
> > > > shown are still in XML format.
> > > >
> > > > So, the questions:
> > > > 1) what's wrong with my configuration?
> > > What do the Maui docs say to use here? Is JOBFAILUREACTION the right
> > > parameter to use with Maui? (I know this parameter name has changed a
> > > few times over the years.)
> >
> > I'm sorry, I didn't consider that the AMCFG parameter options could
> > differ between Maui and Moab.
> > So I looked up the corresponding Maui documentation and found it here:
> > http://www.clusterresources.com/products/maui/docs/6.4allocationmanagement.shtml#config
> >
> > As you can see there, the SERVER parameter should also work since Maui
> > 3.2.7, and the parameter I'm looking for should be DEFERJOBONFAILURE in
> > Maui. I tried again with that parameter, but also without success.
> > Jobs still start without a reservation in GOLD if the credit balance is
> > below the amount needed for the reservation...
> > Here again is the corresponding part of maui.log:
> > 07/05 10:00:03 WARNING:  request failed
> > 07/05 10:00:03 ALERT:    request failed with status code 782 
> > (Insufficient balance to reserve job (JobId 55))
> > 07/05 10:00:03 MSUDisconnect(S)
> > 07/05 10:00:03 ERROR:    cannot receive response from 
> > allocation-manager server 'master':7112
> > 07/05 10:00:03 MSysRegEvent(FAILURE:  cannot receive response from 
> > allocation-manager server master:7112 (cmd: '<XML>')
> > ,0,0,1)
> > 07/05 10:00:03 MSysLaunchAction(ASList,1)
> > 07/05 10:00:03 INFO:     command response 'NULL'
> > 07/05 10:00:03 ALERT:    no job data available
> > 07/05 10:00:03 ALERT:    cannot extract status
> > 07/05 10:00:03 ALERT:    cannot reserve allocation for job
> > 07/05 10:00:03 WARNING:  cannot reserve allocation for job '55', 
> > reason: BankFailure
> >
> > My new configuration in maui.cfg:
> > AMCFG[bank]  TYPE=GOLD HOST=master PORT=7112 SOCKETPROTOCOL=SSS-CHALLENGE WIREPROTOCOL=SSS2 CHARGEPOLICY=DEBITALLWC DEFERJOBONFAILURE=TRUE TIMEOUT=15
> >
> > But if DEFERJOBONFAILURE is the correct parameter, why does Maui's
> > --with-gold configure option insert JOBFAILUREACTION=IGNORE into
> > maui.cfg by default?
> >
> I'm sorry, but I really don't have the cycles to go looking through Maui
> source code. If Maui is still starting a job after receiving a failure
> back from the reservation, then logic tells me this is a problem with
> Maui.
> 
> >
> > >
> > > > 2) why is the logging still in XML-format?
> > > >
> > > I'm afraid I don't know what you mean by this. Can you show me an
> > > example of the logging format that you think is wrong?
> >
> > I thought so because of the '<XML>' in the INFO, FAILURE and DEBUG
> > messages below. I thought that if I changed the WIREPROTOCOL parameter,
> > the logging would show something like "SSS2" or whatever I had set...
> >
> > maui.log:
> > 07/05 10:00:03 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> > 07/05 10:00:03 MSUSendData(S,15000000,FALSE,FALSE)
> > 07/05 10:00:03 INFO:     packet sent (705 bytes of 705)
> > 07/05 10:00:03 INFO:     command sent to server
> > 07/05 10:00:03 INFO:     message sent: '<XML>'
> > ...
> > 07/05 10:00:03 MSysRegEvent(FAILURE:  cannot receive response from 
> > allocation-manager server master:7112 (cmd: '<XML>')
> >
> This is just an unfortunate shortcut in Maui logging: instead of
> converting the XML into a string and displaying it, Maui simply shows the
> literal '<XML>', which is only a placeholder and says nothing about the
> XML that was actually sent. The SSS message format is XML, and it is
> described in the documents under Project Documentation at
> adaptive.computing.com/gold.
> 
> > goldd.log:
> > 2010-07-05 10:00:03.266 TRACE Gold::Response::failure  invoked with 
> > arguments: (782, Insufficient balance to reserve job (JobId 55))
> > 2010-07-05 10:00:03.267 TRACE Gold::Reply::new  invoked with 
> > arguments: (connection => IO::Socket::INET=GLOB(0x8656d2c))
> > 2010-07-05 10:00:03.267 TRACE Gold::Reply::sendChunk  invoked with 
> > arguments: (Gold::Chunk=HASH(0x91fee44))
> > 2010-07-05 10:00:03.268 TRACE Gold::Reply::marshallChunk  invoked with
> > arguments: (Gold::Chunk=HASH(0x91fee44))
> > 2010-07-05 10:00:03.268 DEBUG Gold::Reply::sendChunk  Writing reply 
> > header (HTTP/1.1 200 OK^M
> > Content-Type: text/xml; charset="utf-8"^M
> > Transfer-Encoding: chunked).
> > 2010-07-05 10:00:03.269 INFO  Gold::Reply::sendChunk  Writing reply
> > payload (232, <?xml version="1.0" encoding="UTF-8"?>
> > <Envelope><Body><Response actor="golduser"><Status><Value>Failure</Value><Code>782</Code><Message>Insufficient balance to reserve job (JobId 55)</Message></Status></Response></Body></Envelope>
> >
> OK, so Gold is saying insufficient balance. This is the response XML.
> Please provide the request XML so we can see who is asking and how much is
> being asked for. I would say either something is missing (like Machine,
> Project, ...) or a very long wallclock limit is being requested (100
> days or something -- that results in more than the account has for a
> balance). Or perhaps the accounts are not set up quite the way you think
> they should be. Or maybe reservations are not going away. Or maybe they
> are just plain out of funds. This will fall out once we see the request
> XML and run a few commands to examine the accounts.
> > Thanks again for any suggestions and your patience with a newbie like 
> > me ;)
> >
> > Kind regards,
> > Richard
> >
> Thanks,
> 
> Scott
> 
> > >
> > > Thanks,
> > >
> > > Scott
> > >
> > > > Thanks for any suggestions.
> > > >
> > > > Kind regards,
> > > > Richard


Richard Nothdurft
DHBW student at SPIRIT/21 AG
Otto-Lilienthal-Str. 36, 71034 Böblingen
Mobile: 0177/7427024
Email: rnothdurft at spirit21.de
Internet: http://www.spirit21.de


