[Mauiusers] gold + maui: bankfailure
Stijn De Weirdt
Stijn.DeWeirdt at ugent.be
Thu Aug 14 08:15:11 MDT 2008
hi all,
after reading some more code, it seems you also need to set the
(undocumented) parameter
JFACTION=DEFER
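
a minimal sketch of how this might look in maui.cfg, not a tested config. the host and port are taken from the log extract quoted below; the AMCFG[bank] label, TYPE=GOLD, and putting JFACTION on the AMCFG line next to DEFERJOBONFAILURE are assumptions, since the parameter is undocumented:

```
# maui.cfg fragment (sketch only; attribute placement is an assumption)
AMCFG[bank] TYPE=GOLD HOST=head1.x.y.z PORT=7112
AMCFG[bank] DEFERJOBONFAILURE=TRUE JFACTION=DEFER
```
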
stijn
Stijn De Weirdt wrote:
> hi all,
>
> we are doing some testing wrt gold and maui.
> (maui snap.1212617145)
>
> one of the things we can't get to work is the 'job reservation at job
> start time' policy. (charging when the job is finished works, so i don't
> suspect anything is wrong with gold)
>
> when there is a bank failure, jobs start to run no matter what we try.
>
> the maui admin guide states that there is a parameter,
> DEFERJOBONFAILURE, that should deal with this (ie setting it to TRUE should
> keep jobs in state Q), although it is not clear whether this covers any
> bank failure or only the case where the AM can't be reached. but in both
> cases it doesn't seem to work ;) (logfile extract at the bottom).
>
> what is even more bizarre, when setting this parameter, maui says
> (loglevel 9):
>
> 08/13 16:41:04 INFO: AMCFG[0] set to DEFERJOBONFAILURE=TRUE
> 08/13 16:41:04 MUGetIndex(DEFERJOBONFAILURE,ValList,0)
> 08/13 16:41:04 WARNING: AM attribute 'DEFERJOBONFAILURE' not handled
>
> i grepped the maui code for anything related and also found a
> BANKDEFERONJOBFAILURE (mind the subtle difference in naming), which has a
> default value of FALSE. so i changed that default to TRUE and rebuilt
> maui, but got the same result, so maybe it's something else.
>
> hints are welcome.
>
> many thanks,
>
> stijn
>
>
>
> logfiles:
>
> from maui.log with loglevel 9:
>
> 08/13 17:58:46 ERROR: cannot connect to allocation-manager server 'head1.x.y.z':7112
> 08/13 17:58:46 MSysRegEvent(RMFAILURE: cannot connect to allocation-manager server head1.x.y.z:7112 (command: '<XML>')
> ,0,0,1)
> 08/13 17:58:46 MSysLaunchAction(ASList,1)
> 08/13 17:58:46 INFO: scheduler action 1 disabled
> 08/13 17:58:46 INFO: command response 'NULL'
> 08/13 17:58:46 ALERT: no job data available
> 08/13 17:58:46 MSUDisconnect(S)
> 08/13 17:58:46 ALERT: cannot extract status
> 08/13 17:58:46 ALERT: cannot reserve allocation for job
> 08/13 17:58:46 WARNING: cannot reserve allocation for job '121', reason: BankFailure
> 08/13 17:58:46 MRMJobStart(121,Msg,SC)
> 08/13 17:58:46 MPBSJobStart(121,torque,Msg,SC)
>
>
> 08/13 15:10:11 WARNING: request failed
> 08/13 15:10:11 ALERT: request failed with status code 740 (Project account8 does not exist)
> 08/13 15:10:11 MSUDisconnect(S)
> 08/13 15:10:11 ERROR: cannot receive response from allocation-manager server 'head1.x.y.z':7112
> 08/13 15:10:11 MSysRegEvent(FAILURE: cannot receive response from allocation-manager server head1.x.y.z:7112 (cmd: '<XML>')
> ,0,0,1)
> 08/13 15:10:11 MSysLaunchAction(ASList,1)
> 08/13 15:10:11 INFO: command response 'NULL'
> 08/13 15:10:11 ALERT: no job data available
> 08/13 15:10:11 ALERT: cannot extract status
> 08/13 15:10:11 ALERT: cannot reserve allocation for job
> 08/13 15:10:11 WARNING: cannot reserve allocation for job '107', reason: BankFailure
> 08/13 15:10:11 MRMJobStart(107,Msg,SC)