[Mauiusers] gold + maui: bankfailure

Stijn De Weirdt Stijn.DeWeirdt at ugent.be
Thu Aug 14 08:15:11 MDT 2008


hi all,

after reading some more code, it seems you also need to set the 
(undocumented) parameter
JFACTION=DEFER

stijn

Stijn De Weirdt wrote:
> hi all,
> 
> we are doing some testing wrt gold and maui.
> (maui snap.1212617145)
> 
> one of the things we can't get to work is the 'job reservation at job 
> start time' policy. (charging when the job is finished works, so i'm not 
> suspecting anything wrong with gold)
> 
> when there is a bank failure, jobs start to run no matter what we try.
> 
> the maui admin guide states that there is a parameter, 
> DEFERJOBONFAILURE, that should deal with this (ie setting it to TRUE 
> should keep jobs in state Q), although it is not clear whether this 
> means any bank failure or only when the AM can't be reached. but in 
> both cases it doesn't seem to work ;) (logfile extract at the bottom).
> 
> what is even more bizarre, when setting this parameter, maui says 
> (loglevel 9):
> 
> 08/13 16:41:04 INFO:     AMCFG[0] set to DEFERJOBONFAILURE=TRUE
> 08/13 16:41:04 MUGetIndex(DEFERJOBONFAILURE,ValList,0)
> 08/13 16:41:04 WARNING:  AM attribute 'DEFERJOBONFAILURE' not handled
> 
> i grepped the maui code for anything related and also found 
> BANKDEFERONJOBFAILURE (mind the subtle difference in naming), which has 
> a default value of FALSE. so i changed that default to TRUE and rebuilt 
> maui, but got the same result, so maybe it's something else.
> 
> hints are welcome.
> 
> many thanks,
> 
> stijn
> 
> 
> 
> logfiles:
> 
> from maui.log with loglevel 9:
> 
> 08/13 17:58:46 ERROR:    cannot connect to allocation-manager server 
> 'head1.x.y.z':7112
> 08/13 17:58:46 MSysRegEvent(RMFAILURE:  cannot connect to 
> allocation-manager server head1.x.y.z:7112 (command: '<XML>')
> ,0,0,1)
> 08/13 17:58:46 MSysLaunchAction(ASList,1)
> 08/13 17:58:46 INFO:     scheduler action 1 disabled
> 08/13 17:58:46 INFO:     command response 'NULL'
> 08/13 17:58:46 ALERT:    no job data available
> 08/13 17:58:46 MSUDisconnect(S)
> 08/13 17:58:46 ALERT:    cannot extract status
> 08/13 17:58:46 ALERT:    cannot reserve allocation for job
> 08/13 17:58:46 WARNING:  cannot reserve allocation for job '121', 
> reason: BankFailure
> 08/13 17:58:46 MRMJobStart(121,Msg,SC)
> 08/13 17:58:46 MPBSJobStart(121,torque,Msg,SC)
> 
> 
> 08/13 15:10:11 WARNING:  request failed
> 08/13 15:10:11 ALERT:    request failed with status code 740 (Project 
> account8 does not exist)
> 08/13 15:10:11 MSUDisconnect(S)
> 08/13 15:10:11 ERROR:    cannot receive response from allocation-manager 
> server 'head1.x.y.z':7112
> 08/13 15:10:11 MSysRegEvent(FAILURE:  cannot receive response from 
> allocation-manager server head1.x.y.z:7112 (cmd: '<XML>')
> ,0,0,1)
> 08/13 15:10:11 MSysLaunchAction(ASList,1)
> 08/13 15:10:11 INFO:     command response 'NULL'
> 08/13 15:10:11 ALERT:    no job data available
> 08/13 15:10:11 ALERT:    cannot extract status
> 08/13 15:10:11 ALERT:    cannot reserve allocation for job
> 08/13 15:10:11 WARNING:  cannot reserve allocation for job '107', 
> reason: BankFailure
> 08/13 15:10:11 MRMJobStart(107,Msg,SC)
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
> 
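
for anyone who wants to check their own maui.log for the same symptom, here is a minimal sketch (not part of maui or gold). it looks for jobs that got a "cannot reserve allocation" warning and were started anyway. the two patterns are modelled only on the log extracts quoted above, and the function name is made up for illustration. it assumes each log entry sits on a single line (the extracts above were wrapped by the mail client) and that other maui versions may word these lines differently.

```python
import re

# Patterns modelled on the quoted log extracts; this is an assumption about
# the log format, not something taken from the Maui source.
RESERVE_FAILED = re.compile(
    r"cannot reserve allocation for job '([^']+)',\s*reason:\s*(\w+)"
)
JOB_START = re.compile(r"MRMJobStart\(([^,)]+),")


def jobs_started_after_failure(lines):
    """Return (job_id, reason) pairs for jobs that were started after
    their allocation reservation failed."""
    failed = {}   # job id -> reason from the most recent failed reservation
    started = []
    for line in lines:
        match = RESERVE_FAILED.search(line)
        if match:
            failed[match.group(1)] = match.group(2)
            continue
        match = JOB_START.search(line)
        if match and match.group(1) in failed:
            job_id = match.group(1)
            started.append((job_id, failed.pop(job_id)))
    return started


# Unwrapped versions of three lines from the first extract above.
SAMPLE = [
    "08/13 17:58:46 ALERT:    cannot reserve allocation for job",
    "08/13 17:58:46 WARNING:  cannot reserve allocation for job '121', reason: BankFailure",
    "08/13 17:58:46 MRMJobStart(121,Msg,SC)",
]

if __name__ == "__main__":
    for job_id, reason in jobs_started_after_failure(SAMPLE):
        print(f"job {job_id} started despite {reason}")
```

to run it against a real log, pass an open file object instead of SAMPLE. if the deferral works as intended, the list should come back empty.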

