[gold-users] maui crash after Successfully charged job

Eva Hocks hocks at sdsc.edu
Tue Oct 8 17:50:17 MDT 2013





I see the same error many times in the goldd logs:

goldd.log.7:2013-10-08 15:28:27.433 TRACE Gold::Response::failure  invoked with
arguments: (315, The Bank class does not implement ReservationAllocation Delete)
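
To see how often this failure occurs and whether other codes show up, the failures can be tallied across the logs. The following is only a minimal sketch: the function name count_failures is hypothetical, the pattern is based solely on the log line quoted above, and it assumes the message text itself contains no closing parenthesis.

```python
import re
from collections import Counter

# Pattern derived from the single goldd.log line quoted above (an assumption
# about the general format). \s+ also matches a newline, so an entry that was
# wrapped across two lines is still recognized.
FAILURE_RE = re.compile(
    r"Gold::Response::failure\s+invoked with\s+arguments:\s+\((\d+),\s*([^)]*)\)"
)


def count_failures(text):
    """Return a Counter keyed by (code, message) for each failure entry."""
    counts = Counter()
    for code, message in FAILURE_RE.findall(text):
        # Collapse internal whitespace so wrapped and unwrapped copies of the
        # same message are counted together.
        counts[(int(code), " ".join(message.split()))] += 1
    return counts
```

Calling count_failures on the contents of each goldd.log file and printing the result of most_common() on the returned Counter would list the failures by frequency.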


The Action Query shows:
ReservationAllocation Create   False   Create
ReservationAllocation Query    True    Query
ReservationAllocation Modify   False   Modify
ReservationAllocation Delete   False   Delete
ReservationAllocation Undelete False   Undelete



Could the failure be fixed by setting ReservationAllocation Delete to True?

Thanks
Eva


On Tue, 8 Oct 2013, Eva Hocks wrote:

>
>
> maui 3.3.1, torque 4.2.5 and gold 2.2.0.5
>
>
> maui seems to die or hang about every hour when communicating with the
> allocation manager (gold)
>
>
> maui crashed (the daemon is not running) after joballoccharge
>
> 10/08 13:41:30 MSUDisconnect(S)
> 10/08 13:41:30 MSysEMSubmit(EM,allocation-manager,joballoccharge,842059[1])
> 10/08 13:41:30 MJobWriteStats(842059[1])
> 10/08 13:41:30 MJobToTString(842059[1],230,Buf,65536)
>
> 10/08 14:08:40 MSUDisconnect(S)
> 10/08 14:08:40 MSysEMSubmit(EM,allocation-manager,joballoccharge,842160[46])
> 10/08 14:08:40 MJobWriteStats(842160[46])
> 10/08 14:08:40 MJobToTString(842160[46],230,Buf,65536)
>
> 10/08 15:59:05 MSUDisconnect(S)
> 10/08 15:59:05 MSysEMSubmit(EM,allocation-manager,joballoccharge,841595)
> 10/08 15:59:05 MJobWriteStats(841595)
> 10/08 15:59:05 MJobToTString(841595,230,Buf,65536)
>
>
>
> hung situation:
>
> 10/08 15:26:01 MAMAllocJReserve(841268,RIndex,ErrMsg)
> 10/08 15:26:01 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> 10/08 15:26:01 MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
> 10/08 15:26:01 MSUConnect(S,TRUE,EMsg)
> 10/08 15:26:01 MSUSendData(S,15000000,FALSE,FALSE)
> 10/08 15:26:01 MSecGetChecksum(Buf,378,Checksum,HMAC64,CSKey)
> 10/08 15:26:01 MSUSendPacket(8,Buf,710,15000000,SC)
> 10/08 15:26:01 INFO:     packet sent (710 bytes of 710)
> 10/08 15:26:01 INFO:     command sent to server
> 10/08 15:26:01 INFO:     message sent: '<XML>'
> 10/08 15:26:01 MSURecvData(S,15000000,FALSE,SC,EMsg)
> 10/08 15:26:01 MSURecvPacket(8,BufP,1024,
>
>
>
> 10/08 14:31:01 MAMAllocJReserve(840568,RIndex,ErrMsg)
> 10/08 14:31:01 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> 10/08 14:31:01 MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
> 10/08 14:31:01 MSUConnect(S,TRUE,EMsg)
> 10/08 14:31:01 MSUSendData(S,15000000,FALSE,FALSE)
> 10/08 14:31:01 MSecGetChecksum(Buf,377,Checksum,HMAC64,CSKey)
> 10/08 14:31:01 MSUSendPacket(8,Buf,709,15000000,SC)
> 10/08 14:31:01 INFO:     packet sent (709 bytes of 709)
> 10/08 14:31:01 INFO:     command sent to server
> 10/08 14:31:01 INFO:     message sent: '<XML>'
> 10/08 14:31:01 MSURecvData(S,15000000,FALSE,SC,EMsg)
> 10/08 14:31:01 MSURecvPacket(8,BufP,1024,
>
>
>
> Does anyone have any insight or hints on how to prevent the crashes?
>
> Thanks
> Eva
>
>


