[gold-users] maui crash after Successfully charged job

Eva Hocks hocks at sdsc.edu
Tue Oct 8 17:13:02 MDT 2013



maui 3.3.1, torque 4.2.5 and gold 2.2.0.5


maui crashes or hangs about once an hour while communicating with the
allocation manager (gold).


Crash situation: maui crashed (the daemon is no longer running) after joballoccharge:

10/08 13:41:30 MSUDisconnect(S)
10/08 13:41:30 MSysEMSubmit(EM,allocation-manager,joballoccharge,842059[1])
10/08 13:41:30 MJobWriteStats(842059[1])
10/08 13:41:30 MJobToTString(842059[1],230,Buf,65536)

10/08 14:08:40 MSUDisconnect(S)
10/08 14:08:40 MSysEMSubmit(EM,allocation-manager,joballoccharge,842160[46])
10/08 14:08:40 MJobWriteStats(842160[46])
10/08 14:08:40 MJobToTString(842160[46],230,Buf,65536)

10/08 15:59:05 MSUDisconnect(S)
10/08 15:59:05 MSysEMSubmit(EM,allocation-manager,joballoccharge,841595)
10/08 15:59:05 MJobWriteStats(841595)
10/08 15:59:05 MJobToTString(841595,230,Buf,65536)



Hang situation:

10/08 15:26:01 MAMAllocJReserve(841268,RIndex,ErrMsg)
10/08 15:26:01 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
10/08 15:26:01 MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
10/08 15:26:01 MSUConnect(S,TRUE,EMsg)
10/08 15:26:01 MSUSendData(S,15000000,FALSE,FALSE)
10/08 15:26:01 MSecGetChecksum(Buf,378,Checksum,HMAC64,CSKey)
10/08 15:26:01 MSUSendPacket(8,Buf,710,15000000,SC)
10/08 15:26:01 INFO:     packet sent (710 bytes of 710)
10/08 15:26:01 INFO:     command sent to server
10/08 15:26:01 INFO:     message sent: '<XML>'
10/08 15:26:01 MSURecvData(S,15000000,FALSE,SC,EMsg)
10/08 15:26:01 MSURecvPacket(8,BufP,1024,



10/08 14:31:01 MAMAllocJReserve(840568,RIndex,ErrMsg)
10/08 14:31:01 MS3DoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
10/08 14:31:01 MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
10/08 14:31:01 MSUConnect(S,TRUE,EMsg)
10/08 14:31:01 MSUSendData(S,15000000,FALSE,FALSE)
10/08 14:31:01 MSecGetChecksum(Buf,377,Checksum,HMAC64,CSKey)
10/08 14:31:01 MSUSendPacket(8,Buf,709,15000000,SC)
10/08 14:31:01 INFO:     packet sent (709 bytes of 709)
10/08 14:31:01 INFO:     command sent to server
10/08 14:31:01 INFO:     message sent: '<XML>'
10/08 14:31:01 MSURecvData(S,15000000,FALSE,SC,EMsg)
10/08 14:31:01 MSURecvPacket(8,BufP,1024,
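
Both hang excerpts end on an MSURecvPacket entry that is cut off mid-call. A minimal sketch for finding such spots in saved excerpts, assuming the "MM/DD HH:MM:SS message" layout shown above and blank lines between excerpts (the function name find_unanswered_receives is hypothetical and not part of maui or gold):

```python
import re

# Matches the "MM/DD HH:MM:SS <message>" layout of the excerpts above.
LINE_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def find_unanswered_receives(lines):
    """Return timestamps of excerpts whose last entry is a cut-off MSURecvPacket call.

    Assumption: excerpts are separated by blank (or non-log) lines, as in
    this post; a continuous maui log would need a different grouping rule.
    """
    hits = []
    last = None  # (timestamp, message) of the latest log line in the excerpt
    for raw in list(lines) + [""]:  # trailing "" closes the final excerpt
        match = LINE_RE.match(raw.rstrip("\n"))
        if match:
            last = match.groups()
            continue
        # A blank or non-log line ends the current excerpt.
        if last is not None:
            timestamp, message = last
            cut_off = not message.rstrip().endswith(")")
            if message.startswith("MSURecvPacket(") and cut_off:
                hits.append(timestamp)
            last = None
    return hits
```

This only flags where an excerpt stops; it does not show whether gold ever replied, so the gold server log for the same timestamps would still need to be checked.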



Does anyone have any insight or hints on how to prevent the crashes?

Thanks
Eva


