[Mauiusers] slurm 1.3.8 + maui-3.2.6p21-snap.1243977349 = "checksum does not match"

А gip_gop at mail.ru
Fri Jun 26 06:53:39 MDT 2009


 Hello,

machine is:
# uname -a
Linux n00 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

running slurm 1.3.8

I've installed maui (maui-3.2.6p21-snap.1243977349.tar.gz; Eygene Ryabinkin's correction is included, I checked the sources).

Configured:

./configure --prefix=/opt/maui --mandir=/usr/share/man --with-spooldir=/opt/maui --with-machine=n00 --with-key=78 --with-wiki
Created file /etc/wiki.conf with line:

AuthKey=78
Included in slurm.conf:

SchedulerType=sched/wiki
SchedulerPort=7321
But in the end I got "ALERT: checksum does not match".
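
A quick sanity check of the settings above, as a minimal Python sketch (not part of maui or slurm). It only confirms that the key given to ./configure and the AuthKey in wiki.conf are the same string, and reads back the scheduler port. The helper names parse_conf and configure_option are hypothetical, and the inputs are copied from this message. Whether both sides interpret the key the same way internally is not checked here.

```python
import shlex

def parse_conf(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            conf[key.strip()] = value.strip()
    return conf

def configure_option(cmdline, name):
    """Return the value of --name=value from a ./configure command line."""
    prefix = f"--{name}="
    for arg in shlex.split(cmdline):
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None

# Inputs copied from the message; the slurm.conf settings are one per line.
CONFIGURE = ("./configure --prefix=/opt/maui --mandir=/usr/share/man "
             "--with-spooldir=/opt/maui --with-machine=n00 --with-key=78 --with-wiki")
WIKI_CONF = "AuthKey=78\n"
SLURM_CONF = "SchedulerType=sched/wiki\nSchedulerPort=7321\n"

maui_key = configure_option(CONFIGURE, "with-key")
slurm_key = parse_conf(WIKI_CONF).get("AuthKey")
print("keys match:", maui_key == slurm_key)
print("scheduler port:", parse_conf(SLURM_CONF).get("SchedulerPort"))
```

With the values quoted here the keys compare equal as strings, and the port is the same 7321 that maui connects to in the log, so a plain typo in these settings looks unlikely.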

In the logs below I noticed a couple of things:
1) For some packets, the number of bytes read differs from the length passed to the MSecGetChecksum function:
06/26 13:34:47 INFO:     3704 of 3704 bytes read from sd 7
06/26 13:34:47 MSecGetChecksum(Buf,3632,Checksum,DES,CSKey)
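
To put a number on point 1, here is a minimal Python sketch (not part of maui) that pairs each "bytes read" line with the MSecGetChecksum call that follows it and prints the difference. The function name size_gaps and the regular expressions are assumptions about the log format, based only on the lines quoted in this message.

```python
import re

# Log excerpts copied from the maui log in this message.
LOG = """\
06/26 13:34:47 INFO:     4435 of 4435 bytes read from sd 7
06/26 13:34:47 MSecGetChecksum(Buf,4363,Checksum,DES,CSKey)
06/26 13:34:47 INFO:     3704 of 3704 bytes read from sd 7
06/26 13:34:47 MSecGetChecksum(Buf,3632,Checksum,DES,CSKey)
"""

READ_RE = re.compile(r"(\d+) of \d+ bytes read")
CSUM_RE = re.compile(r"MSecGetChecksum\(Buf,(\d+),")

def size_gaps(log_text):
    """Return (bytes_read, checksummed_len, gap) per read/checksum pair."""
    gaps = []
    last_read = None
    for line in log_text.splitlines():
        m = READ_RE.search(line)
        if m:
            last_read = int(m.group(1))
            continue
        m = CSUM_RE.search(line)
        if m and last_read is not None:
            n = int(m.group(1))
            gaps.append((last_read, n, last_read - n))
            last_read = None
    return gaps

for read, summed, gap in size_gaps(LOG):
    print(f"read={read} checksummed={summed} gap={gap}")
# read=4435 checksummed=4363 gap=72
# read=3704 checksummed=3632 gap=72
```

Both failing replies show the same gap of 72 bytes. That may mean a fixed-size part of the packet is left out of the checksum rather than the data being cut at random, but this is only an observation from the log, not a diagnosis.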

2) The request string seems to be cut off prematurely (perhaps just to avoid cluttering the log):
06/26 13:34:47 ALERT:    checksum does not match (e3743199c5566b9a:9ab1d151dd49049c)  request 'TS=1246008887 AUTH=slurm DT=SC=0 ARG=17#191814:STATE=Running;TASKLIST=:n01;UPDATETIME=1246007985;WCLIMIT=31536000;TASKS='
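
Point 2 can be checked with a short Python sketch (again not maui code). It measures the two request strings quoted in the ALERT lines of this log and splits one of them into its TS, AUTH and DT parts. The function name parse_wiki_header is hypothetical, and the field layout is assumed only from what the log shows.

```python
from datetime import datetime, timezone

# Request strings exactly as quoted in the two ALERT lines of the log.
NODES_REQ = ("TS=1246008887 AUTH=slurm DT=SC=0 ARG=64#n01:STATE=Running;"
             "ARCH=x86_64;OS=Linux;CMEMORY=10240;CDISK=0;CPROC=8;#n02:STATE=")
JOBS_REQ = ("TS=1246008887 AUTH=slurm DT=SC=0 ARG=17#191814:STATE=Running;"
            "TASKLIST=:n01;UPDATETIME=1246007985;WCLIMIT=31536000;TASKS=")

def parse_wiki_header(request):
    """Split 'TS=... AUTH=... DT=<payload>' into its three parts."""
    head, _, payload = request.partition(" DT=")
    fields = dict(item.split("=", 1) for item in head.split())
    return {"ts": int(fields["TS"]), "auth": fields["AUTH"], "payload": payload}

print(len(NODES_REQ), len(JOBS_REQ))   # 120 120

msg = parse_wiki_header(JOBS_REQ)
when = datetime.fromtimestamp(msg["ts"], tz=timezone.utc)
print(msg["auth"], when.isoformat())   # slurm 2009-06-26T09:34:47+00:00
```

Both quoted strings are exactly 120 characters long, while 4435 and 3704 bytes were actually read. This is consistent with the log printing only a fixed-length preview of the request, so the truncation is probably a logging limit and not damage to the packet itself.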
 

The detailed maui log includes these lines:

06/26 13:34:47 ServerProcessRequests()
06/26 13:34:47 MLogRoll(NULL,0,1)
06/26 13:34:47 INFO:     not rolling logs (441447 < 10000000)
06/26 13:34:47 MResAdjust(NULL,0,0)
06/26 13:34:47 MJobSetAttr(,PAL,Value,1,2)
06/26 13:34:47 INFO:     job flags for job : 0, req napolicy=SHARED
06/26 13:34:47 MJobSetAttr(,GAttr,Value,1,5)
06/26 13:34:47 MStatInitializeActiveSysUsage()
06/26 13:34:47 MStatClearUsage([NONE],Active)
06/26 13:34:47 ServerUpdate()
06/26 13:34:47 MSysUpdateTime()
06/26 13:34:47 INFO:     starting iteration 60
06/26 13:34:47 MSchedProcessJobs()
06/26 13:34:47 MRMGetInfo()
06/26 13:34:47 MClusterClearUsage()
06/26 13:34:47 MRMClusterQuery()
06/26 13:34:47 MWikiClusterLoadInfo(n00,RCount,EMsg,SC)
06/26 13:34:47 MWikiDoCommand(n00,7321,9000000,CHECKSUM,CMD=GETNODES ARG=0:ALL,Data,DataSize,SC)
06/26 13:34:47 MSUConnect(S,FALSE,EMsg)
06/26 13:34:47 INFO:     trying to connect to 10.1.0.1 (Port: 7321)
06/26 13:34:47 INFO:     non-blocking mode established
06/26 13:34:47 MSUSelectWrite(7,9000000)
06/26 13:34:47 INFO:     successful connect to TCP server (sd: 7)
06/26 13:34:47 MSUSendData(S,9000000,TRUE,FALSE)
06/26 13:34:47 MSecGetChecksum2(Buf1,27,Buf2,22,Checksum,DES,CSKey)
06/26 13:34:47 INFO:     header created '00000069CK=2c5f6971a5844eef TS=1246008887 AUTH=root DT='
06/26 13:34:47 INFO:     sending short packet '00000069CK=2c5f6971a5844eef TS=1246008887 AUTH=root DT=CMD=GETNODES ARG=0:ALL'
06/26 13:34:47 MSUSendPacket(7,Buf,78,9000000,SC)
06/26 13:34:47 MSUSelectWrite(7,9000000)
06/26 13:34:47 INFO:     packet sent (78 bytes of 78)
06/26 13:34:47 INFO:     command sent to server
06/26 13:34:47 INFO:     message sent: 'CMD=GETNODES ARG=0:ALL'
06/26 13:34:47 MSURecvData(S,9000000,TRUE,SC,EMsg)
06/26 13:34:47 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
06/26 13:34:47 MSUSelectRead(7,9000000)
06/26 13:34:47 INFO:     9 of 9 bytes read from sd 7
06/26 13:34:47 MSURecvPacket(7,BufP,4435,NULL,9000000,SC)
06/26 13:34:47 MSUSelectRead(7,9000000)
06/26 13:34:47 INFO:     4435 of 4435 bytes read from sd 7
06/26 13:34:47 MSecGetChecksum(Buf,4363,Checksum,DES,CSKey)
06/26 13:34:47 ALERT:    checksum does not match (351c7a893a2e1699:b4584308b241ec39)  request 'TS=1246008887 AUTH=slurm DT=SC=0 ARG=64#n01:STATE=Running;ARCH=x86_64;OS=Linux;CMEMORY=10240;CDISK=0;CPROC=8;#n02:STATE='
06/26 13:34:47 ERROR:    cannot receive data from server n00:7321
06/26 13:34:47 MSUDisconnect(S)
06/26 13:34:47 ALERT:    cannot get node list from WIKI RM
06/26 13:34:47 ALERT:    cannot load cluster resources on RM (RM 'n00' failed in function 'clusterquery')
06/26 13:34:47 WARNING:  no resources detected
06/26 13:34:47 MRMWorkloadQuery()
06/26 13:34:47 MWikiWorkloadQuery(n00,JCount,SC)
06/26 13:34:47 MWikiDoCommand(n00,7321,9000000,CHECKSUM,CMD=GETJOBS ARG=0:ALL,Data,DataSize,SC)
06/26 13:34:47 MSUConnect(S,FALSE,EMsg)
06/26 13:34:47 INFO:     trying to connect to 10.1.0.1 (Port: 7321)
06/26 13:34:47 INFO:     non-blocking mode established
06/26 13:34:47 MSUSelectWrite(7,9000000)
06/26 13:34:47 INFO:     successful connect to TCP server (sd: 7)
06/26 13:34:47 MSUSendData(S,9000000,TRUE,FALSE)
06/26 13:34:47 MSecGetChecksum2(Buf1,27,Buf2,21,Checksum,DES,CSKey)
06/26 13:34:47 INFO:     header created '00000068CK=4e880ad31a667b74 TS=1246008887 AUTH=root DT='
06/26 13:34:47 INFO:     sending short packet '00000068CK=4e880ad31a667b74 TS=1246008887 AUTH=root DT=CMD=GETJOBS ARG=0:ALL'
06/26 13:34:47 MSUSendPacket(7,Buf,77,9000000,SC)
06/26 13:34:47 MSUSelectWrite(7,9000000)
06/26 13:34:47 INFO:     packet sent (77 bytes of 77)
06/26 13:34:47 INFO:     command sent to server
06/26 13:34:47 INFO:     message sent: 'CMD=GETJOBS ARG=0:ALL'
06/26 13:34:47 MSURecvData(S,9000000,TRUE,SC,EMsg)
06/26 13:34:47 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
06/26 13:34:47 MSUSelectRead(7,9000000)
06/26 13:34:47 INFO:     3704 of 3704 bytes read from sd 7
06/26 13:34:47 MSecGetChecksum(Buf,3632,Checksum,DES,CSKey)
06/26 13:34:47 ALERT:    checksum does not match (e3743199c5566b9a:9ab1d151dd49049c)  request 'TS=1246008887 AUTH=slurm DT=SC=0 ARG=17#191814:STATE=Running;TASKLIST=:n01;UPDATETIME=1246007985;WCLIMIT=31536000;TASKS='
06/26 13:34:47 ERROR:    cannot receive data from server n00:7321
06/26 13:34:47 MSUDisconnect(S)
06/26 13:34:47 ALERT:    cannot get job list from WIKI RM
06/26 13:34:47 ALERT:    cannot load cluster workload on RM (RM 'n00' failed in function 'workloadquery')
06/26 13:34:47 WARNING:  no workload detected