[Mauiusers] maui crashes with segv (version 3.2.6.p16)

Bas van der Vlies basv at sara.nl
Wed Nov 1 15:26:09 MST 2006


I think I found the bug. When we have more than 20,000 jobs, maui
crashes with a segv because the MJob list is no longer circular: the
last element does not point to the first entry in the list.

We got a lot of these errors in maui.log:
11/01 21:12:13 ERROR:  job hash table is FULL.  cannot add MJob [21208] '43806'
11/01 21:12:13 ERROR:    job buffer is full  (ignoring job '43806.testm.irc.sara.nl')
11/01 21:12:13 ERROR:  job hash table is FULL.  cannot add MJob [21218] '43817'
11/01 21:12:13 ERROR:    job buffer is full  (ignoring job '43817.testm.irc.sara.nl')
11/01 21:12:13 ERROR:  job hash table is FULL.  cannot add MJob [21228] '43828'
11/01 21:12:13 ERROR:    job buffer is full  (ignoring job '43828.testm.irc.sara.nl')
11/01 21:12:14 ERROR:  job hash table is FULL.  cannot add MJob [21238] '43839'
11/01 21:12:14 ERROR:    job buffer is full  (ignoring job '43839.testm.irc.sara.nl')
11/01 21:12:15 ERROR:  job hash table is FULL.  cannot add MJob [21304] '43906'
11/01 21:12:15 ERROR:    job buffer is full  (ignoring job '43906.testm.irc.sara.nl')
11/01 21:12:15 ERROR:  job hash table is FULL.  cannot add MJob [21314] '43917'
11/01 21:12:15 ERROR:    job buffer is full  (ignoring job '43917.testm.irc.sara.nl')
11/01 21:12:16 ERROR:  job hash table is FULL.  cannot add MJob [21324] '43928'
11/01 21:12:16 ERROR:    job buffer is full  (ignoring job '43928.testm.irc.sara.nl')
11/01 21:12:16 ERROR:  job hash table is FULL.  cannot add MJob [21334] '43939'
11/01 21:12:16 ERROR:    job buffer is full  (ignoring job '43939.testm.irc.sara.nl')
11/01 21:12:19 ERROR:  job hash table is FULL.  cannot add MJob [21464] '44070'
11/01 21:12:19 ERROR:    job buffer is full  (ignoring job '44070.testm.irc.sara.nl')
11/01 21:12:19 ERROR:  job hash table is FULL.  cannot add MJob [21474] '44081'

It goes wrong in MJob.c. The code computes a hash key for the job and
uses it as the starting value of a loop; the end value of the loop is
MMAX_JOB + MAX_MHBUF. When all entries in this range are occupied, it
displays the error above, even though slots below the hash key may
still be free.
The patch that I have written will slow down the server, because it
searches from 0 to the end for a free slot.

I am trying to optimize it. With the patch, maui now runs jobs and no
longer segfaults at startup.


   /* HvB bas
   DBG(1,fSTRUCT) DPrint("ERROR:  MSched.M[mxoJob] = %d\n", MSched.M[mxoJob]);
   for (index = hashkey;index < MSched.M[mxoJob] + MAX_MHBUF;index++)
   */
   for (index = 0;index < MSched.M[mxoJob] + MAX_MHBUF;index++)


I have now written a version that first tries the computed hash key
and then falls back to the brute-force method. I am testing that patch
now.
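
The two-phase search described above could look roughly like the sketch below. It is not the actual Maui patch: TABLE_SIZE, SLOT_FREE, SLOT_USED and both function names are made up for illustration, and the job table is reduced to a plain array of flags. The fallback only rescans the slots below the hash key, since the slots from the hash key upward were already checked in the first pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MSched.M[mxoJob] + MAX_MHBUF. */
#define TABLE_SIZE 8
#define SLOT_FREE  0
#define SLOT_USED  1

/* Forward-only probe, as in the original loop: scans from hashkey to
 * the end of the table and gives up there. Returns the free index,
 * or -1 if every slot in [hashkey, TABLE_SIZE) is occupied. */
static int find_slot_forward(const int *slots, size_t hashkey)
{
  size_t index;

  assert(hashkey < TABLE_SIZE);

  for (index = hashkey; index < TABLE_SIZE; index++)
    {
    if (slots[index] == SLOT_FREE)
      return (int)index;
    }

  return -1;
}

/* Two-phase probe: try the hash key range first, then fall back to
 * scanning the slots below the hash key. Returns -1 only when the
 * whole table is full. */
static int find_slot_with_fallback(const int *slots, size_t hashkey)
{
  size_t index;
  int    found = find_slot_forward(slots, hashkey);

  if (found >= 0)
    return found;

  for (index = 0; index < hashkey; index++)
    {
    if (slots[index] == SLOT_FREE)
      return (int)index;
    }

  return -1;
}
```

With a table whose upper slots are all taken, the forward-only probe reports "full" while the fallback still finds a free slot near the start, which matches the "job hash table is FULL" errors appearing before the table is really exhausted.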

Regards

--
Bas van der Vlies
basv at sara.nl




