[Mauiusers] Maui dies

Gerson Galang gerson.sapac at gawab.com
Thu Sep 2 21:04:47 MDT 2004


Stewart might see a different error but this is what I've been getting 
after following the procedure that Dave suggested.

[root@dev root]# gdb /usr/local/maui/sbin/maui
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/local/maui/sbin/maui

Program received signal SIGSEGV, Segmentation fault.
0x42074d5e in malloc_consolidate () from /lib/tls/libc.so.6
(gdb) where
#0  0x42074d5e in malloc_consolidate () from /lib/tls/libc.so.6
#1  0x420743c9 in _int_malloc () from /lib/tls/libc.so.6
#2  0x4207378d in malloc () from /lib/tls/libc.so.6
#3  0x0812522f in PBSD_rdrpy (c=1) at ../Libifl/PBSD_rdrpy.c:118
#4  0x081266be in PBSD_status_get (c=1) at ../Libifl/PBSD_status.c:131
#5  0x0812669a in PBSD_status (c=1, function=20, id=0x8160148 "", attrib=0x0, extend=0x0)
     at ../Libifl/PBSD_status.c:117
#6  0x08124908 in pbs_statque (c=1, id=0x0, attrib=0x0, extend=0x0)
     at ../Libifl/pbsD_statque.c:97
#7  0x0810a1ed in MPBSLoadQueueInfo (R=0x8dd00a0, SpecN=0x9909a98, LoadFull=1 '\001',
     SC=0x0) at MPBSI.c:1242
#8  0x0810dcc4 in MPBSNodeUpdate (N=0x9909a98, PNode=0x9944430, NState=mnsIdle,
     R=0x8dd00a0) at MPBSI.c:3184
#9  0x08109f80 in MPBSClusterQuery (R=0x8dd00a0, RCount=0xbffedfbc, SC=0x0) at MPBSI.c:1111
#10 0x080c20a8 in __MUTFunc (V=0xbffedf10) at MUtil.c:5348
#11 0x080c2048 in MUThread (F=0x8109cf4 <MPBSClusterQuery>, TimeOut=9, RC=0xbffedfc0,
     ACount=3, Lock=0x0) at MUtil.c:5321
#12 0x08100798 in MRMClusterQuery (RCount=0xbffee404, SC=0x0) at MRM.c:403
#13 0x08100335 in MRMGetInfo () at MRM.c:262
#14 0x0808483c in MSchedProcessJobs (OldDay=0xbfffe480 "Fri", GlobalSQ=0xbfffa480,
     GlobalHQ=0xbfff6480) at MSched.c:6689
#15 0x0804a471 in main (ArgC=1, ArgV=0xbfffe514) at Server.c:166
#16 0x42015704 in __libc_start_main () from /lib/tls/libc.so.6
(gdb)

This only happens when I have standing reservations set up in my 
maui.cfg file. The only reason I have standing reservations (SR) is to 
get automatic preemption to work.
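
For anyone who needs something in the meantime, the cron workaround that 
Stewart describes in the quoted message below could look roughly like 
this. It is only a sketch, not a tested fix: the binary path is the one 
from the gdb session above, the helper names are made up for 
illustration, and it assumes that maui puts itself in the background 
when started without MAUIDEBUG set.

```python
#!/usr/bin/env python3
"""Restart maui if it is missing from the process list (illustrative sketch)."""
import os
import subprocess

# Path taken from the gdb session above; adjust for your install.
MAUI_BIN = "/usr/local/maui/sbin/maui"


def is_running(name, ps_output):
    """Return True if a process called exactly `name` appears in ps output.

    `ps_output` is expected to hold one command name per line.
    """
    return any(line.strip() == name for line in ps_output.splitlines())


def check_and_restart(list_procs, start):
    """Call `start()` unless maui already appears in `list_procs()`.

    Returns True if a restart was attempted, False otherwise.
    """
    if is_running("maui", list_procs()):
        return False
    start()
    return True


def list_procs():
    # `ps -e -o comm=` prints one command name per line, with no header.
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return result.stdout


def start_maui():
    # Assumption: maui detaches on its own, so we do not wait for it.
    subprocess.Popen([MAUI_BIN])


if __name__ == "__main__":
    # Do nothing on hosts where maui is not installed at the assumed path.
    if os.path.exists(MAUI_BIN) and check_and_restart(list_procs, start_maui):
        print("maui was not running; restart attempted")
```

Run from root's crontab every few minutes, this only hides the crash. 
It does nothing about the segfault in the backtrace above.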

Thanks.

jacksond at supercluster.org wrote:
> Stewart,
> 
>   If you start up Maui under gdb, when it crashes, the gdb 'where' 
> subcommand should indicate the cause.  Send us this output and we will 
> get it fixed immediately.
> 
> Thanks,
> Dave
> 
> ps.  export MAUIDEBUG=yes before starting maui to allow gdb to follow 
> the process.
> 
> On Thu, 2 Sep 2004 Stewart.Samuels at aventis.com wrote:
> 
>>
>> >Hi,
>>
>> >I've been testing maui-3.2.6 patch7 thoroughly by sending lots of jobs
>> >to the server (torque) just to see how maui will schedule the jobs I
>> >have in the queue. One thing that I've noticed is that after
>> >running/suspending/resuming the jobs I have in the queue, the maui
>> >scheduler just suddenly dies. I know that the scheduler died because my
>> >jobs which are supposed to run and resume just sit there in the queue
>> >doing nothing. Doing a ps -aux also won't show the maui process that
>> >I've started. Is this a known issue of maui3.2.6p7?
>>
>> >Regards,
>> >Gerson
>>
>>
>> Hello Gerson,
>>
>> I am performing the same process as you and am seeing the same result.
>> I am running torque-1.0.1p6 and maui-3.2.6p6.  It is not clear when maui
>> dies or what triggers it.  A workaround is to install a cron script that
>> checks whether maui is in the process list and, if not, starts it again.
>> Has anyone else seen this problem?  Apparently, it persists through at
>> least a couple of patch levels.
>>
>>
>>                Stewart Samuels
>>                Technical Advisor
>>               Global Unix Engineering Services
>>            1041 Route 202-206
>>               Bridgewater, NJ  08807
>>
>>               (908) 231-4762
>>               Stewart.Samuels at Aventis.com
>>
>>
>>
> 

