[torqueusers] 4.1.x leftover problems

Ken Nielson knielson at adaptivecomputing.com
Thu Mar 21 10:28:21 MDT 2013


Jörg,

Is this 4.1.5.1?

Ken

On Thu, Mar 21, 2013 at 9:26 AM, Joerg Blank <j.blank at fz-juelich.de> wrote:

> Hello everyone,
>
> We are still seeing 0-10 crashes per day from two causes:
>
> 1.) There is a double free in the handling of attrlists.
> 2.) It seems that information about the mywork variable in work_thread
> (u_threadpool.c) sometimes gets corrupted, which leads to a crash on the
> free call when a thread shuts down. I suspect the thread shutdown has to
> be guarded by a mutex.
>
> Regards,
> Jörg Blank
>
>
> #0  0x00007f44d3e34b23 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so
> (gdb) bt
> #0  0x00007f44d3e34b23 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so
> #1  0x00007f44d3e34f67 in tcmalloc::ThreadCache::Scavenge() () from /usr/lib/libtcmalloc.so
> #2  0x00007f44d3e41685 in tc_free () from /usr/lib/libtcmalloc.so
> #3  0x000000000046ef6c in free_attrlist (pattrlisthead=0xaa2df38) at attr_func.c:422
> #4  0x0000000000431542 in reply_free (prep=0x8802e88) at reply_send.c:300
> #5  0x000000000042f269 in free_br (preq=0x8802a00) at process_request.c:1080
> #6  0x0000000000431378 in reply_send_svr (request=0x8802a00) at reply_send.c:197
> #7  0x00000000004504a4 in sel_step3 (cntl=0xb1e0d80) at req_select.c:670
> #8  0x000000000044fbe7 in req_selectjobs (preq=0x8802a00) at req_select.c:351
> #9  0x000000000042ee51 in dispatch_request (sfds=7, request=0x8802a00) at process_request.c:869
> #10 0x000000000042e942 in process_request (chan=0x7a48ba0) at process_request.c:662
> #11 0x0000000000429f54 in process_pbs_server_port (sock=7, is_scheduler_port=0) at pbsd_main.c:402
> #12 0x000000000042a1b3 in start_process_pbs_server_port (new_sock=0x6a1fbc0) at pbsd_main.c:533
> #13 0x000000000047373e in work_thread (a=0x7fff7d872480) at u_threadpool.c:307
> #14 0x00007f44d29e48ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
> #15 0x00007f44d2543b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #16 0x0000000000000000 in ?? ()
>
>
>
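
For readers following the mutex suggestion in the quoted message, here is a minimal sketch of one way to make a free both thread-safe and idempotent: detach the pointer while holding a lock, then free it outside the lock. It is not TORQUE code. The names struct pool, struct work_item, pool_init, pool_set_work and pool_release_work are hypothetical and only stand in for whatever work_thread actually owns.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; NOT the TORQUE structures. */
struct work_item {
    int payload;
};

struct pool {
    pthread_mutex_t lock;
    struct work_item *current; /* owned by the pool until released */
    int frees;                 /* number of items actually freed */
};

/* Initialise the pool; returns 0 on success (pthread_mutex_init result). */
int pool_init(struct pool *p)
{
    assert(p != NULL);
    p->current = NULL;
    p->frees = 0;
    return pthread_mutex_init(&p->lock, NULL);
}

/* Store a new work item. Returns 0 on success, -1 if allocation fails
 * or the pool already holds an item. */
int pool_set_work(struct pool *p, int payload)
{
    struct work_item *w = malloc(sizeof *w);
    int rc = -1;

    assert(p != NULL);
    if (w == NULL)
        return -1;
    w->payload = payload;

    pthread_mutex_lock(&p->lock);
    if (p->current == NULL) {
        p->current = w; /* ownership moves to the pool */
        w = NULL;
        rc = 0;
    }
    pthread_mutex_unlock(&p->lock);

    free(w); /* no-op if ownership was transferred */
    return rc;
}

/* Detach the item under the lock, free it outside the lock.
 * Returns 1 if an item was freed, 0 if there was nothing to free,
 * so a second call cannot free the same pointer again. */
int pool_release_work(struct pool *p)
{
    struct work_item *w;
    int freed;

    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    w = p->current;
    p->current = NULL; /* later callers see NULL, not a stale pointer */
    freed = (w != NULL);
    if (freed)
        p->frees++;
    pthread_mutex_unlock(&p->lock);

    free(w); /* free(NULL) is a no-op */
    return freed;
}
```

Because the pointer is cleared before the lock is dropped, only one caller can ever obtain the non-NULL value, so two threads racing through shutdown cannot both pass it to free. Whether this matches the actual failure in free_attrlist or work_thread is not established by the backtrace alone.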


