[torqueusers] Torque 2.5.9 MOMs keep segfaulting
Ti Leggett
leggett at mcs.anl.gov
Wed Jan 11 09:05:17 MST 2012
I finally got around to doing this, but I don't see a core file in /var/spool/torque or in /usr/sbin. Where would the core get dumped?
On Dec 20, 2011, at 3:03 PM, Ken Nielson wrote:
> ----- Original Message -----
>> From: "Troy Baer" <tbaer at utk.edu>
>> To: "Torque Users Mailing List" <torqueusers at supercluster.org>
>> Sent: Tuesday, December 20, 2011 8:59:56 AM
>> Subject: Re: [torqueusers] Torque 2.5.9 MOMs keep segfaulting
>>
>> On Thu, 2011-12-08 at 10:36 -0600, Ti Leggett wrote:
>>> I just upgraded from 2.5.7 to 2.5.9 on Tuesday and since then, MOMs
>>> keep randomly segfaulting and dying. I see this in the MOM log
>>> right before dying:
>>>
>>> 12/08/2011 10:09:14;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::Bad file descriptor (9) in tm_request, comm failed Protocol failure in commit
>>>
>>>
>>> And something similar to this in dmesg:
>>>
>>> pbs_mom[22354]: segfault at 0000000000000008 rip 00002b585249ed6f rsp 00007fff19e96df0 error 4
>>
>> We've also seen this on one of our systems and had to fall back to
>> 2.5.8 on it.
>>
>> --Troy
>> --
>> Troy Baer, HPC System Administrator
>> National Institute for Computational Sciences, University of Tennessee
>> http://www.nics.tennessee.edu/
>> Phone: 865-241-4233
>
> Could someone configure TORQUE using --with-debug and then send a stack trace of the crash?
>
> Ken
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers