[torquedev] [Bug 166] New: After upgrading to 2.5.9, MOMs keep segfaulting
bugzilla-daemon at supercluster.org
Mon Dec 12 09:21:36 MST 2011
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=166
Summary: After upgrading to 2.5.9, MOMs keep segfaulting
Product: TORQUE
Version: 2.5.x
Platform: PC
OS/Version: Linux
Status: NEW
Severity: major
Priority: P5
Component: pbs_mom
AssignedTo: knielson at adaptivecomputing.com
ReportedBy: leggett at ci.uchicago.edu
CC: torquedev at supercluster.org
Estimated Hours: 0.0
I upgraded from TORQUE 2.5.7 to 2.5.9 last Tuesday, and since then the MOMs on
one of my clusters keep segfaulting and dying. In dmesg I see something similar
to this:
pbs_mom[31409]: segfault at 0000000000000008 rip 0000003655618d6f rsp 00007fffc63f7f50 error 4
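For anyone reading the dmesg line above, here is a rough sketch of how its fields can be decoded. It assumes the standard x86 page-fault error-code bits (bit 0: protection violation, bit 1: write, bit 2: user mode) and that the kernel prints the error code in hex. The helper name `decode_segfault` and the 4096-byte "near NULL" threshold are my own choices for illustration, not anything from TORQUE or the kernel.

```python
import re

# x86 page-fault error-code bits as reported in the kernel's segfault line.
PF_PROT = 0x1   # 0: page not present, 1: protection violation
PF_WRITE = 0x2  # 0: read access,      1: write access
PF_USER = 0x4   # 0: kernel mode,      1: user mode

# Matches the older "rip/rsp" wording seen in this report.
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Hypothetical helper: decode one kernel segfault line into a dict."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    addr = int(m.group("addr"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": addr,
        "rip": int(m.group("rip"), 16),
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "cause": "protection violation" if err & PF_PROT else "page not present",
        # Assumption: a fault address inside the first page suggests a
        # NULL pointer plus a small member offset.
        "near_null": addr < 4096,
    }


line = ("pbs_mom[31409]: segfault at 0000000000000008 "
        "rip 0000003655618d6f rsp 00007fffc63f7f50 error 4")
info = decode_segfault(line)
print(info)
```

Read this way, error 4 is a user-mode read of an unmapped page, and the fault address 0x8 is consistent with (though not proof of) reading a field at offset 8 through a NULL pointer.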
And in the mom logs I see this:
12/12/2011 09:59:13;0001; pbs_mom;Job;35935.svc.uc.futuregrid.org;task not started, 'rm', stdio setup failed (see syslog)
12/12/2011 09:59:13;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::Bad file descriptor (9) in tm_request, comm failed Protocol failure in commit
And in syslog I see:
Dec 12 09:59:02 c32 mpd: mpd ending mpdid=c32.uc.futuregrid.org_44987 (inside cleanup)
Dec 12 09:59:07 c32 pbs_mom: LOG_ERROR::Connection refused (111) in open_demux, open_demux: cannot connect to 127.0.0.1:60305
Dec 12 09:59:11 c32 last message repeated 2 times
Dec 12 09:59:13 c32 pbs_mom: LOG_ERROR::Inappropriate ioctl for device (25) in open_demux, open_demux: connect 127.0.0.1:60305
Dec 12 09:59:13 c32 pbs_mom: LOG_ERROR::Inappropriate ioctl for device (25) in start_process, cannot open mux stdout port
Dec 12 09:59:13 c32 pbs_mom: LOG_ERROR::Bad file descriptor (9) in tm_request, comm failed Protocol failure in commit
Dec 12 09:59:13 c32 kernel: pbs_mom[31409]: segfault at 0000000000000008 rip 0000003655618d6f rsp 00007fffc63f7f50 error 4