[torquedev] pbs_mom crashing with segfault after pbs_server restart
Ling C. Ho
ling at fnal.gov
Wed Mar 25 13:03:53 MDT 2009
Hello,
We noticed that sometimes, when pbs_server is restarted using SIGTERM, our pbs_mom processes (on
worker nodes) die with a segfault (it shows up in /var/log/messages). This usually happens right
after log entries like these:
03/24/2009 21:26:29;0008; pbs_mom;Job;do_rpp;got an inter-server request
03/24/2009 21:26:29;0001; pbs_mom;Job;is_request;stream 0 version 1
I traced it down to the mom_server_find_by_ip function, at the line
addr = rpp_getaddr(pms->SStream)
If rpp_getaddr returns NULL, a segfault happens.
With the following check added, I no longer get the segfault, even after a few hundred restarts of pbs_server:
    if ((addr = rpp_getaddr(pms->SStream)) == NULL) {
        return (NULL);
    }
Does this look like a potential problem?
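To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern. It is not the TORQUE source: fake_addr, fake_server, fake_getaddr, and find_by_ip are hypothetical stand-ins, and it assumes the crash comes from dereferencing the pointer returned for a stream that is no longer valid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the address record associated with a stream. */
struct fake_addr {
    uint32_t ip;
};

/* Hypothetical stand-in for one configured server entry. */
struct fake_server {
    int stream; /* index of the stream to that server, or -1 if closed */
};

#define NUM_STREAMS 2

static struct fake_addr stream_table[NUM_STREAMS] = {
    { 0x0A000001u }, /* 10.0.0.1 */
    { 0x0A000002u }  /* 10.0.0.2 */
};

/* Stand-in for rpp_getaddr. Assumption: it returns NULL when the
 * stream index does not refer to an open stream. */
static struct fake_addr *fake_getaddr(int stream)
{
    if (stream < 0 || stream >= NUM_STREAMS)
        return NULL;
    return &stream_table[stream];
}

/* Look up a server entry by IP address. The NULL check is the point
 * of the sketch: without it, addr->ip dereferences a NULL pointer
 * whenever a server entry has no valid stream. */
static struct fake_server *find_by_ip(struct fake_server *servers,
                                      size_t count, uint32_t ip)
{
    for (size_t i = 0; i < count; i++) {
        struct fake_addr *addr = fake_getaddr(servers[i].stream);

        if (addr == NULL)
            continue; /* no address for this stream, try the next entry */

        if (addr->ip == ip)
            return &servers[i];
    }
    return NULL; /* no entry matched */
}
```

Note that this sketch skips an entry with no address and keeps searching, whereas the check above returns NULL immediately. Which behavior is right for pbs_mom depends on how the caller treats a NULL result.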
Thanks,
...
ling