[torquedev] pbs_mom segfault in TMomCheckJobChild

Chris Samuel csamuel at vpac.org
Sun Dec 21 22:20:29 MST 2008


----- "Glen Beane" <glen.beane at gmail.com> wrote:

> Given what I describe above, I do not think this patch should be
> included at this time.  There must be something else going on here.
> Does anyone else have any thoughts?

We're not seeing these crashes here, but we do still see
our own oddity (which no one else has reported): pbs_mom
occasionally logs that it cannot open a job's spool files
on the local disk.

For example:

pbs_mom: No such file or directory (2) in open_std_file, cannot open/create stdout/stderr file '/usr/spool/PBS/spool/674029.tango-m.vpac.org.OU' (mode: 2001, keeping: FALSE)
pbs_mom: No such file or directory (2) in open_std_file, cannot open/create stdout/stderr file '/usr/spool/PBS/spool/674029.tango-m.vpac.org.ER' (mode: 2001, keeping: FALSE)

This is patently wrong, as /usr/spool/PBS is always there.
The job's output then simply disappears.

Aside from that (which has been there since 2.3 started),
2.3.5 seems very stable for us.

> I think Garrick and I are on the same page here,  can
> anyone show me where I'm wrong with the code quoted above?

Nope, it makes logical sense to me: it's only called when
it's Idle, and the close only happens if it's not. :-(

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

