[torqueusers] Re: Jobs hanging in R state with torque 2.3.0 (workaround)

Ari Pollak aripollak at gmail.com
Wed Apr 16 14:39:27 MDT 2008


As a workaround, I've changed line 547 in src/resmom/catch_child.c to this:

    if (pjob->ji_qs.ji_substate != JOB_SUBSTATE_EXITING &&
            pjob->ji_qs.ji_substate != JOB_SUBSTATE_OBIT)

With this change, it will retry sending the obit even if it thinks one
is already being sent. This seems to eliminate the problem for me, and
I'm not seeing any ill effects. I also found a comment in
post_epilogue() indicating that a proper retry was supposed to happen
but was never implemented.
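
For anyone who wants to see the effect of the condition in isolation, here is a minimal standalone sketch. It is not the TORQUE code: the enum values are placeholders (the real constants come from the TORQUE headers), the helper name obit_scan_skips_job() is made up, and I am assuming the condition guards a "skip this job" branch, which is what the description above implies.

```c
#include <stdbool.h>

/* Placeholder stand-ins for the TORQUE substate constants. The numeric
 * values here are arbitrary and only serve this illustration. */
enum job_substate {
    JOB_SUBSTATE_RUNNING = 1,
    JOB_SUBSTATE_EXITING = 2,
    JOB_SUBSTATE_OBIT    = 3
};

/* Hypothetical helper mirroring the modified condition: returns true
 * when the job would be skipped (assumed meaning of the branch), and
 * false when the obit would be sent or re-sent. */
static bool obit_scan_skips_job(enum job_substate substate)
{
    return substate != JOB_SUBSTATE_EXITING &&
           substate != JOB_SUBSTATE_OBIT;
}
```

Under those assumptions, a job in JOB_SUBSTATE_OBIT is no longer skipped, so its obit gets sent again on the next pass, while jobs in unrelated substates are still left alone.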

