[torqueusers] Re: Jobs hanging in R state with torque 2.3.0 (workaround)
Ari Pollak
aripollak at gmail.com
Wed Apr 16 14:39:27 MDT 2008
As a workaround, I've changed line 547 in src/resmom/catch_child.c to this:

    if (pjob->ji_qs.ji_substate != JOB_SUBSTATE_EXITING &&
        pjob->ji_qs.ji_substate != JOB_SUBSTATE_OBIT)

With this change it tries sending the obit again, even if it thinks one is
already being sent. This seems to eliminate the problem for me, and I'm not
seeing any ill effects so far. I also found a comment in post_epilogue()
indicating that a proper retry was supposed to happen but was never
implemented.
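
To make the effect of the changed condition concrete, here is a small standalone sketch; it is not the TORQUE source. It assumes the original test checked only for JOB_SUBSTATE_EXITING and that a true result means the job is skipped. Neither is stated above, so treat both as guesses. The struct definitions, the numeric substate values, and the helper names skipped_before(), skipped_after() and check_workaround() are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values for illustration only; the real constants live in
 * TORQUE's headers and are not reproduced here. */
enum {
  JOB_SUBSTATE_RUNNING = 1,
  JOB_SUBSTATE_EXITING = 2,
  JOB_SUBSTATE_OBIT    = 3
};

/* Cut-down stand-ins for the job structure: only the one field that the
 * condition reads is modelled. */
struct jobfix { int ji_substate; };
struct job    { struct jobfix ji_qs; };

/* ASSUMPTION: the original test looked only at JOB_SUBSTATE_EXITING, and
 * a true result means "skip this job". */
static bool skipped_before(const struct job *pjob)
{
  return pjob->ji_qs.ji_substate != JOB_SUBSTATE_EXITING;
}

/* The workaround condition: a job in the OBIT substate is no longer
 * skipped, so its obit can be sent again. */
static bool skipped_after(const struct job *pjob)
{
  return pjob->ji_qs.ji_substate != JOB_SUBSTATE_EXITING &&
         pjob->ji_qs.ji_substate != JOB_SUBSTATE_OBIT;
}

/* Self-check covering the three substates modelled above. */
static void check_workaround(void)
{
  struct job running = { { JOB_SUBSTATE_RUNNING } };
  struct job exiting = { { JOB_SUBSTATE_EXITING } };
  struct job obit    = { { JOB_SUBSTATE_OBIT } };

  /* Running jobs are skipped either way. */
  assert(skipped_before(&running) && skipped_after(&running));

  /* Exiting jobs are handled either way. */
  assert(!skipped_before(&exiting) && !skipped_after(&exiting));

  /* Only the OBIT case changes: skipped before, retried after. */
  assert(skipped_before(&obit) && !skipped_after(&obit));
}
```

Under those assumptions, the only behavioural difference is for jobs already in the OBIT substate, which matches the jobs I was seeing stuck in R state.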