[torquedev] [Bug 96] handle failed stagein jobs properly
bugzilla-daemon at supercluster.org
Fri Nov 5 08:25:56 MDT 2010
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=96
Simon Toth <SimonT at mail.muni.cz> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |SimonT at mail.muni.cz
--- Comment #1 from Simon Toth <SimonT at mail.muni.cz> 2010-11-05 08:25:56 MDT ---
This is the code handling this state. From looking at it, it seems pretty
reasonable. The job is held and the owner is mailed, so they can either delete
or unhold the job (although I don't know how the latter can be done). Also,
jobs seem to stay in this state for only 1800 seconds (30 minutes).
  if (code != 0)
    {
    /* stage in failed - hold job */
    free_nodes(pjob);

    pwait = &pjob->ji_wattr[(int)JOB_ATR_exectime];

    if ((pwait->at_flags & ATR_VFLAG_SET) == 0)
      {
      pwait->at_val.at_long = time_now + PBS_STAGEFAIL_WAIT;
      pwait->at_flags |= ATR_VFLAG_SET;
      job_set_wait(pwait, pjob, 0);
      }

    svr_setjobstate(pjob, JOB_STATE_WAITING, JOB_SUBSTATE_STAGEFAIL);

    if (preq->rq_reply.brp_choice == BATCH_REPLY_CHOICE_Text)
      {
      /* set job comment */
      /* NYI */
      svr_mailowner(
        pjob,
        MAIL_STAGEIN,
        MAIL_FORCE,
        preq->rq_reply.brp_un.brp_txt.brp_str);
      }
    }
  else
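
For illustration only, here is a rough standalone sketch of the wait-time part
of the fragment above. It is not TORQUE code: struct wait_attr,
set_stagefail_wait() and STAGEFAIL_WAIT_SECONDS are made-up stand-ins, and the
value 1800 is assumed from the 30 minutes mentioned above. It only shows that
the retry time is set when no execution time was set before, and that an
existing value is left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, based on the "1800 seconds" mentioned above.
 * The real constant (PBS_STAGEFAIL_WAIT) is defined in the TORQUE headers. */
#define STAGEFAIL_WAIT_SECONDS 1800L

/* Hypothetical stand-in for the job's exectime attribute. */
struct wait_attr
  {
  int  is_set;  /* plays the role of ATR_VFLAG_SET in at_flags */
  long value;   /* plays the role of at_val.at_long */
  };

/* If no execution time is set, schedule one STAGEFAIL_WAIT_SECONDS from "now".
 * Returns 1 if the time was set here, 0 if an existing value was kept. */
int set_stagefail_wait(struct wait_attr *attr, long now)
  {
  assert(attr != NULL);

  if (attr->is_set)
    return 0;  /* the user already asked for a specific execution time */

  attr->value  = now + STAGEFAIL_WAIT_SECONDS;
  attr->is_set = 1;

  return 1;
  }
```

With an unset attribute and now = 1000, the call stores 2800 and returns 1; a
second call with any other time returns 0 and keeps 2800.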
--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.