[torqueusers] Can i control if the jobs dies or not??
Leandro
leotavaneiro at gmail.com
Thu Aug 11 05:22:34 MDT 2005
Thank you for the information. I will test it and report back with any news.
Is this patch for the latest snapshot?
Regards,
--
Leandro Tavares Carneiro
Analista de Suporte Linux/Unix
2005/8/10, Garrick Staples <garrick at usc.edu>:
>
> On Wed, Aug 10, 2005 at 06:13:32PM -0700, Garrick Staples alleged:
> > On Wed, Aug 10, 2005 at 08:23:24AM -0300, Leandro alleged:
> > > behavior of PBS/Torque is to kill the job when a node dies. Can I change
> > > this behavior? If there's no way to do that with some kind of
> > > configuration, can someone point me to where in the code I can work on
> > > this?
> >
> > At this point in time, the MOM on the execution node (MS) will always
> > kill the job if a sister MOM isn't replying.
> >
> > MS sends IM_POLL_JOB messages to sisters. When a sister isn't replying,
> > MS closes the connection with mom_comm.c:im_eof(), which calls
> > mom_comm.c:node_bailout(). With outstanding IM_POLL_JOB messages,
> > node_bailout() sets "pjob->ji_nodekill = np->hn_node;" and
> > mom_main.c:job_over_limit() kills the job if "pjob->ji_nodekill !=
> > TM_ERROR_NODE".
>
> I haven't tried this yet, but this should do the trick:
>
> --- src/resmom/mom_comm.c_orig 2005-07-26 23:24:55.000000000 -0700
> +++ src/resmom/mom_comm.c 2005-08-10 19:25:45.000000000 -0700
> @@ -1101,8 +1101,6 @@ void node_bailout(
>
> log_err(-1,id,log_buffer);
>
> - pjob->ji_nodekill = np->hn_node;
> -
> break;
>
> case IM_GET_TID:
>
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California
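
The control flow described in the quoted message can be pictured with a small
stand-alone model. This is only a sketch under stated assumptions, not Torque
code: the toy_* types and functions and simulate_sister_failure() are
hypothetical, the value given to TM_ERROR_NODE here is assumed, and only the
field names ji_nodekill and hn_node are taken from the thread. The "patched"
flag stands in for applying the diff above, which removes the assignment in
the IM_POLL_JOB branch of node_bailout().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the real Torque definitions. */
#define TM_ERROR_NODE (-1)      /* assumed sentinel: "no node flagged" */

typedef struct {
    int hn_node;                /* node id of a sister MOM */
} toy_hnodent;

typedef struct {
    int ji_nodekill;            /* node that triggered a kill, or TM_ERROR_NODE */
} toy_job;

/* Models the IM_POLL_JOB branch of node_bailout() as described in the thread.
 * With patched != 0 the assignment is skipped, as in the proposed diff. */
static void toy_node_bailout(toy_job *pjob, const toy_hnodent *np, int patched)
{
    assert(pjob != NULL && np != NULL);
    if (!patched) {
        pjob->ji_nodekill = np->hn_node;
    }
}

/* Models the check attributed to job_over_limit(): nonzero means "kill". */
static int toy_job_over_limit(const toy_job *pjob)
{
    assert(pjob != NULL);
    return pjob->ji_nodekill != TM_ERROR_NODE;
}

/* Runs one unanswered-poll scenario and reports whether the job is killed. */
int simulate_sister_failure(int patched)
{
    toy_job job = { TM_ERROR_NODE };    /* job starts with no node flagged */
    toy_hnodent sister = { 3 };         /* arbitrary id for the silent sister */

    toy_node_bailout(&job, &sister, patched);
    return toy_job_over_limit(&job);
}
```

In this model, simulate_sister_failure(0) returns nonzero (the job would be
killed) and simulate_sister_failure(1) returns 0 (the job survives). It says
nothing about other places in the real source that may set ji_nodekill, and
the patch itself was posted as untested.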