[torqueusers] How to prevent torque from restarting jobs?

Garrick Staples garrick at clusterresources.com
Thu Jun 29 13:55:17 MDT 2006


On Thu, Jun 29, 2006 at 09:46:44PM +0200, Seb Seb alleged:
> Thank you Garrick. However, I was wondering if it was possible to always force this, even when the qsub's -r n argument is not specified.

It is currently not configurable.  You could change the default by editing
set_opt_defaults() in src/cmds/qsub.c and rebuilding qsub.  Change:
  set_attr(&attrib,ATTR_r,"TRUE");
to
  set_attr(&attrib,ATTR_r,"FALSE");
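
If patching the source is not practical, a small shell wrapper might get a
similar effect.  This is only a sketch: qsub_norerun and the QSUB variable
are names made up for illustration, not part of Torque, and it assumes that
an explicit -r given later on the command line overrides the earlier one,
which should be checked on your installation.  Because -r n is passed on the
command line, it would presumably also take precedence over a "#PBS -r y"
directive inside the job script.

```shell
# Hypothetical wrapper: submit jobs as non-rerunable unless told otherwise.
# QSUB may point at a different qsub binary (an assumed variable name, not a
# Torque setting); it defaults to whatever "qsub" is found on the PATH.
qsub_norerun() {
    # "-r n" goes first so that an explicit -r supplied by the caller comes
    # later on the command line (assumed to take precedence).
    "${QSUB:-qsub}" -r n "$@"
}

# Example use:
#   qsub_norerun myjob.sh
#   qsub_norerun -r y myjob.sh
```

After submitting, "qstat -f" on the job should show its Rerunable attribute
as False.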

    
> From what I can read in the PBS manual, if pbs_mom is run with the -r option, MOM will kill any processes belonging to jobs, mark the jobs as terminated, and notify the batch server which owns the job. However, according to the Torque manual, "pbs_mom -r" just performs a level 1 job recovery on restart. What does "level 1 job recovery" mean?

Don't confuse qsub's -r with pbs_mom's -r; they are unrelated.

I'm not sure about the term "level 1 job recovery", but the pbs_mom
manpage has the correct description.  'pbs_mom -r' will kill processes
and inform the server the job has exited.  And no, I don't know why this
is useful :)



> Garrick Staples <garrick at clusterresources.com> wrote:
>   On Thu, Jun 29, 2006 at 08:55:30PM +0200, Seb Seb alleged:
> > Hi,
> > 
> > Is there a way to prevent torque from automatically restarting jobs after a computer crash?
> 
> qsub's -r argument, from the manpage:
> 
> -r y|n Declares whether the job is rerunable. See the qrerun command.
> The option argument is a single character, either y or n.
> 
> If the argument is "y", the job is rerunable. If the argument
> is "n", the job is not rerunable. The default value is 'y',
> rerunable.


