[torqueusers] Preventing jobs from being re-run when
Troy Baer
troy at osc.edu
Mon Mar 13 13:24:21 MST 2006
On Mon, 2006-03-13 at 14:16 -0600, David McGiven wrote:
> I was running a job in one of my cluster nodes. Due to an electrical
> problem the node was suddenly and unexpectedly rebooted.
>
> While it was rebooting, the job was shown in state "E" by the qstat
> command. About a minute later, when the node was back in normal
> operation, the job was in state "R" again: the system had automatically
> restarted it.
>
> How can I prevent this from happening?
>
> This is very dangerous, because not all jobs are meant to be restarted
> "automatically", and a rerun might overwrite data that has already been processed.
In TORQUE and other PBS variants, jobs are rerunnable by default.
A job that must not be rerun has to declare this explicitly with the
-r option to qsub, either on the command line (qsub -r n) or as a
directive in the job script:

    #PBS -r n

See the qsub man page for more information.
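A minimal sketch of a complete TORQUE job script using this directive might look like the following. The job name, walltime, output file name (results.dat) and the overwrite guard are hypothetical illustrations, not part of the advice above. PBS_O_WORKDIR and PBS_JOBID are environment variables TORQUE sets for batch jobs; the script falls back to defaults so it can also be run by hand outside the batch system.

```shell
#!/bin/bash
# Sketch of a TORQUE/PBS job script. The job name, walltime and the
# output file name are hypothetical placeholders.
#PBS -N my_analysis
#PBS -l walltime=01:00:00
# Declare the job NOT rerunnable, so the server will not restart it
# automatically after, e.g., a node crash.
#PBS -r n

# TORQUE sets PBS_O_WORKDIR to the directory qsub was called from;
# fall back to the current directory when run outside the batch system.
cd "${PBS_O_WORKDIR:-.}" || exit 1

OUT=results.dat   # hypothetical output file

# Optional extra safeguard (an assumption, not TORQUE behavior):
# refuse to start if the output already exists, instead of overwriting it.
if [ -e "$OUT" ]; then
    echo "refusing to overwrite existing $OUT" >&2
    exit 1
fi

# The real computation would go here; this placeholder only records
# the start of the run in the output file.
echo "job ${PBS_JOBID:-interactive} started $(date)" > "$OUT"
```

Submitted with qsub job.sh (job.sh being a hypothetical file name), the #PBS -r n line marks the job as not rerunnable; qsub -r n job.sh sets the same attribute from the command line. The file-existence guard is only a belt-and-braces measure for the overwrite concern and does not replace the -r option.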
--Troy
--
Troy Baer troy at osc.edu
Science & Technology Support http://www.osc.edu/hpc/
Ohio Supercomputer Center 614-292-9701