[torqueusers] Enable checkpoint/restart on a per-queue basis?

Troy Baer tbaer at utk.edu
Mon Jun 8 10:55:20 MDT 2009

Hello all,

I've been experimenting with BLCR-based checkpoint/restart on a couple
of different systems, and I was surprised to find no way to set things
up in the queue attributes so that jobs in a particular queue are
checkpointable by default.  I had expected there to be a qmgr keyword
to do this, like:

set queue foo checkpoints_enabled = True

Alas, I can find nothing of the sort.

Now admittedly I *could* do this in the submit filter by prepending a
"#PBS -c enable" line to the headers of any jobs that fit certain
parameters, but it does seem a bit silly that I can set the checkpoint
directory on a per-queue basis but not whether jobs in a particular
queue default to being checkpointable.
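
For what it's worth, here is a rough sketch of what that submit filter
workaround might look like.  It assumes the usual filter contract (job
script on stdin, possibly modified script on stdout) and that qsub's
command-line arguments are passed through to the filter.  The queue name
"foo" and all the function names are made up for illustration, and it
only recognizes the space-separated "-q foo" form:

```python
#!/usr/bin/env python
"""Sketch of a submit filter that makes jobs in some queues checkpointable."""
import sys

# Hypothetical: queues whose jobs should be checkpointable by default.
CHECKPOINT_QUEUES = {"foo"}
# Directive text taken from the message above.
DIRECTIVE = "#PBS -c enable"


def pbs_option_value(lines, flag):
    """Return the value of the first '#PBS <flag> <value>' line, else None."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "#PBS" and parts[1] == flag:
            return parts[2]
    return None


def requested_queue(lines, argv):
    """Find the destination queue; command-line -q wins over a directive."""
    for i, arg in enumerate(argv):
        if arg == "-q" and i + 1 < len(argv):
            return argv[i + 1].split("@")[0]  # drop any @server suffix
    value = pbs_option_value(lines, "-q")
    return value.split("@")[0] if value else None


def filter_script(lines, argv):
    """Return the script lines, with DIRECTIVE added when appropriate."""
    out = list(lines)
    if requested_queue(out, argv) not in CHECKPOINT_QUEUES:
        return out
    # Respect an explicit user choice on the command line or in the script.
    if "-c" in argv or pbs_option_value(out, "-c") is not None:
        return out
    # Keep the interpreter line first; directives must precede commands.
    pos = 1 if out and out[0].startswith("#!") else 0
    if pos == 1 and not out[0].endswith("\n"):
        out[0] += "\n"
    out.insert(pos, DIRECTIVE + "\n")
    return out


if __name__ == "__main__":
    sys.stdout.writelines(filter_script(sys.stdin.readlines(), sys.argv[1:]))
```

Jobs that never name the queue (for example, ones that land in it via
the default queue or routing) would slip past a filter like this, which
is part of why a real queue attribute would be nicer.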

Am I alone in this regard, or would others find this useful?

Troy Baer, HPC System Administrator
National Institute for Computational Sciences, University of Tennessee
Phone:  865-241-4233
