[torqueusers] Enable checkpoint/restart on a per-queue basis?
tbaer at utk.edu
Mon Jun 8 10:55:20 MDT 2009
I've been experimenting with BLCR-based checkpoint/restart on a couple of
different systems, and I was somewhat surprised that I could find no way
to set the queue attributes so that jobs in a particular queue are
checkpointable by default. I had expected a qmgr keyword for this,
something like:
set queue foo checkpoints_enabled = True
Alas, I can find nothing of the sort.
Now admittedly I *could* do this in the submit filter by prepending a
"#PBS -c enable" line to the headers of any jobs that fit certain
parameters, but it does seem a bit silly that I can set the checkpoint
directory on a per-queue basis but not whether jobs in a particular
queue default to being checkpointable.
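For illustration, here is a minimal sketch of that submit-filter workaround. It assumes the usual TORQUE submit-filter contract: the filter gets the job script on stdin and qsub's command-line arguments as its own arguments, and writes the (possibly modified) script to stdout. The queue name "foo" and the "#PBS -c enable" directive are copied from the post; check the exact -c keyword against your TORQUE version's qsub documentation. The helper names are hypothetical. The filter leaves the script alone unless the job targets that queue and has not already asked for a checkpoint setting.

```python
"""Hypothetical TORQUE submit filter: make jobs in one queue checkpointable.

Assumes the job script arrives on stdin, qsub's arguments arrive in
sys.argv[1:], and the filtered script must be written to stdout.
"""
import sys

TARGET_QUEUE = "foo"              # hypothetical queue name from the post
DIRECTIVE = "#PBS -c enable\n"    # directive text from the post; verify for your version


def directives(lines):
    """Yield the options of each #PBS line up to the first executable line."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            break  # qsub stops reading directives at the first command
        parts = stripped.split()
        if parts and parts[0] == "#PBS":
            yield parts[1:]


def requested_queue(args, lines):
    """Return the queue from -q on the command line, else from a #PBS -q line."""
    # Command-line options override script directives, so check them first.
    for i, arg in enumerate(args):
        if arg == "-q" and i + 1 < len(args):
            return args[i + 1].split("@")[0]  # drop any "@server" suffix
        if arg.startswith("-q") and len(arg) > 2:
            return arg[2:].split("@")[0]
    for opts in directives(lines):
        if len(opts) >= 2 and opts[0] == "-q":
            return opts[1].split("@")[0]
    return None


def checkpoint_requested(args, lines):
    """True if the user already set -c on the command line or in the script."""
    if any(arg.startswith("-c") for arg in args):
        return True
    return any(opts and opts[0].startswith("-c") for opts in directives(lines))


def filter_script(script, args):
    """Insert the checkpoint directive for jobs bound for TARGET_QUEUE."""
    lines = script.splitlines(keepends=True)
    if requested_queue(args, lines) != TARGET_QUEUE or checkpoint_requested(args, lines):
        return script  # not our queue, or the user chose explicitly
    # Keep any "#!" interpreter line first; put the directive right after it.
    pos = 1 if lines and lines[0].startswith("#!") else 0
    if pos and not lines[0].endswith("\n"):
        lines[0] += "\n"
    return "".join(lines[:pos] + [DIRECTIVE] + lines[pos:])


def main():
    """Entry point for use as a submit filter: stdin -> stdout."""
    sys.stdout.write(filter_script(sys.stdin.read(), sys.argv[1:]))
```

Installed as an actual filter, the file would end with the usual `if __name__ == "__main__": main()` line. Honoring an explicit -c from the user, instead of always forcing the directive, keeps the queue setting a default rather than a mandate, which is the behavior the hypothetical qmgr keyword above would presumably provide.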
Am I alone in this regard, or would others find this useful?
Troy Baer, HPC System Administrator
National Institute for Computational Sciences, University of Tennessee