[torqueusers] checkpointing and shared filesystem

Alexander Oltu Alexander.Oltu at uni.no
Tue Feb 16 01:26:27 MST 2010


On Mon, 15 Feb 2010 17:15:26 +0000
Anna Jonna Armannsdottir wrote:

> On Mon, 2010-02-15 at 15:48 +0100, Alexander Oltu wrote: 
> > Hello all,
> > 
> > We have a setup where the pbs_moms on all nodes keep their checkpoint
> > directories in the same folder on a shared FS. I wonder if torque can
> > be configured to avoid the scp copy from the exec node to pbs_server
> > during qhold, and back from pbs_server to the exec host on qrls, and
> > instead just reuse the checkpoint file that is already on the shared
> > filesystem. Does such an option already exist in torque?
> > 
> > Thanks,
> > Alex.
> 
> Thanks for bringing this subject up. I have also been looking for
> this solution. So how do users use this feature?
> Do they have to set some parameters in their submit script or on
> the qsub command line, or is it completely automatic?
> 

Same as without a shared FS. When submitting a job, they have to
specify -c enabled on the qsub command line, or put this in the submit script:
#PBS -c enabled
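For illustration, here is a minimal sketch of a submit script with the directive in place. The job name ckpt_example, the run_steps function and its loop are hypothetical placeholders standing in for a real workload. Submitting it with "qsub ckpt_example.sh", or overriding on the command line with "qsub -c enabled ckpt_example.sh", should request checkpointing either way.

```shell
#!/bin/bash
# Hypothetical example job script: the job name and the workload are placeholders.
#PBS -N ckpt_example
# Request checkpointing for this job (same effect as "qsub -c enabled").
#PBS -c enabled

# Outside of PBS the "#PBS" lines are plain comments, so this also runs under bash.
# run_steps stands in for the real long-running program.
run_steps() {
  for i in 1 2 3; do
    echo "step $i"
  done
}

run_steps
```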

When you have finished setting up checkpointing, please have a look at
this acceptance test:
http://www.clusterresources.com/products/torque/docs/blcr/acceptance-2.4.shtml

Regards,
Alex.

