[torqueusers] checkpointing and shared filesystem
Alexander Oltu
Alexander.Oltu at uni.no
Tue Feb 16 01:26:27 MST 2010
On Mon, 15 Feb 2010 17:15:26 +0000
Anna Jonna Armannsdottir wrote:
> On Mon, 2010-02-15 at 15:48 +0100, Alexander Oltu wrote:
> > Hello all,
> >
> > We have a setup where the pbs_moms on all nodes keep their checkpoint
> > directories in the same folder on a shared FS. I wonder whether torque
> > can be configured to skip the scp copy from the exec node to pbs_server
> > during qhold, and back from pbs_server to the exec host on qrls, and
> > instead just reuse the checkpoint file that is already on the shared
> > filesystem. Does such an option already exist in torque?
> >
> > Thanks,
> > Alex.
>
> Thanks for bringing this subject up. I have also been looking for
> this solution. Now how do the users use this feature?
> Do they have to set some parameters in their submit script or on
> the command line to qsub, or is it completely automatic?
>
It works the same as without a shared FS. When submitting a job, users
have to specify -c enabled on the qsub command line, or put this in the
submit script:
#PBS -c enabled
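For illustration, a minimal submit script might look like the sketch below. Only the #PBS -c enabled line comes from the advice above; the job name ckpt_demo and the run_steps loop are hypothetical placeholders standing in for real work. It assumes a BLCR-enabled TORQUE setup of the kind the acceptance test below exercises.

```shell
#!/bin/sh
#PBS -N ckpt_demo
#PBS -c enabled

# Hypothetical placeholder work: print a few numbered steps with a short
# pause between them. Outside PBS the #PBS lines are plain comments, so
# the script also runs as an ordinary shell script.
run_steps() {
    i=1
    while [ "$i" -le 3 ]; do
        echo "step $i"
        sleep 1
        i=$((i + 1))
    done
}

run_steps
```

Submit it with qsub ckpt_demo.sh (or drop the directive and use qsub -c enabled ckpt_demo.sh). A qhold on the running job should then checkpoint it, and a later qrls releases it so it can be restarted from that checkpoint.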
When you have finished setting up checkpointing, please have a look at
this acceptance test:
http://www.clusterresources.com/products/torque/docs/blcr/acceptance-2.4.shtml
Regards,
Alex.