[torqueusers] Issue copying OU file
Kit Menlove
kit at byu.net
Tue Jan 29 11:07:58 MST 2013
Hi all,
I'm using a cluster that uses Torque as the batch system. About half of the
time, checkpointing with DMTCP fails while copying the temporary output
buffer/file:
cp -f /var/spool/torque/spool/jobid.myserver.OU \
    /checkpoint_dir/ckpt_myprog_52b886013bb1c112-27763-51060104_files/jobid.myserver.OU_99001
I'm using dmtcp_checkpoint (v1.2.6) with the --checkpoint-open-files option.
All I know is that the copy command fails, not why. The destination directory
does exist, and the same copy succeeds about half the time. Can anyone explain
why the .OU file might not exist at checkpoint time, or what else could cause
the failure?
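One way to narrow this down might be to repeat the copy outside DMTCP and record
the actual error rather than just a failed exit status. Below is a minimal Python
sketch of that idea. The function name diagnose_copy is hypothetical (not part of
DMTCP or Torque), and shutil.copyfile only approximates `cp -f` (it does not
remove and retry an unwritable destination). Run against the paths above at
checkpoint time, it would distinguish a missing spool file from, say, a
permissions problem.

```python
import errno
import os
import shutil


def diagnose_copy(src: str, dst: str) -> str:
    """Roughly repeat `cp -f src dst` and report why it failed.

    Hypothetical helper for illustration. Returns "ok" on success,
    otherwise a short reason string.
    """
    if not os.path.exists(src):
        # The spool file is already gone (or was never created) at copy time.
        return "source missing"
    dst_dir = os.path.dirname(dst) or "."
    if not os.path.isdir(dst_dir):
        return "destination directory missing"
    try:
        shutil.copyfile(src, dst)
    except FileNotFoundError:
        # A path existed at the checks above but vanished before the copy (race).
        return "path vanished during copy"
    except PermissionError:
        return "permission denied"
    except OSError as exc:
        # Any other failure: report the symbolic errno name when one is known.
        return f"os error: {errno.errorcode.get(exc.errno, exc.errno)}"
    return "ok"
```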
Thanks,
Kit