[torqueusers] Issue copying OU file
kit at byu.net
Tue Jan 29 11:07:58 MST 2013
I'm using a cluster that runs Torque as its batch system. About half of the
time, checkpointing with DMTCP fails while copying the temporary output file:
cp -f /var/spool/torque/spool/jobid.myserver.OU
I'm using dmtcp_checkpoint (v1.2.6) with the --checkpoint-open-files option.
All I know is that the copy command fails, not why (though I know the
destination directory exists and it does work about half the time). Can
anyone explain why the OU file might not exist at the time of checkpointing,
or what else might be the cause of the failure?
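
To narrow this down, a diagnostic along the following lines could stand in
for the bare cp. This is only a sketch under assumptions: the function name
diagnose_copy, its exit codes, and its messages are hypothetical and not part
of Torque or DMTCP, and the real source and destination paths would have to
be substituted in.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or DMTCP): report why a copy
# fails instead of only reporting that it failed.
# Usage: diagnose_copy SRC DEST
diagnose_copy() {
    src=$1
    dest=$2

    # Check whether the source exists at the moment of the copy.
    if [ ! -e "$src" ]; then
        echo "diagnose_copy: source missing: $src" >&2
        return 2
    fi

    # Work out which directory the copy will write into.
    if [ -d "$dest" ]; then
        dest_dir=$dest
    else
        dest_dir=$(dirname "$dest")
    fi

    # Check that this directory exists and is writable.
    if [ ! -d "$dest_dir" ] || [ ! -w "$dest_dir" ]; then
        echo "diagnose_copy: destination directory unusable: $dest_dir" >&2
        return 3
    fi

    # Run the copy, keeping cp's own error message and exit status.
    if err=$(cp -f -- "$src" "$dest" 2>&1); then
        return 0
    else
        status=$?
        echo "diagnose_copy: cp exited $status: $err" >&2
        return 4
    fi
}
```

A return value of 2 would point at the OU file not existing when the copy
runs, 3 at the destination directory, and 4 at cp itself, with its error
text written to stderr.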