marvin.novaglobal at gmail.com
Sun Aug 22 20:19:58 MDT 2010
I second this feature
On Fri, Jun 25, 2010 at 10:58 PM, Danny Sternkopf
<dsternkopf at hpce.nec.com> wrote:
> Any news about this? I have the following setup:
> o Torque 2.4.8
> o Open MPI 1.4.2
> o BLCR 0.8.2
> The checkpoint/restart scripts from Torque's contrib/blcr work for
> single-node applications without MPI. I created new scripts for Open MPI
> applications. The checkpoint works, but the release does not. The issue
> might be that ompi-checkpoint writes a directory containing checkpoint
> files for each process plus metadata, whereas Torque expects a single
> checkpoint file. Does anyone have experience with this?
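One way to bridge the directory-versus-single-file mismatch is to bundle the ompi-checkpoint snapshot directory into one archive at checkpoint time and unpack it again before restart. The Python sketch below assumes that Torque only needs one file at the path it gives the checkpoint script; the function names `pack_snapshot` and `unpack_snapshot` are hypothetical, and this has not been tried against Torque 2.4.8 itself.

```python
import os
import tarfile


def pack_snapshot(snapshot_dir: str, archive_path: str) -> str:
    """Bundle an ompi-checkpoint snapshot directory into a single file.

    The directory is stored under its own base name so that it can be
    recreated with the same name on restart. (Hypothetical helper.)
    """
    snapshot_dir = os.path.abspath(snapshot_dir)  # also drops a trailing "/"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(snapshot_dir, arcname=os.path.basename(snapshot_dir))
    return archive_path


def unpack_snapshot(archive_path: str, dest_dir: str) -> str:
    """Recreate the snapshot directory from the single archive file.

    Returns the path of the restored snapshot directory. (Hypothetical helper.)
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if not names:
            raise ValueError(f"{archive_path} is empty")
        top = names[0].split("/")[0]  # the first member is the snapshot dir
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, "..", device files, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return os.path.join(dest_dir, top)
```

On restart, the directory returned by `unpack_snapshot` is what would then be handed to ompi-restart; check the Open MPI documentation for where your version expects global snapshots to live.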
> By the way, another issue is that the checkpoint/restart scripts run as
> root. ompi-checkpoint does not allow root to checkpoint user jobs, so
> ompi-checkpoint has to be run as the user. The restart script of course
> needs to do the same, so that the processes are restarted under the
> corresponding user ID.
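For the root-versus-user problem, a root-run script can drop privileges just for the ompi-checkpoint (or ompi-restart) call. This sketch uses the `user`, `group` and `extra_groups` arguments of Python's `subprocess.run` (Python 3.9+). The helper names are hypothetical, and it assumes the Open MPI tools are on the inherited PATH and that the script already knows the job owner's user name and the PID of mpirun.

```python
import os
import pwd
import subprocess
from typing import Dict, List


def owner_env(username: str) -> Dict[str, str]:
    """Minimal environment for the job owner.

    HOME matters: as far as I know, ompi-checkpoint by default writes the
    global snapshot under the user's home directory.
    """
    pw = pwd.getpwnam(username)
    return {
        "HOME": pw.pw_dir,
        "USER": pw.pw_name,
        "LOGNAME": pw.pw_name,
        # Assumption: Open MPI binaries are reachable via the caller's PATH.
        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
    }


def run_as_owner(argv: List[str], username: str) -> int:
    """Run argv with the owner's uid, gid and supplementary groups.

    Must be called as root (setgroups/setreuid need privileges).
    """
    pw = pwd.getpwnam(username)
    result = subprocess.run(
        argv,
        user=pw.pw_uid,
        group=pw.pw_gid,
        extra_groups=os.getgrouplist(pw.pw_name, pw.pw_gid),
        env=owner_env(username),
        cwd=pw.pw_dir,
    )
    return result.returncode


def ompi_checkpoint_argv(mpirun_pid: int) -> List[str]:
    """ompi-checkpoint is pointed at the PID of mpirun."""
    return ["ompi-checkpoint", str(mpirun_pid)]


def ompi_restart_argv(snapshot_ref: str) -> List[str]:
    """ompi-restart takes the global snapshot reference."""
    return ["ompi-restart", snapshot_ref]
```

A checkpoint script running as root would then call something like `run_as_owner(ompi_checkpoint_argv(pid), owner)`, and the restart script the same with `ompi_restart_argv`.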
> Furthermore, any comments on handling both MPI and single-process
> applications with the same checkpoint/restart scripts?
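A single script could serve both kinds of job by looking at what is running in the job's session: if an Open MPI launcher is present, use ompi-checkpoint on it; otherwise fall back to BLCR's cr_checkpoint, as the contrib/blcr scripts do. The sketch below assumes Linux /proc, one mpirun per job, and that the script is given the job's session ID. The helper names are hypothetical, and the cr_checkpoint flags used here should be checked against `cr_checkpoint --help` for your BLCR version.

```python
import os
from typing import Dict, List

# Launcher names used by Open MPI (mpirun and mpiexec are aliases of orterun).
MPI_LAUNCHERS = {"mpirun", "mpiexec", "orterun"}


def session_processes(sid: int) -> Dict[int, str]:
    """Map pid -> command name for every process in session `sid` (Linux /proc)."""
    procs: Dict[int, str] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:  # the process exited while we were scanning
            continue
        # The command name is wrapped in parentheses and may contain spaces,
        # so split after the last ')': the remaining fields start with
        # state, ppid, pgrp, session.
        name = stat[stat.index("(") + 1 : stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2 :].split()
        if int(fields[3]) == sid:
            procs[int(entry)] = name
    return procs


def checkpoint_argv(procs: Dict[int, str], session_pid: int, ckpt_file: str) -> List[str]:
    """Pick ompi-checkpoint for MPI jobs, BLCR's cr_checkpoint otherwise."""
    mpirun = [pid for pid, name in procs.items() if name in MPI_LAUNCHERS]
    if mpirun:
        # Assumes one launcher per job; min() is only a deterministic pick.
        # Note: this writes a snapshot directory, not ckpt_file.
        return ["ompi-checkpoint", str(min(mpirun))]
    # Non-MPI job: checkpoint the process tree rooted at the session leader
    # into the single file Torque expects (flags assumed, see lead-in).
    return ["cr_checkpoint", "--tree", "--file", ckpt_file, str(session_pid)]
```

The MPI branch still has to run as the job owner, and its output is a snapshot directory rather than one file, so both problems described earlier in this message still apply to it.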
> On 3/13/2010 8:39 AM, Chris Samuel wrote:
> > On Tue, 23 Feb 2010 09:15:27 pm Anton Starikov wrote:
> >> Can anyone provide an example of a checkpoint script for Torque that
> >> handles Open MPI checkpointing?
> > I too would be very interested in this; I am pondering trying BLCR on
> > our new clusters at VLSCI.
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers