[torqueusers] torque+blcr+openmpi

Marvin Novaglobal marvin.novaglobal at gmail.com
Sun Aug 22 20:19:58 MDT 2010


I second this feature


Regards,
Marvin


On Fri, Jun 25, 2010 at 10:58 PM, Danny Sternkopf
<dsternkopf at hpce.nec.com> wrote:

> Hi,
>
> any news about this? I have the following setup:
> o torque 2.4.8
> o openmpi 1.4.2
> o blcr 0.8.2
>
> The checkpoint/restart scripts from Torque's contrib/blcr work for
> single-node applications without MPI. I created new scripts for OpenMPI
> applications. The checkpoint works, but the release does not. The issue
> might be that ompi-checkpoint writes a directory containing checkpoint
> files for each process plus metadata, while Torque expects one single
> checkpoint file. Any experiences?
>
> Btw, another issue is that the checkpoint/restart scripts run as root.
> ompi-checkpoint doesn't allow root to checkpoint user jobs, so you
> have to run ompi-checkpoint as the user. The restart script of course
> needs this as well, to restart the processes under the corresponding user ID.
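Because the Torque-invoked scripts run as root, one option is to wrap the Open MPI commands with su(1) so they execute under the job owner's account. A minimal Python sketch; the helper names are hypothetical, and it assumes su is available on the node and that the owner's login environment puts ompi-checkpoint on PATH.

```python
import shlex
import subprocess
from typing import List, Sequence


def as_user(user: str, argv: Sequence[str]) -> List[str]:
    """Wrap argv in an su(1) call so a root-run script executes it as `user`.
    "su -" starts a login shell, picking up the user's own environment."""
    return ["su", "-", user, "-c", shlex.join(argv)]


def checkpoint_as_user(user: str, mpirun_pid: int) -> int:
    """Checkpoint an Open MPI job as its owner; returns the exit status."""
    argv = as_user(user, ["ompi-checkpoint", str(mpirun_pid)])
    return subprocess.run(argv, check=False).returncode
```

The same wrapper would apply to ompi-restart in the restart script; quoting each argument with shlex keeps paths containing spaces intact inside the su command string.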
>
> Furthermore, any comments on handling MPI and single-process applications
> with the same checkpoint/restart scripts?
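For sharing one script between MPI and non-MPI jobs, one heuristic (an assumption, not something confirmed in the thread) is to look for an Open MPI launcher among the job's processes: if one is found, checkpoint it with ompi-checkpoint, otherwise fall back to BLCR's cr_checkpoint on the job's top process. How the (pid, name) list is collected, for example from /proc, is left out, and choose_checkpoint_argv and MPI_LAUNCHERS are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Assumed launcher names for Open MPI jobs; adjust to how jobs are started.
MPI_LAUNCHERS = {"mpirun", "mpiexec", "orterun"}


def choose_checkpoint_argv(procs: Sequence[Tuple[int, str]], leader_pid: int) -> List[str]:
    """Pick a checkpoint command for a job.

    procs holds (pid, command name) pairs for the job's processes;
    leader_pid is the job's top-level process.
    """
    for pid, name in procs:
        if name in MPI_LAUNCHERS:
            # Open MPI job: checkpoint via the launcher's PID.
            return ["ompi-checkpoint", str(pid)]
    # Plain (non-MPI) job: use BLCR directly.
    return ["cr_checkpoint", str(leader_pid)]
```

Keeping the decision in one small function means the rest of the script (running as the user, packing the result) can stay the same for both job types.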
>
> Regards,
>
> Danny
> On 3/13/2010 8:39 AM, Chris Samuel wrote:
> > On Tue, 23 Feb 2010 09:15:27 pm Anton Starikov wrote:
> >
> >> Can anyone provide an example of a checkpoint script for Torque which
> >> deals with open-mpi checkpointing?
> >
> > I too would be very interested in this, as I am pondering trying BLCR
> > on our new clusters at VLSCI.
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers

