[torqueusers] Question about checkpoint for MPI

Christopher Samuel samuel at unimelb.edu.au
Wed Dec 5 19:37:02 MST 2012


On 06/12/12 08:31, Andrus, Brian Contractor wrote:

> Well, That is sad news.

Indeed.

> What are the options out there for checkpoint/restart of a job
> then?

It's worth noting that the kernel community is following a completely
different checkpoint/restart path, that of the OpenVZ developers'
"Checkpoint/Restore In Userspace" project (CRIU).

You can read more about it here:

 https://lwn.net/Articles/525675/

The CRIU website is here:

 http://criu.org/

It will also be up for discussion at LCA2013 in Canberra this year
(though I won't be there).

I'd suggest it's worth bringing up on the openmpi-devel list; I must
just do that now.

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci


