[torqueusers] socket checkpointing
Erming.Pei at ihep.ac.cn
Sat Dec 5 02:44:48 MST 2009
I tried the BLCR-based checkpointing mechanism with Torque 2.4.1.
It works well with simple jobs.
(I found that the Maui source code has to be modified to make it work with Maui. I don't know whether there is already a good solution for that.)
Another thing is that it doesn't work with TCP checkpointing. If a job needs to read or write files while it is running, checkpointing cannot work in this situation.
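As a rough illustration of the situation described above, here is a minimal sketch (not part of Torque or BLCR) that checks whether a process currently holds open sockets by inspecting its Linux /proc entries. The function name socket_fds is hypothetical, and the check reports every socket type, not only TCP, so it is only a coarse hint about whether a job might be affected.

```python
import os


def socket_fds(pid="self"):
    """Return the sorted fd numbers of `pid` that refer to sockets.

    Assumption: a Linux-style /proc filesystem, where each entry in
    /proc/<pid>/fd is a symlink and sockets appear as 'socket:[inode]'.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:["):
            found.append(int(name))
    return sorted(found)


if __name__ == "__main__":
    fds = socket_fds()
    if fds:
        print("open socket descriptors:", fds)
    else:
        print("no open sockets found")
```

Passing a job's process ID instead of the default "self" inspects that process, provided the caller has permission to read its /proc entries.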
I would just like to know whether there is a plan to implement socket checkpointing in Torque.
I have found that there are already some third-party socket-checkpointing implementations, such as VMADump, CRAK, DCR (a Chinese implementation), etc.
I think socket checkpointing is very important for computing centers whose resources are partitioned but which still want to make the most of their various resources when there are idle CPUs.
Thanks in advance for any comments,