[torquedev] about Bug 68 - Releasing of multi-process checkpointed job fails

Chu Qiu chqiu at cuc.edu.cn
Wed Oct 20 20:19:46 MDT 2010



I have tested checkpoint/restart of an MVAPICH2 job with TORQUE + BLCR.


I can qhold the job successfully, but I cannot restart the job with


The job stays in state Q and is deferred.


The TORQUE server log shows the following message:


10/21/2010 09:44:22;0080;PBS_Server;Req;req_reject;Reject reply
code=15057(Cannot execute at specified host because of checkpoint or stagein
files), aux=0, type=RunJob, from maui at demo.hpcc.cuc.edu.cn


I found this bug in Bugzilla:




Is this fixed in a newer version of TORQUE?


Thank you very much!


chu qiu

