[torquedev] about Bug 68 - Releasing of multi-process checkpointed job fails
chqiu at cuc.edu.cn
Wed Oct 20 20:19:46 MDT 2010
I have tested checkpoint/restart with mvapich2 on torque + blcr.
I can successfully qhold the job, but I cannot restart the job with
The job's status stays Q (queued), and the job is deferred.
The torque server log shows the following message:
10/21/2010 09:44:22;0080;PBS_Server;Req;req_reject;Reject reply
code=15057(Cannot execute at specified host because of checkpoint or stagein
files), aux=0, type=RunJob, from maui at demo.hpcc.cuc.edu.cn
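When scanning a server log for more rejects like this one, it can help to pull out the reject code and request type automatically. Below is a minimal Python sketch. It assumes the semicolon-separated layout seen in the line above (timestamp; event type; daemon; object type; object name; message). The field names and the helpers `parse_log_line` and `parse_reject` are hypothetical and are not part of torque.

```python
import re
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    # Hypothetical field names for the six ';'-separated fields seen above.
    timestamp: str
    event_type: str
    daemon: str
    object_type: str
    object_name: str
    message: str


def parse_log_line(line: str) -> LogRecord:
    """Split a server log line into its six fields.

    maxsplit=5 keeps any ';' inside the free-text message intact.
    """
    parts = line.strip().split(";", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected log format: {line!r}")
    return LogRecord(*parts)


# Matches the 'Reject reply code=N(reason), aux=N, type=T, from X' message.
REJECT_RE = re.compile(
    r"code=(\d+)\((.*?)\), aux=(\d+), type=(\w+), from (.+)$"
)


def parse_reject(message: str) -> Optional[dict]:
    """Return the reject details, or None if the message is not a reject."""
    m = REJECT_RE.search(message)
    if m is None:
        return None
    code, reason, aux, req_type, origin = m.groups()
    return {
        "code": int(code),
        "reason": reason,
        "aux": int(aux),
        "type": req_type,
        "from": origin.strip(),
    }


if __name__ == "__main__":
    # The wrapped log line from this post, joined back into one line.
    line = (
        "10/21/2010 09:44:22;0080;PBS_Server;Req;req_reject;Reject reply "
        "code=15057(Cannot execute at specified host because of checkpoint "
        "or stagein files), aux=0, type=RunJob, from maui at demo.hpcc.cuc.edu.cn"
    )
    rec = parse_log_line(line)
    print(rec.object_name, parse_reject(rec.message))
```

Note that the mailing-list archive rewrites "@" as " at ", so the `from` field is captured to the end of the line rather than as a single token.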
I found this bug in Bugzilla:
Is it fixed in a newer version of torque?
Thank you very much!