[torquedev] about Bug 68 - Releasing of multi-process checkpointed job fails
Chu Qiu
chqiu at cuc.edu.cn
Wed Oct 20 20:19:46 MDT 2010
Hi!
I have tested checkpoint/restart with MVAPICH2 using TORQUE + BLCR.
I can successfully hold the job with qhold, but I cannot restart it with
qrls.
The job stays in state Q and is deferred.
I found this message in the TORQUE server log:
10/21/2010 09:44:22;0080;PBS_Server;Req;req_reject;Reject reply
code=15057(Cannot execute at specified host because of checkpoint or stagein
files), aux=0, type=RunJob, from maui at demo.hpcc.cuc.edu.cn
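For anyone who wants to pull the reject code out of lines like this, here is a minimal sketch in Python. It assumes the log fields are separated by semicolons exactly as in the line above; the function name parse_reject and the dictionary keys are made up for illustration and are not part of TORQUE.

```python
import re

# Sample line copied from the server log above, re-joined onto one line.
LOG_LINE = (
    "10/21/2010 09:44:22;0080;PBS_Server;Req;req_reject;Reject reply "
    "code=15057(Cannot execute at specified host because of checkpoint or stagein "
    "files), aux=0, type=RunJob, from maui at demo.hpcc.cuc.edu.cn"
)

# Pattern for the message part of a reject line (assumed from the sample only).
REJECT_RE = re.compile(
    r"code=(\d+)\((.*?)\), aux=(\d+), type=(\w+), from (.+)$"
)


def parse_reject(line):
    """Split one log line into fields and extract the reject details.

    Returns a dict, or None if the line is not a reject message of this shape.
    """
    parts = line.split(";", 5)
    if len(parts) != 6:
        return None
    timestamp, event, server, obj_type, obj_name, message = parts
    match = REJECT_RE.search(message)
    if match is None:
        return None
    return {
        "timestamp": timestamp,
        "event": event,
        "server": server,
        "code": int(match.group(1)),
        "reason": match.group(2),
        "request_type": match.group(4),
        "requester": match.group(5),
    }


if __name__ == "__main__":
    info = parse_reject(LOG_LINE)
    print(info["code"], info["request_type"], info["reason"])
```

Run against the line above, this reports code 15057 for a RunJob request, which is the rejection I see each time the scheduler tries to start the released job.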
I found this bug in Bugzilla:
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=68
Does a newer version of TORQUE fix it?
Thank you very much!
chu qiu