[torquedev] about Bug 68 - Releasing of multi-process checkpointed job fails

Chu Qiu chqiu at cuc.edu.cn
Wed Oct 20 20:19:46 MDT 2010


Hi!

I have tested checkpoint/restart with MVAPICH2 using Torque + BLCR.

I can successfully hold the job with qhold, but I cannot restart it with
qrls.

The job's state returns to Q, and the job is deferred.

The Torque server log shows the following message:

10/21/2010 09:44:22;0080;PBS_Server;Req;req_reject;Reject reply
code=15057(Cannot execute at specified host because of checkpoint or stagein
files), aux=0, type=RunJob, from maui at demo.hpcc.cuc.edu.cn

I found this bug in Bugzilla:

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=68

Is this fixed in a newer version of Torque?

Thank you very much!

chu qiu
