Bugzilla – Bug 68
Releasing of multi-process checkpointed job fails
Last modified: 2011-01-24 11:33:00 MST
Setup:
o torque 2.4.8
o maui 3.3
o blcr 0.8.2

Please look at ./src/server/req_runjob.c, line 1429:

    if (strcmp(prun->rq_destin, exec_host) != 0)

For a job submitted with -lnodes=1:ppn=1, this comparison gives:

    prun->rq_destin: htx5 vs. exec_host: htx5

which is okay. The hosts are the same, so there is no failure.

But for a job submitted with -lnodes=1:ppn=2, the same comparison gives:

    prun->rq_destin: htx5:ppn=2 vs. exec_host: htx5

The two strings differ, although they name the same host, so the release fails with:

    "...allocated nodes must match checkpoint location..."

Regards,
Danny
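To illustrate the mismatch described above, here is a minimal sketch of a comparison that looks only at the host-name part of each string, that is, everything before the first ':'. The helper names host_name_len and same_host are hypothetical and are not taken from the TORQUE source. This is not the change that was actually backported, and it assumes single-node specs of the form shown in this report (multi-node lists are not handled).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the host-name part of a node spec,
 * i.e. everything before the first ':' ("htx5" in "htx5:ppn=2"),
 * or the whole string if there is no ':'. */
static size_t host_name_len(const char *spec)
{
    return strcspn(spec, ":");
}

/* Hypothetical helper: returns 1 if both specs name the same host,
 * ignoring any ":..." suffix such as ":ppn=2"; returns 0 otherwise.
 * Assumption: each spec describes a single node. */
static int same_host(const char *a, const char *b)
{
    size_t la, lb;

    assert(a != NULL && b != NULL);

    la = host_name_len(a);
    lb = host_name_len(b);

    /* Host names match only if they have equal length and equal bytes. */
    return la == lb && strncmp(a, b, la) == 0;
}
```

With these helpers, same_host("htx5:ppn=2", "htx5") treats the two values from the ppn=2 case as the same host, whereas strcmp() on the full strings reports a difference.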
Changes have been backported from 2.5.5 to fix this issue.