Bugzilla – Bug 68
Releasing of multi-process checkpointed job fails
Last modified: 2011-01-24 11:33:00 MST
o torque 2.4.8
o maui 3.3
o blcr 0.8.2
Please look at ./src/server/req_runjob.c line 1429:
if (strcmp(prun->rq_destin, exec_host) != 0)
For a job submitted with -lnodes=1:ppn=1, this comparison sees:
prun->rq_destin: htx5 vs. exec_host: htx5
-> which is okay. Hosts are the same, no failure.
But for a job submitted with -lnodes=1:ppn=2, the same comparison sees:
prun->rq_destin: htx5:ppn=2 vs. exec_host: htx5
-> which is not the same and gives the failure:
"...allocated nodes must match checkpoint location..."
Changes have been backported from 2.5.5 to fix this issue.
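
To illustrate the mismatch reported above, here is a minimal sketch of a host comparison that ignores a ":ppn=N" suffix. It is not the change that was backported from 2.5.5, and the helpers host_len and same_host are hypothetical names that do not exist in TORQUE. It assumes each string holds a single node spec in which the host name ends at the first ':'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not TORQUE code): length of the host-name part of
 * a node spec, i.e. everything before the first ':'.
 * If there is no ':', the whole string counts as the host name. */
size_t host_len(const char *spec)
{
    return strcspn(spec, ":");
}

/* Hypothetical helper (not TORQUE code): returns 1 if both specs name the
 * same host once any ":..." suffix such as ":ppn=2" is ignored, else 0. */
int same_host(const char *a, const char *b)
{
    size_t la = host_len(a);
    size_t lb = host_len(b);

    /* Host names must have equal length and equal characters. */
    return la == lb && strncmp(a, b, la) == 0;
}
```

With this sketch, same_host("htx5:ppn=2", "htx5") is true, whereas strcmp on the same pair reports a difference, which is the case that triggers the failure for -lnodes=1:ppn=2.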