[torquedev] [Bug 68] New: Releasing of multi-process checkpointed job fails
bugzilla-daemon at supercluster.org
Mon Jul 5 08:53:01 MDT 2010
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=68
Summary: Releasing of multi-process checkpointed job fails
Product: TORQUE
Version: 2.4.x
Platform: PC
OS/Version: Linux
Status: NEW
Severity: major
Priority: P5
Component: pbs_server
AssignedTo: glen.beane at gmail.com
ReportedBy: dsternkopf at hpce.nec.com
CC: torquedev at supercluster.org
Estimated Hours: 0.0
Setup:
o torque 2.4.8
o maui 3.3
o blcr 0.8.2
Please look at ./src/server/req_runjob.c line 1429:
if (strcmp(prun->rq_destin, exec_host) != 0)
For a job submitted with -lnodes=1:ppn=1, this comparison sees:
prun->rq_destin: htx5 vs. exec_host: htx5
-> which is okay. The hosts are the same, so there is no failure.
But for a job submitted with -lnodes=1:ppn=2, the same comparison sees:
prun->rq_destin: htx5:ppn=2 vs. exec_host: htx5
-> which are not the same, so the release fails with:
"...allocated nodes must match checkpoint location..."
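To illustrate the mismatch, here is a minimal sketch of a comparison that looks only at the host name and ignores anything after the first ':' (such as ":ppn=2"). The helper name same_host is hypothetical and not part of TORQUE, and the sketch assumes single-node values shaped like the ones shown above; it does not handle multi-node specs and is not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a TORQUE function): compare only the host
 * name portion of two strings, i.e. everything before the first ':'.
 * Returns 1 if the host names are equal, 0 otherwise.
 * Assumes single-node values such as "htx5" or "htx5:ppn=2".
 */
static int same_host(const char *destin, const char *exec_host)
{
    assert(destin != NULL && exec_host != NULL);

    /* Length of each string up to, but not including, the first ':' */
    size_t destin_len = strcspn(destin, ":");
    size_t host_len = strcspn(exec_host, ":");

    /* Lengths must match so that "htx55" is not treated as "htx5" */
    return destin_len == host_len &&
           strncmp(destin, exec_host, destin_len) == 0;
}
```

With such a check, "htx5:ppn=2" and "htx5" would be treated as the same host, while a plain strcmp() of the two reports a difference.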
Regards,
Danny