[torquedev] cpu use reporting
garrick at speculation.org
Wed Jun 14 17:44:43 MDT 2006
On Mon, Jun 12, 2006 at 10:53:08AM -0400, Brock Palen alleged:
> We are seeing strange problems with torque-2.1.0p0 backed by
> maui-3.2.6p14.
> Basically a single cpu is being allocated more than once.
Two requirements for this bug as I saw it: job polling must be enabled
(it is enabled by default in 2.1.0) and keep_completed must not be enabled.
The job poll task for a given job wasn't being removed after the job
exited. This caused an extra job_purge() on a job that no longer
existed, which in turn caused mild memory corruption.
The diff below seems to fix it and was just applied to CVS.
diff -u -r1.50 -r1.51
--- src/server/pbsd_main.c 12 Jun 2006 16:24:09 -0000 1.50
+++ src/server/pbsd_main.c 14 Jun 2006 23:40:03 -0000 1.51
@@ -1056,6 +1056,8 @@
if (server.sv_attr[(int)SRV_ATR_PollJobs].at_val.at_long &&
(last_jobstat_time + JobStatRate <= time_now))
{
+ struct work_task *ptask;
+
for (pjob = (job *)GET_NEXT(svr_alljobs);
pjob != NULL;
pjob = (job *)GET_NEXT(pjob->ji_alljobs))
@@ -1066,7 +1068,12 @@
when = pjob->ji_wattr[(int)JOB_ATR_qrank].at_val.at_long % JobStatRate;
- set_task(WORK_Timed,when + time_now,poll_job_task,pjob);
+ ptask = set_task(WORK_Timed,when + time_now,poll_job_task,pjob);
+
+ if (ptask)
+ {
+ append_link(&pjob->ji_svrtask,&ptask->wt_linkobj,ptask);
+ }
}
}
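
For anyone who wants the gist without reading pbsd_main.c, here is a rough,
standalone sketch of the idea behind the patch. None of these names are
TORQUE's: sketch_job, sketch_task and the sketch_* functions are made up for
illustration. It also assumes that purging a job cancels every task linked to
it, which is what the append_link() onto ji_svrtask appears intended to allow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for a server job and a timed task. */
struct sketch_job;

struct sketch_task {
    time_t when;                        /* when the task should fire */
    struct sketch_job *owner;           /* job the task will operate on */
    struct sketch_task *next_pending;   /* global queue of pending tasks */
    struct sketch_task *next_for_job;   /* per-job list, set by linking */
};

struct sketch_job {
    int id;
    struct sketch_task *tasks;          /* tasks that must die with the job */
};

static struct sketch_task *pending = NULL;

struct sketch_job *sketch_job_new(int id)
{
    struct sketch_job *pjob = malloc(sizeof *pjob);
    if (pjob == NULL)
        return NULL;
    pjob->id = id;
    pjob->tasks = NULL;
    return pjob;
}

/* Queue a timed task; returns NULL on allocation failure. */
struct sketch_task *sketch_set_task(time_t when, struct sketch_job *pjob)
{
    struct sketch_task *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->when = when;
    t->owner = pjob;
    t->next_for_job = NULL;
    t->next_pending = pending;
    pending = t;
    return t;
}

/* The step the patch adds: remember the task on the job that owns it. */
void sketch_link_task(struct sketch_job *pjob, struct sketch_task *t)
{
    assert(pjob != NULL && t != NULL);
    t->next_for_job = pjob->tasks;
    pjob->tasks = t;
}

/* Remove one task from the global pending queue, if it is there. */
static void unlink_pending(struct sketch_task *t)
{
    struct sketch_task **pp = &pending;
    while (*pp != NULL) {
        if (*pp == t) {
            *pp = t->next_pending;
            return;
        }
        pp = &(*pp)->next_pending;
    }
}

/* Purge a job: cancel every linked task first, then free the job. */
void sketch_job_purge(struct sketch_job *pjob)
{
    struct sketch_task *t = pjob->tasks;
    while (t != NULL) {
        struct sketch_task *next = t->next_for_job;
        unlink_pending(t);
        free(t);
        t = next;
    }
    free(pjob);
}

/* Number of tasks still waiting in the global queue. */
int sketch_pending_count(void)
{
    int n = 0;
    for (struct sketch_task *t = pending; t != NULL; t = t->next_pending)
        n++;
    return n;
}
```

In this sketch, a task that is queued but never passed to sketch_link_task()
stays in the pending queue after its job is purged, still pointing at freed
memory. That is the same shape as the stale poll task described above.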