[torquedev] [Bug 187] New: segfault in job_abt after dealing with array dependencies

bugzilla-daemon at supercluster.org
Thu Apr 26 00:04:17 MDT 2012


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=187

           Summary: segfault in job_abt after dealing with array
                    dependencies
           Product: TORQUE
           Version: 3.0.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P5
         Component: pbs_server
        AssignedTo: dbeer at adaptivecomputing.com
        ReportedBy: rhys.hill at adelaide.edu.au
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0


This code in job_abt:


    if (pjob->ji_wattr[JOB_ATR_depend].at_flags & ATR_VFLAG_SET)
      {
      strcpy(jobid, pjob->ji_qs.ji_jobid);
      depend_on_term(pjob);
      pjob = find_job(jobid);
      }

    /* update internal array bookeeping values */
    if ((pjob->ji_arraystruct != NULL) &&
        (pjob->ji_is_array_template == FALSE))
      {
      ...
      }

is causing a segfault for us in TORQUE 4.0.1 (r6023). After depend_on_term(),
find_job() returns NULL, so pjob becomes NULL and the following conditional
dereferences it and crashes. Strangely, the code inside that conditional checks
pjob for NULL several times, while the condition itself does not. This patch:

Index: src/server/job_func.c
===================================================================
--- src/server/job_func.c    (revision 6023)
+++ src/server/job_func.c    (working copy)
@@ -526,10 +526,14 @@
       strcpy(jobid, pjob->ji_qs.ji_jobid);
       depend_on_term(pjob);
       pjob = find_job(jobid);
+      if (pjob == NULL){
+        log_event(PBSEVENT_JOB, PBS_EVENTCLASS_JOB, jobid, "lost job after setting up dependencies.");
       }
+      }

     /* update internal array bookeeping values */
-    if ((pjob->ji_arraystruct != NULL) &&
+    if ((pjob != NULL) &&
+        (pjob->ji_arraystruct != NULL) &&
         (pjob->ji_is_array_template == FALSE))
       {
       job_array *pa = get_jobs_array(&pjob);

resolves the crash.

It looks as though the code was meant to have this check, but it was lost,
missed, or deleted at some point.
