[torquedev] [Bug 181] New: Deadlock in pbsd_init_reque

bugzilla-daemon at supercluster.org
Mon Apr 23 22:22:53 MDT 2012


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=181

           Summary: Deadlock in pbsd_init_reque
           Product: TORQUE
           Version: 3.0.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P5
         Component: pbs_server
        AssignedTo: dbeer at adaptivecomputing.com
        ReportedBy: rhys.hill at adelaide.edu.au
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0


pbsd_init_reque currently deadlocks on its error path in TORQUE 4.0.1 r6023.
The code looks like this:

----------------------------------------

  pthread_mutex_lock(server.sv_qs_mutex);
  if (svr_enquejob(pjob, TRUE, -1) == PBSE_NONE)
    {
    /* ... Went OK ... */
    }
  else
    {
    /* ... Had an error ... */

    job_abt(&pjob, logbuf);

    /* NOTE:  pjob freed but dangling pointer remains */
    }
  pthread_mutex_unlock(server.sv_qs_mutex);

----------------------------------------

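However, the calls within job_abt eventually try to lock sv_qs_mutex again.
The calling thread still holds that mutex at this point, so the second lock
attempt never returns and pbs_server hangs.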

This version is OK:

----------------------------------------

  pthread_mutex_lock(server.sv_qs_mutex);
  if (svr_enquejob(pjob, TRUE, -1) == PBSE_NONE)
    {
    strcat(logbuf, msg_init_queued);
    strcat(logbuf, pjob->ji_qs.ji_queue);

    log_event(
      PBSEVENT_SYSTEM | PBSEVENT_ADMIN | PBSEVENT_DEBUG,
      PBS_EVENTCLASS_JOB,
      pjob->ji_qs.ji_jobid,
      logbuf);
    pthread_mutex_unlock(server.sv_qs_mutex);
    }
  else
    {
    /* Oops, this should never happen */

    sprintf(logbuf, "%s; job %s queue %s",
            msg_err_noqueue,
            pjob->ji_qs.ji_jobid,
            pjob->ji_qs.ji_queue);

    log_err(-1, "pbsd_init_reque", logbuf);

    pthread_mutex_unlock(server.sv_qs_mutex);

    job_abt(&pjob, logbuf);

    /* NOTE:  pjob freed but dangling pointer remains */
    }
  return;
  }  /* END pbsd_init_reque() */

----------------------------------------

i.e. the unlock call is moved into both branches of the if statement, so that
in the error branch the mutex is released before job_abt is called.
