[torquedev] [Bug 188] New: job log deadlock

bugzilla-daemon at supercluster.org
Sun Apr 29 05:21:17 MDT 2012


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=188

           Summary: job log deadlock
           Product: TORQUE
           Version: 3.0.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P5
         Component: pbs_server
        AssignedTo: dbeer at adaptivecomputing.com
        ReportedBy: rhys.hill at adelaide.edu.au
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0


There is currently a deadlock that commonly occurs when job logging is enabled.

The deadlock occurs because the function mk_job_log_name locks job_log_mutex to
update the day on which the log was opened, even though that lock is already
taken whenever its only caller, job_log_open, is executed. The problem is fixed
by simply removing the lock:

Index: src/lib/Liblog/pbs_log.c
===================================================================
--- src/lib/Liblog/pbs_log.c    (revision 6023)
+++ src/lib/Liblog/pbs_log.c    (working copy)
@@ -272,9 +272,7 @@
             ptm->tm_mday);
     }

-  pthread_mutex_lock(job_log_mutex);
   joblog_open_day = ptm->tm_yday; /* Julian date log opened */
-  pthread_mutex_unlock(job_log_mutex);

   return(pbuf);
   }  /* END mk_job_log_name() */

With this change, the structure of the code matches mk_log_name.
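
For anyone who wants to see the pattern in isolation, here is a minimal sketch
that is not taken from the TORQUE source. All names in it (demo_mutex,
demo_open_day, set_day_locking, set_day_unlocked, open_with) are hypothetical
stand-ins. It assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so
that the second lock attempt by the same thread reports EDEADLK rather than
hanging; with a default mutex the relock is undefined behaviour and on Linux
typically blocks forever, which would be consistent with the deadlock
described above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for job_log_mutex and joblog_open_day. */
static pthread_mutex_t demo_mutex;
static int demo_open_day;

/* Initialise demo_mutex as an error-checking mutex, so that a relock by the
 * thread that already owns it returns EDEADLK instead of blocking. */
static int demo_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&demo_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Callee that takes the lock itself, in the style of the unpatched code. */
static int set_day_locking(int yday)
{
    int rc = pthread_mutex_lock(&demo_mutex);
    if (rc != 0)
        return rc;            /* EDEADLK when the caller already holds it */
    demo_open_day = yday;
    pthread_mutex_unlock(&demo_mutex);
    return 0;
}

/* Callee that relies on the caller's lock, in the style of the patch. */
static int set_day_unlocked(int yday)
{
    demo_open_day = yday;
    return 0;
}

/* Caller that holds the lock across the call to the callee. */
static int open_with(int (*set_day)(int), int yday)
{
    int rc = pthread_mutex_lock(&demo_mutex);
    if (rc != 0)
        return rc;
    rc = set_day(yday);
    pthread_mutex_unlock(&demo_mutex);
    return rc;
}
```

After demo_init() succeeds, open_with(set_day_locking, ...) should return
EDEADLK, while open_with(set_day_unlocked, ...) should return 0 and store the
value. The unlocked variant is only safe because every caller is assumed to
hold the mutex already, which is the property the patch relies on.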
