Bugzilla – Bug 181
Deadlock in pbsd_init_reque
Last modified: 2012-04-23 22:22:52 MDT
pbsd_init_reque currently deadlocks on the error path in torque 4.0.1 r6023. The code looks like this:

----------------------------------------
pthread_mutex_lock(server.sv_qs_mutex);

if (svr_enquejob(pjob, TRUE, -1) == PBSE_NONE)
  {
  ... Went OK
  }
else
  {
  ... Had an error
  job_abt(&pjob, logbuf);

  /* NOTE:  pjob freed but dangling pointer remains */
  }

pthread_mutex_unlock(server.sv_qs_mutex);
----------------------------------------

However, the calls within job_abt eventually try to lock sv_qs_mutex again. The calling thread already holds that mutex, so the second lock attempt never returns and the server hangs.

This version is OK:

----------------------------------------
pthread_mutex_lock(server.sv_qs_mutex);

if (svr_enquejob(pjob, TRUE, -1) == PBSE_NONE)
  {
  strcat(logbuf, msg_init_queued);
  strcat(logbuf, pjob->ji_qs.ji_queue);

  log_event(
    PBSEVENT_SYSTEM | PBSEVENT_ADMIN | PBSEVENT_DEBUG,
    PBS_EVENTCLASS_JOB,
    pjob->ji_qs.ji_jobid,
    logbuf);

  pthread_mutex_unlock(server.sv_qs_mutex);
  }
else
  {
  /* Oops, this should never happen */

  sprintf(logbuf, "%s; job %s queue %s",
    msg_err_noqueue,
    pjob->ji_qs.ji_jobid,
    pjob->ji_qs.ji_queue);

  log_err(-1, "pbsd_init_reque", logbuf);

  pthread_mutex_unlock(server.sv_qs_mutex);

  job_abt(&pjob, logbuf);

  /* NOTE:  pjob freed but dangling pointer remains */
  }

return;
}  /* END pbsd_init_reque() */
----------------------------------------

i.e. the unlock call is moved into both branches of the if statement, so that in the error branch the mutex is released before job_abt is called.
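The pattern can be reproduced outside torque with a minimal sketch. It is not torque code: qs_mutex, abort_job, requeue_buggy and requeue_fixed are hypothetical stand-ins for server.sv_qs_mutex, job_abt and the two versions of pbsd_init_reque. The sketch also assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), so that the second lock attempt returns EDEADLK rather than hanging the way a default mutex typically would.

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK on strict compilers */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for server.sv_qs_mutex. */
static pthread_mutex_t qs_mutex;

/* Create the mutex as error-checking: relocking from the owning thread
 * then returns EDEADLK instead of blocking forever. */
int init_qs_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&qs_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical stand-in for job_abt(): somewhere inside, it needs the
 * queue mutex for itself. */
int abort_job(void)
{
    int rc = pthread_mutex_lock(&qs_mutex);
    if (rc != 0)
        return rc;              /* EDEADLK if the caller still holds it */
    /* ... remove the job from its queue ... */
    return pthread_mutex_unlock(&qs_mutex);
}

/* Shape of the reported bug: abort_job() runs with the mutex held. */
int requeue_buggy(int enqueue_ok)
{
    int rc = pthread_mutex_lock(&qs_mutex);
    assert(rc == 0);
    rc = 0;
    if (!enqueue_ok)
        rc = abort_job();       /* second lock attempt by the same thread */
    pthread_mutex_unlock(&qs_mutex);
    return rc;
}

/* Shape of the fix: each branch unlocks before doing anything else. */
int requeue_fixed(int enqueue_ok)
{
    int rc = pthread_mutex_lock(&qs_mutex);
    assert(rc == 0);
    (void)rc;
    if (enqueue_ok) {
        pthread_mutex_unlock(&qs_mutex);
        return 0;
    }
    pthread_mutex_unlock(&qs_mutex);
    return abort_job();         /* mutex is free, so the lock succeeds */
}
```

After init_qs_mutex(), calling requeue_buggy(0) should report the self-deadlock as an error code, while requeue_fixed(0) completes normally. Build with -pthread.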