[torquedev] [Bug 181] New: Deadlock in pbsd_init_reque
bugzilla-daemon at supercluster.org
Mon Apr 23 22:22:53 MDT 2012
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=181
Summary: Deadlock in pbsd_init_reque
Product: TORQUE
Version: 3.0.x
Platform: PC
OS/Version: Linux
Status: NEW
Severity: major
Priority: P5
Component: pbs_server
AssignedTo: dbeer at adaptivecomputing.com
ReportedBy: rhys.hill at adelaide.edu.au
CC: torquedev at supercluster.org
Estimated Hours: 0.0
pbsd_init_reque currently deadlocks on its error path in TORQUE 4.0.1 r6023. The
code looks like this:
----------------------------------------
pthread_mutex_lock(server.sv_qs_mutex);
if (svr_enquejob(pjob, TRUE, -1) == PBSE_NONE)
{
  ... Went OK
}
else
{
  ... Had an error
  job_abt(&pjob, logbuf);
  /* NOTE: pjob freed but dangling pointer remains */
}
pthread_mutex_unlock(server.sv_qs_mutex);
----------------------------------------
However, the calls within job_abt eventually try to lock sv_qs_mutex, which
the calling thread still holds at that point, so the thread blocks waiting on
its own lock and never returns.
This version is OK:
----------------------------------------
  pthread_mutex_lock(server.sv_qs_mutex);
  if (svr_enquejob(pjob, TRUE, -1) == PBSE_NONE)
  {
    strcat(logbuf, msg_init_queued);
    strcat(logbuf, pjob->ji_qs.ji_queue);
    log_event(
      PBSEVENT_SYSTEM | PBSEVENT_ADMIN | PBSEVENT_DEBUG,
      PBS_EVENTCLASS_JOB,
      pjob->ji_qs.ji_jobid,
      logbuf);
    pthread_mutex_unlock(server.sv_qs_mutex);
  }
  else
  {
    /* Oops, this should never happen */
    sprintf(logbuf, "%s; job %s queue %s",
      msg_err_noqueue,
      pjob->ji_qs.ji_jobid,
      pjob->ji_qs.ji_queue);
    log_err(-1, "pbsd_init_reque", logbuf);
    pthread_mutex_unlock(server.sv_qs_mutex);
    job_abt(&pjob, logbuf);
    /* NOTE: pjob freed but dangling pointer remains */
  }
  return;
} /* END pbsd_init_reque() */
----------------------------------------
i.e. the unlock call is moved into both branches of the if statement, so that
in the error branch sv_qs_mutex is released before job_abt is called.
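
To make the ordering issue easy to reproduce outside of pbs_server, here is a
minimal sketch in C. It is not TORQUE code: qs_mutex, init_qs_mutex, abort_job
and the two requeue_error_path_* functions are hypothetical stand-ins for
server.sv_qs_mutex, job_abt and the error branch of pbsd_init_reque. The mutex
is created as PTHREAD_MUTEX_ERRORCHECK purely for the demonstration (an
assumption, not how TORQUE sets it up), so that relocking from the same thread
returns EDEADLK instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for server.sv_qs_mutex. */
static pthread_mutex_t qs_mutex;

/* Create the mutex as error-checking so that a relock by the owning
 * thread reports EDEADLK rather than blocking. Returns 0 on success. */
static int init_qs_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&qs_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical stand-in for job_abt(): somewhere inside, it needs
 * the queue mutex for itself. */
static int abort_job(void)
{
    int rc = pthread_mutex_lock(&qs_mutex);
    if (rc != 0)
        return rc;               /* EDEADLK if the caller still holds it */
    /* ... detach the job from its queue ... */
    pthread_mutex_unlock(&qs_mutex);
    return 0;
}

/* Ordering from the original code: abort while the lock is still held. */
static int requeue_error_path_buggy(void)
{
    pthread_mutex_lock(&qs_mutex);
    int rc = abort_job();        /* tries to lock qs_mutex again */
    pthread_mutex_unlock(&qs_mutex);
    return rc;
}

/* Ordering from the proposed fix: release the lock, then abort. */
static int requeue_error_path_fixed(void)
{
    pthread_mutex_lock(&qs_mutex);
    /* ... build and log the error message ... */
    pthread_mutex_unlock(&qs_mutex);
    return abort_job();
}
```

After a successful init_qs_mutex(), requeue_error_path_buggy() returns EDEADLK
and requeue_error_path_fixed() returns 0. With a default (non-error-checking)
mutex the buggy ordering would typically hang instead of returning, which
matches the behaviour reported above.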