[torqueusers] Torque 4.1.4: Deadlock in job creation

Joerg Blank j.blank at fz-juelich.de
Tue Feb 19 12:55:45 MST 2013


> Can you provide more details on how to reproduce the second one? 

Unfortunately, I cannot reproduce the other deadlocks. I only took a core
dump whenever pbs_server broke.

The source of these deadlocks is, however, rather simple: a lock order
violation. You may not lock the global array mutex while holding an
array lock.
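
To illustrate the ordering rule, here is a minimal sketch in C with POSIX
threads. All names (global_arrays_mutex, job_array_sketch, and both
functions) are hypothetical stand-ins and not the actual pbs_server code.
It assumes the intended order is "global mutex first, then the per-array
lock", which is what the rule above implies.

```c
#include <pthread.h>

/* Hypothetical stand-in for the global array mutex. */
static pthread_mutex_t global_arrays_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a job array with its own lock. */
struct job_array_sketch {
    pthread_mutex_t lock;
    int job_count;
};

/* Correct order: global mutex first, then the array lock,
 * released in reverse order. Returns the new job count. */
static int add_job_in_order(struct job_array_sketch *pa)
{
    int n;

    pthread_mutex_lock(&global_arrays_mutex);
    pthread_mutex_lock(&pa->lock);
    n = ++pa->job_count;
    pthread_mutex_unlock(&pa->lock);
    pthread_mutex_unlock(&global_arrays_mutex);
    return n;
}

/* Called with pa->lock held by a caller that now also needs the
 * global mutex. Locking the global mutex directly here would be the
 * violation, so the array lock is dropped first and both locks are
 * re-taken in the correct order. Returns with both locks held. */
static void relock_in_order(struct job_array_sketch *pa)
{
    pthread_mutex_unlock(&pa->lock);
    pthread_mutex_lock(&global_arrays_mutex);
    pthread_mutex_lock(&pa->lock);
    /* The array may have changed while it was unlocked, so the
     * caller has to re-validate its state at this point. */
}
```

The cost of the second pattern is the window in which the array is
unlocked; any state read before the call has to be checked again.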

I mitigated the deadlocks locally by adding code to release the global
lock for some time on an exponential back off timer.
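
The workaround could look roughly like the following sketch. This is not
my actual patch; the names (global_mutex_sketch, lock_with_backoff) and
the delay values are assumptions chosen for illustration. The idea: a
thread that holds the global mutex only tries the second lock, and on
failure gives up the global mutex for a doubling interval so that the
other thread can make progress.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the global array mutex. */
static pthread_mutex_t global_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with global_mutex_sketch held.
 * Returns 0 with both the global mutex and *target held, or -1 with
 * only the global mutex held after max_attempts failed tries. */
static int lock_with_backoff(pthread_mutex_t *target, int max_attempts)
{
    long delay_ns = 1000000L;             /* start at 1 ms (assumed) */
    const long max_delay_ns = 500000000L; /* cap at 500 ms (assumed) */
    int i;

    for (i = 0; i < max_attempts; i++) {
        struct timespec ts;

        if (pthread_mutex_trylock(target) == 0)
            return 0;

        /* Release the global mutex while sleeping so that a thread
         * holding *target can acquire it and finish its work. */
        pthread_mutex_unlock(&global_mutex_sketch);
        ts.tv_sec = 0;
        ts.tv_nsec = delay_ns;
        nanosleep(&ts, NULL);
        pthread_mutex_lock(&global_mutex_sketch);

        /* Exponential back off: double the delay up to the cap. */
        delay_ns *= 2;
        if (delay_ns > max_delay_ns)
            delay_ns = max_delay_ns;
    }
    return -1;
}
```

This only mitigates the problem. The lock order is still violated
elsewhere, and whatever was protected by the global mutex may have
changed during the sleep, so the caller has to re-check it.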

Jörg Blank
