[torqueusers] Torque 4.1.4: Deadlock in job creation

Joerg Blank j.blank at fz-juelich.de
Tue Feb 19 12:55:45 MST 2013


Hello,

> Can you provide more details on how to reproduce the second one? 

Unfortunately, I cannot reproduce the other deadlocks. I just took a core
dump whenever pbs_server broke.

The source of these deadlocks is, however, rather easy to identify: a lock
order violation. You may not lock the global array mutex while holding an
array lock.
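
To illustrate the ordering rule, here is a minimal sketch in C with
pthreads. All names (all_arrays_mutex, job_array, add_job_ordered,
upgrade_to_global) are hypothetical stand-ins, not the actual pbs_server
symbols. It assumes the intended order is "global mutex first, then the
per-array mutex", so a thread that already holds an array lock has to drop
it before taking the global one.

```c
#include <pthread.h>

/* Hypothetical stand-ins; these are NOT the real pbs_server structures. */
static pthread_mutex_t all_arrays_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef struct job_array {
    pthread_mutex_t mutex;   /* per-array lock */
    int             job_count;
} job_array;

/* Consistent order: global mutex first, then the per-array mutex.
 * Returns the new job count. */
int add_job_ordered(job_array *pa)
{
    int n;

    pthread_mutex_lock(&all_arrays_mutex);
    pthread_mutex_lock(&pa->mutex);
    n = ++pa->job_count;
    pthread_mutex_unlock(&pa->mutex);
    pthread_mutex_unlock(&all_arrays_mutex);
    return n;
}

/* Caller holds pa->mutex and needs the global mutex as well.
 * Locking the global mutex directly here would invert the order, so the
 * array lock is released first and both are re-taken in the agreed order.
 * On return the caller holds both locks. */
void upgrade_to_global(job_array *pa)
{
    pthread_mutex_unlock(&pa->mutex);
    pthread_mutex_lock(&all_arrays_mutex);
    pthread_mutex_lock(&pa->mutex);
    /* The array was unlocked for a moment, so any state read before the
     * upgrade has to be re-checked by the caller. */
}
```

The cost of the "drop and re-take" approach is that the array can change
while it is unlocked, so the caller must revalidate whatever it looked at
before.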

I mitigated the deadlocks locally by adding code to release the global
lock for some time on an exponential backoff timer.
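
The general shape of such a mitigation could look like the sketch below.
This is not the local patch itself, only an illustration of the idea under
assumptions: global_mutex and lock_array_with_backoff are hypothetical
names, and the delays (1 ms start, doubling, capped at 256 ms) are
arbitrary. The thread holding the global mutex only tries the array lock;
if that fails, it releases the global mutex, sleeps, and retries.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the global array mutex. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with global_mutex held.
 * Returns 0 with both global_mutex and *array_mutex held, or -1 after
 * max_tries failed attempts with only global_mutex held. */
int lock_array_with_backoff(pthread_mutex_t *array_mutex, int max_tries)
{
    long delay_ns = 1000000L;               /* start at 1 ms (assumed) */
    const long max_delay_ns = 256000000L;   /* cap at 256 ms (assumed) */
    int attempt;

    for (attempt = 0; attempt < max_tries; attempt++) {
        struct timespec ts;

        if (pthread_mutex_trylock(array_mutex) == 0)
            return 0;

        /* The array lock is busy. Its holder may itself be waiting for
         * the global mutex, so release it for a while instead of
         * blocking with it held. */
        pthread_mutex_unlock(&global_mutex);
        ts.tv_sec = 0;
        ts.tv_nsec = delay_ns;
        nanosleep(&ts, NULL);
        pthread_mutex_lock(&global_mutex);

        if (delay_ns < max_delay_ns)
            delay_ns *= 2;                  /* exponential backoff */
    }
    return -1;
}
```

This only makes the deadlock unlikely to persist; it does not fix the
underlying lock order, and anything protected by the global mutex may have
changed while it was released.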

Regards,
Jörg Blank



