[torqueusers] Torque 4.1.4: Deadlock in job creation

Ezell, Matthew A. ezellma at ornl.gov
Wed Feb 20 14:04:43 MST 2013


On 2/20/13 3:48 PM, "Joerg Blank" <j.blank at fz-juelich.de> wrote:


>
>> The source of this deadlock is however rather easy: lock order
>> violation. You may not lock the global array mutex while holding an
>> array lock.
>
>You may want to try: https://lwn.net/Articles/536363/
>
>Regards,
>Joerg Blank


There's also Helgrind, part of the Valgrind suite:

http://valgrind.org/docs/manual/hg-manual.html

I did some initial work and even submitted some patches to add support for
running Torque under it:

https://github.com/adaptivecomputing/torque/pull/2
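
For illustration, here is a minimal sketch of the kind of lock order
violation described in the quoted text, and of what a tool like Helgrind is
meant to catch. All names (all_arrays_mutex, array_mutex, add_array,
remove_array_bad, run_demo) are hypothetical stand-ins, not Torque's actual
data structures or functions. The two code paths run one after another, so
the program never actually deadlocks; the inconsistent acquisition order is
still there for a lock-order checker to see.

```c
#include <pthread.h>

/* Hypothetical stand-ins; these are NOT Torque's real locks. */
static pthread_mutex_t all_arrays_mutex = PTHREAD_MUTEX_INITIALIZER; /* "global array mutex" */
static pthread_mutex_t array_mutex      = PTHREAD_MUTEX_INITIALIZER; /* lock of one job array */
static int array_count = 0;

/* Consistent order: global mutex first, then the per-array lock. */
static void *add_array(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&all_arrays_mutex);
    pthread_mutex_lock(&array_mutex);
    array_count++;
    pthread_mutex_unlock(&array_mutex);
    pthread_mutex_unlock(&all_arrays_mutex);
    return NULL;
}

/* Violation: takes the global mutex while already holding an array lock.
 * If this ran concurrently with add_array(), each thread could end up
 * holding one lock and waiting forever for the other. */
static void *remove_array_bad(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&array_mutex);
    pthread_mutex_lock(&all_arrays_mutex);
    array_count--;
    pthread_mutex_unlock(&all_arrays_mutex);
    pthread_mutex_unlock(&array_mutex);
    return NULL;
}

/* Runs both paths sequentially (each thread is joined before the next
 * starts), so no deadlock can occur here. Returns the final counter
 * value, or -1 if a thread could not be created. */
int run_demo(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, add_array, NULL) != 0)
        return -1;
    pthread_join(t, NULL);

    if (pthread_create(&t, NULL, remove_array_bad, NULL) != 0)
        return -1;
    pthread_join(t, NULL);

    return array_count;
}
```

To try it, call run_demo() from a small main(), build with -pthread, and run
the binary under "valgrind --tool=helgrind". The fix for this pattern is to
make every code path take the two locks in the same order, or to drop the
array lock before taking the global one.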

~Matt

---
Matt Ezell
HPC Systems Administrator
Oak Ridge National Laboratory


