[torquedev] pbs_server segfault in req_delete.c

Joshua Bernstein jbernstein at penguincomputing.com
Mon Dec 29 12:03:36 MST 2008



Garrick Staples wrote:
> On Wed, Dec 24, 2008 at 12:23:33AM -0500, Michel Béland alleged:
>> Garrick Staples wrote:
>>
>>> While segfaults need to always be fixed, you are using qdel -p incorrectly.
>>> It should only be used if a running job will not exit because its allocated
>>> nodes are unreachable.
>>>
>>> qdel -p is a very bad thing to do.  It is intentionally breaking
>>> pbs_server's idea of what is going on.
>>>
>>> Since you are using qdel -p when you have a running pbs_mom that has the
>>> job, you are bound to have bad things happen.
>> That is probably true, and I will trust you on that, but how do I get rid
>> of a job that has been stuck in the E state for days?
> 
> The solution to that will be on the node.  Do you know why it is stuck?  What
> is it waiting for?

I know why the job is stuck: I deliberately made it hang in that state to
reproduce an issue that a few customers reported to me.

-Joshua Bernstein
Software Engineer
Penguin Computing

