[torquedev] pbs_server segfault in req_delete.c

Garrick Staples garrick at usc.edu
Tue Dec 23 16:57:37 MST 2008


On Tue, Dec 23, 2008 at 03:47:38PM -0800, Joshua Bernstein alleged:
> Hello TORQUE Fans!
> 
> 	Remember me? I figured I'd drop one more observed and repeatable 
> segfault before we all went on a break for the holidays. This time 
> though it seems to be inside of pbs_server. I'm running on X86_64, and 
> I've been able to reproduce this problem in both version 2.3.3 and the 
> brand shiny new 2.3.6.
> 
> 	Essentially, if you issue a qdel -p to clear the queue of stale 
> jobs, pbs_server appears to continue to operate normally, but shortly after 
> new jobs get submitted to the queue, pbs_server posts this message and dies.
> 
> Assertion failed, bad pointer in link: file "req_delete.c", line 844
> Aborted (core dumped)

While segfaults always need to be fixed, you are using qdel -p incorrectly.  It
should only be used when a running job will not exit because its allocated nodes
are unreachable.

In any other situation, qdel -p is a very bad thing to do.  It intentionally
breaks pbs_server's idea of what is going on.

Since you are using qdel -p while a running pbs_mom still has the job, bad
things are bound to happen.
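
To make the rule above concrete, here is a minimal sketch of a wrapper that
only reaches for qdel -p when every node allocated to the job looks
unreachable.  The function name, the shape of the node_states mapping, and
the choice to treat only the "down" state as unreachable are all assumptions
made for illustration; none of this is part of TORQUE itself.

```python
# Hypothetical helper, not part of TORQUE. It only builds the command line;
# the caller decides whether and how to run it.

# Assumption: a node is "unreachable" only when its state includes "down".
UNREACHABLE_STATES = {"down"}


def choose_qdel_command(job_id, node_states):
    """Return the argv list for deleting job_id.

    node_states maps each node allocated to the job to its state string,
    e.g. {"node01": "free", "node02": "down,offline"} (gathered by the
    caller, for instance from pbsnodes output).
    """
    if not node_states:
        raise ValueError("no allocated nodes given for job %s" % job_id)

    def is_unreachable(state):
        # A state string may list several comma-separated states.
        parts = {s.strip() for s in state.split(",")}
        return bool(parts & UNREACHABLE_STATES)

    # Conservative choice: purge only if *every* allocated node is unreachable,
    # since a reachable pbs_mom may still be holding the job.
    if all(is_unreachable(state) for state in node_states.values()):
        return ["qdel", "-p", job_id]
    return ["qdel", job_id]
```

With this sketch, a job whose nodes are all down yields the purge form, while
any job with at least one reachable node gets a plain qdel.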

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

See the Dishonor Roll at http://www.californiansagainsthate.com/
