[torquedev] pbs_server segfault in req_delete.c
Garrick Staples
garrick at usc.edu
Tue Dec 23 16:57:37 MST 2008
On Tue, Dec 23, 2008 at 03:47:38PM -0800, Joshua Bernstein alleged:
> Hello TORQUE Fans!
>
> Remember me? I figured I'd drop one more observed and repeatable
> segfault before we all went on a break for the holidays. This time
> though it seems to be inside of pbs_server. I'm running on X86_64, and
> I've been able to reproduce this problem in both version 2.3.3 and the
> brand shiny new 2.3.6.
>
> Essentially, if you issue a qdel -p to clear the queue of stale
> jobs, pbs_server appears to continue to operate normally, but shortly after
> new jobs get submitted to the queue, pbs_server posts this message and dies.
>
> Assertion failed, bad pointer in link: file "req_delete.c", line 844
> Aborted (core dumped)
While segfaults always need to be fixed, you are using qdel -p incorrectly. It
should only be used if a running job will not exit because its allocated nodes
are unreachable.
qdel -p is a very bad thing to do. It is intentionally breaking pbs_server's
idea of what is going on.
Since you are using qdel -p when you have a running pbs_mom that has the job,
you are bound to have bad things happen.
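As a rough illustration of the rule above, a wrapper script could refuse to
purge unless the job's nodes are known to be unreachable. This is only a
sketch: the function name and the nodes_reachable flag are hypothetical, and
the code only builds a command line. qdel and its -p option are the only
real pieces.

```python
def build_qdel_command(job_id, nodes_reachable):
    """Return the argv list for deleting a job.

    Hypothetical helper: uses plain qdel when the job's nodes respond,
    and qdel -p (purge) only when they are unreachable.
    """
    if nodes_reachable:
        # A running pbs_mom still has the job, so delete it normally.
        return ["qdel", job_id]
    # Nodes are unreachable: purge the job from pbs_server's records.
    return ["qdel", "-p", job_id]
```

How nodes_reachable gets determined is left to the site; the point is simply
that -p is the fallback, not the default.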
--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
See the Dishonor Roll at http://www.californiansagainsthate.com/