[torqueusers] deleting a node crashes the server
Garrick Staples
garrick at usc.edu
Fri Mar 31 19:01:18 MST 2006
On Fri, Mar 31, 2006 at 11:42:55AM -0800, Alexander Saydakov alleged:
> Hi!
>
> We are running Torque-2.0.0p7 on FreeBSD 4.10 (gcc 2.95)
>
> Today I tried the following:
>
> 1. put a node offline
> 2. wait until jobs finish
> 3. qmgr -c 'delete node xxx'
>
> pbs_server dumped the core:
>
> Core was generated by `pbs_server'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/libkvm.so.2...done.
> Reading symbols from /usr/lib/libc.so.4...done.
> Reading symbols from /usr/libexec/ld-elf.so.1...done.
> #0 0x1005ec9 in addr_ok (addr=1122282512) at node_func.c:286
> 286 if (pbsndlist[i]->nd_addrs[0] != addr)
Looks like that was fixed in CVS head a few weeks ago.
@@ -283,9 +283,13 @@
{
/* NOTE: should walk thru all nd_addrs for multi-homed hosts */
- if (pbsndlist[i]->nd_addrs[0] != addr)
+ /* NOTE: deleted node may have already freed nd_addrs */
+
+ if ((pbsndlist[i]->nd_addrs == NULL) ||
+ (pbsndlist[i]->nd_addrs[0] != addr))
continue;
+ /* node matches addr */
+
if (pbsndlist[i]->nd_state & (INUSE_DELETED|INUSE_UNKNOWN))
{
/* definitely not ok */
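
For anyone who wants to see the guard outside the server source, here is a minimal standalone sketch of the same check. It is not the TORQUE code: struct node, addr_ok_sketch, the flag values, and the "return 1 when nothing matches" default are all assumptions made for illustration. Only the nd_addrs, nd_state, INUSE_DELETED and INUSE_UNKNOWN names come from the patch above.

```c
#include <stddef.h>

/* Hypothetical flag values; the real ones are defined in the TORQUE headers. */
#define INUSE_DELETED 0x01
#define INUSE_UNKNOWN 0x02

/* Simplified stand-in for the server's node record (an assumption). */
struct node {
    unsigned long *nd_addrs; /* address list, or NULL once the node is deleted */
    int            nd_state; /* bitmask of INUSE_* flags */
};

/*
 * Return 0 if addr belongs to a node marked deleted or unknown, 1 otherwise.
 * Entries whose nd_addrs has already been freed (NULL) are skipped before
 * nd_addrs[0] is read, which is the dereference that crashed at line 286.
 */
static int addr_ok_sketch(struct node **list, int count, unsigned long addr)
{
    int i;

    for (i = 0; i < count; i++) {
        /* check for NULL first so nd_addrs[0] is never read through NULL */
        if ((list[i]->nd_addrs == NULL) ||
            (list[i]->nd_addrs[0] != addr))
            continue;

        /* node matches addr */
        if (list[i]->nd_state & (INUSE_DELETED | INUSE_UNKNOWN))
            return 0; /* definitely not ok */

        return 1;
    }

    return 1; /* no node claims this address (assumed default) */
}
```

The order of the two tests matters: because || short-circuits, nd_addrs[0] is only evaluated when nd_addrs is non-NULL.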
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California