[torqueusers] deleting a node crashes the server

Garrick Staples garrick at usc.edu
Fri Mar 31 19:01:18 MST 2006


On Fri, Mar 31, 2006 at 11:42:55AM -0800, Alexander Saydakov alleged:
> Hi!
>
> We are running Torque-2.0.0p7 on FreeBSD 4.10 (gcc 2.95)
>
> Today I tried the following:
>
> 1.	put a node offline
> 2.	wait until jobs finish
> 3.	qmgr -c 'delete node xxx'
>
> pbs_server dumped the core:
>
> Core was generated by `pbs_server'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/libkvm.so.2...done.
> Reading symbols from /usr/lib/libc.so.4...done.
> Reading symbols from /usr/libexec/ld-elf.so.1...done.
> #0  0x1005ec9 in addr_ok (addr=1122282512) at node_func.c:286
> 286           if (pbsndlist[i]->nd_addrs[0] != addr)

Looks like this was fixed in CVS HEAD a few weeks ago. Here is the relevant hunk from node_func.c:

@@ -283,9 +283,13 @@
       {
       /* NOTE:  should walk thru all nd_addrs for multi-homed hosts */
 
-      if (pbsndlist[i]->nd_addrs[0] != addr)
+      /* NOTE:  deleted node may have already freed nd_addrs */
+
+      if ((pbsndlist[i]->nd_addrs == NULL) || (pbsndlist[i]->nd_addrs[0] != addr))
         continue;
 
+      /* node matches addr */
+
       if (pbsndlist[i]->nd_state & (INUSE_DELETED|INUSE_UNKNOWN))
         {
         /* definitely not ok */

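A minimal standalone sketch of the guard the patch adds, assuming a simplified node record. `struct pbsnode`, `make_node`, `delete_node`, `find_node_by_addr`, and the `INUSE_*` values below are hypothetical stand-ins, not TORQUE's real definitions. The sketch also assumes the deletion path leaves the node in the list, frees `nd_addrs`, and sets it to NULL, which is what the NULL test in the patch implies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag values; TORQUE's real INUSE_* constants may differ. */
#define INUSE_DELETED 0x01
#define INUSE_UNKNOWN 0x02

/* Hypothetical, simplified node record: only the fields the patch touches. */
struct pbsnode {
    unsigned long *nd_addrs;  /* address array; nd_addrs[0] is the primary */
    int            nd_state;
};

/* Hypothetical constructor: a single-address node in a clean state. */
static struct pbsnode *make_node(unsigned long addr)
{
    struct pbsnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->nd_addrs = malloc(sizeof *n->nd_addrs);
    if (n->nd_addrs == NULL) {
        free(n);
        return NULL;
    }
    n->nd_addrs[0] = addr;
    n->nd_state = 0;
    return n;
}

/* Assumed deletion behaviour: the record stays in the list, its address
 * array is freed and the pointer cleared, and the state is marked deleted. */
static void delete_node(struct pbsnode *n)
{
    assert(n != NULL);
    free(n->nd_addrs);
    n->nd_addrs = NULL;
    n->nd_state |= INUSE_DELETED;
}

/* Same shape as the patched loop: check nd_addrs for NULL before reading
 * nd_addrs[0]. Returns the index of a usable node owning addr, or -1. */
static int find_node_by_addr(struct pbsnode **list, int count, unsigned long addr)
{
    for (int i = 0; i < count; i++) {
        if ((list[i]->nd_addrs == NULL) || (list[i]->nd_addrs[0] != addr))
            continue;

        /* node matches addr */
        if (list[i]->nd_state & (INUSE_DELETED | INUSE_UNKNOWN))
            return -1;        /* definitely not ok */

        return i;
    }
    return -1;                /* no node owns addr */
}
```

Without the `nd_addrs == NULL` test, the first iteration after `delete_node` would read through a NULL pointer, which is consistent with the SIGSEGV at node_func.c:286 in the backtrace. The `||` short-circuits, so `nd_addrs[0]` is never evaluated for a deleted entry.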

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California