[torqueusers] deleting a node crashes the server

Alexander Saydakov saydakov at yahoo-inc.com
Mon Apr 3 17:29:00 MDT 2006


The patch helped. Thanks a lot.


-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Garrick Staples
Sent: Friday, March 31, 2006 6:01 PM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] deleting a node crashes the server

On Fri, Mar 31, 2006 at 11:42:55AM -0800, Alexander Saydakov alleged:
> Hi!
>
> We are running Torque-2.0.0p7 on FreeBSD 4.10 (gcc 2.95)
>
> Today I tried the following:
>
> 1. put a node offline
> 2. wait until jobs finish
> 3. qmgr -c 'delete node xxx'
>
> pbs_server dumped the core:
>
> Core was generated by `pbs_server'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/libkvm.so.2...done.
> Reading symbols from /usr/lib/libc.so.4...done.
> Reading symbols from /usr/libexec/ld-elf.so.1...done.
> #0  0x1005ec9 in addr_ok (addr=1122282512) at node_func.c:286
> 286           if (pbsndlist[i]->nd_addrs[0] != addr)

Looks like that was fixed in CVS head a few weeks ago.

@@ -283,9 +283,13 @@
       {
       /* NOTE:  should walk thru all nd_addrs for multi-homed hosts */
 
-      if (pbsndlist[i]->nd_addrs[0] != addr)
+      /* NOTE:  deleted node may have already freed nd_addrs */
+
+      if ((pbsndlist[i]->nd_addrs == NULL) || (pbsndlist[i]->nd_addrs[0] != addr))
         continue;
 
+      /* node matches addr */
+
       if (pbsndlist[i]->nd_state & (INUSE_DELETED|INUSE_UNKNOWN))
         {
         /* definitely not ok */
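
For anyone who wants to see the guard in isolation, here is a minimal standalone sketch of the same check. It is not the actual node_func.c code: struct node_sketch, addr_ok_sketch, and the two flag values are made-up stand-ins for illustration, and the "unknown address is ok" default is an assumption of the sketch. The only part taken from the patch is the NULL test on nd_addrs before nd_addrs[0] is read.

```c
#include <stddef.h>

/* Placeholder flag values for this sketch only; the real INUSE_*
 * definitions live in the Torque headers and may differ. */
#define INUSE_DELETED_SKETCH 0x01
#define INUSE_UNKNOWN_SKETCH 0x02

/* Hypothetical, cut-down stand-in for the server's node record. */
struct node_sketch {
  unsigned long *nd_addrs;  /* address list, or NULL once the node is freed */
  unsigned int   nd_state;  /* bitmask of the flags above */
};

/* Returns 0 if addr belongs to a deleted/unknown node, 1 otherwise.
 * Treating an address that matches no node as ok is an assumption. */
int addr_ok_sketch(struct node_sketch **list, int count, unsigned long addr)
{
  int i;
  int status = 1;

  for (i = 0; i < count; i++) {
    /* A deleted node may already have freed nd_addrs, so test the
     * pointer before dereferencing it (this is the fix in the patch). */
    if ((list[i]->nd_addrs == NULL) || (list[i]->nd_addrs[0] != addr))
      continue;

    /* node matches addr */
    if (list[i]->nd_state & (INUSE_DELETED_SKETCH | INUSE_UNKNOWN_SKETCH))
      status = 0;  /* definitely not ok */

    break;
  }

  return status;
}
```

Without the NULL test, a list entry whose nd_addrs has been freed and cleared is dereferenced on the way to the node you are actually looking for, which matches the segfault at line 286 in the backtrace.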


-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California


