[torqueusers] Re: moms clearing their own offline status

Garrick Staples garrick at usc.edu
Fri Oct 29 14:31:16 MDT 2004


On Fri, Oct 29, 2004 at 12:38:10PM -0700, Garrick Staples alleged:
> Torque/Maui is getting so good at solving all of the bigger issues, I'm
> starting to drill down into the smaller annoying ones :)
> 
> This has been bugging me for a long time now, but I've only just figured out
> how to reproduce it.  I've always noticed that sometimes when I boot a node
> that was marked offline, it will have the status cleared when pbs_mom starts.
> 
> Today I found that I can reproduce it 100% of the time.  It only happens when
> pbs_mom wasn't shut down cleanly, or pbs_server was unreachable when it was
> shut down.  You can either bring down networking, crash the machine, or
> kill -9 pbs_mom, and the mom will always be online again when it starts up.

I think I found it.  This code kicks in when a mom starts up but the server
still has a valid connection entry for it.  Instead of just setting the state
to unknown, it should preserve the offline state.
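
To show the idea in isolation, here is a small standalone sketch.  It is not
the server code: the flag values are made up for illustration (the real
definitions live in the TORQUE headers), and reset_state_on_reconnect() and
EXAMPLE_OTHER_FLAG are hypothetical names that do not exist in the tree.

```c
/* Illustrative values only; NOT the real TORQUE definitions. */
#define INUSE_OFFLINE      0x01
#define INUSE_UNKNOWN      0x02
#define EXAMPLE_OTHER_FLAG 0x04  /* hypothetical stand-in for any other state bit */

/*
 * Hypothetical helper: compute the state a node should get when its mom
 * reconnects while the server still holds a stale connection entry.
 * The state is reset to unknown, but the admin-set offline bit is kept.
 */
int reset_state_on_reconnect(int old_state)
  {
  int new_state = INUSE_UNKNOWN;

  if (old_state & INUSE_OFFLINE)
    {
    new_state |= INUSE_OFFLINE;
    }

  return(new_state);
  }
```

With these made-up values, a node that was offline comes back as
(INUSE_UNKNOWN|INUSE_OFFLINE), a node that was not offline comes back as plain
INUSE_UNKNOWN, and any other bits are dropped either way.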


diff -ruN torque-1.1.0p4_orig/src/server/node_manager.c torque-1.1.0p4/src/server/node_manager.c
--- torque-1.1.0p4_orig/src/server/node_manager.c	2004-10-28 15:50:48.000000000 -0700
+++ torque-1.1.0p4/src/server/node_manager.c	2004-10-29 13:28:06.000000000 -0700
@@ -873,7 +873,14 @@
 
       tdelete((u_long)node->nd_stream,&streams);
  
-      node->nd_state = INUSE_UNKNOWN;
+      if (node->nd_state & INUSE_OFFLINE)
+        {
+        node->nd_state = (INUSE_UNKNOWN|INUSE_OFFLINE);
+        }
+      else
+        {
+        node->nd_state = INUSE_UNKNOWN;
+        }
       node->nd_stream = -1;
 
       /* do a ping in 5 seconds */
-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California