[torqueusers] pbs_sched crash

Garrick Staples garrick at usc.edu
Thu Apr 27 21:47:58 MDT 2006


On Wed, Mar 22, 2006 at 11:04:08AM -0800, Alexander Saydakov alleged:
> Last night pbs_sched crashed leaving our 70+ nodes idle all night long :(
> 
> #0  0x1013c8e in pbs_rescquery (c=0, resclist=0x9fbff484, num_resc=1,
> available=0x9fbff498, allocated=0x9fbff494, reserved=0x9fbff490,
> down=0x9fbff48c)
> 
>     at ./../Libifl/pbsD_resc.c:218
> 
> 218           *(available + i) = *(reply->brp_un.brp_rescq.brq_avail + i);

I just checked in this fix for 2.1.0; you can apply it to your 2.0.0 if
you want.  It might even help with the memory leak.
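
For anyone patching by hand, here is a rough standalone sketch of the idea
behind the change below. It is not the TORQUE source: the struct names,
read_reply(), query() and the ERR_* codes are made up for illustration. It
assumes the reply reader can come back without a usable reply and record
the failure on the connection, so the copy loop should only run after rc
has been refreshed from the connection's error field.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NONE 0      /* stand-in for PBSE_NONE */
#define ERR_PROTOCOL 1  /* hypothetical error code */

/* Hypothetical stand-ins for the connection table and the batch reply. */
struct conn  { int ch_errno; };
struct reply { int *avail; };

static struct conn conn_table[1];

/* Simulated reply reader: on failure it records the error on the
 * connection and returns NULL, which is the case the check must catch. */
static struct reply *read_reply(int c, struct reply *ok_reply, int fail)
{
    conn_table[c].ch_errno = fail ? ERR_PROTOCOL : ERR_NONE;
    return fail ? NULL : ok_reply;
}

/* Copy num values out of the reply, but only after re-reading the
 * connection's error state once the reply has been read. */
static int query(int c, struct reply *ok_reply, int fail,
                 int *available, int num)
{
    int rc;
    struct reply *r;

    assert(available != NULL);
    r = read_reply(c, ok_reply, fail);

    /* Refresh rc from the connection instead of trusting an older value. */
    if ((rc = conn_table[c].ch_errno) == ERR_NONE) {
        for (int i = 0; i < num; i++)
            available[i] = r->avail[i];
    }
    return rc;
}
```

With a failed read, query() returns the error and leaves the caller's
array untouched instead of dereferencing a NULL reply.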

Index: src/lib/Libifl/pbsD_resc.c
===================================================================
RCS file: /usr/local/nfs/src/cvs_repository/torque/src/lib/Libifl/pbsD_resc.c,v
retrieving revision 1.3
diff -u -r1.3 pbsD_resc.c
--- src/lib/Libifl/pbsD_resc.c  23 Mar 2006 02:01:50 -0000      1.3
+++ src/lib/Libifl/pbsD_resc.c  28 Apr 2006 03:44:23 -0000
@@ -209,7 +209,7 @@
   
   reply = PBSD_rdrpy(c);

-  if (rc == PBSE_NONE)
+  if (((rc = connection[c].ch_errno) == PBSE_NONE))
     {
     /* copy in available and allocated numbers */



-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California