[torqueusers] pbs_sched crash
Garrick Staples
garrick at usc.edu
Thu Apr 27 21:47:58 MDT 2006
On Wed, Mar 22, 2006 at 11:04:08AM -0800, Alexander Saydakov alleged:
> Last night pbs_sched crashed leaving our 70+ nodes idle all night long :(
>
> #0 0x1013c8e in pbs_rescquery (c=0, resclist=0x9fbff484, num_resc=1,
>     available=0x9fbff498, allocated=0x9fbff494, reserved=0x9fbff490,
>     down=0x9fbff48c) at ./../Libifl/pbsD_resc.c:218
>
> 218 *(available + i) = *(reply->brp_un.brp_rescq.brq_avail + i);
I just checked in this fix for 2.1.0; you can patch your 2.0.0 if you
want. It might even help with the memory leak.
Index: src/lib/Libifl/pbsD_resc.c
===================================================================
RCS file: /usr/local/nfs/src/cvs_repository/torque/src/lib/Libifl/pbsD_resc.c,v
retrieving revision 1.3
diff -u -r1.3 pbsD_resc.c
--- src/lib/Libifl/pbsD_resc.c 23 Mar 2006 02:01:50 -0000 1.3
+++ src/lib/Libifl/pbsD_resc.c 28 Apr 2006 03:44:23 -0000
@@ -209,7 +209,7 @@
reply = PBSD_rdrpy(c);
- if (rc == PBSE_NONE)
+ if (((rc = connection[c].ch_errno) == PBSE_NONE))
{
/* copy in available and allocated numbers */
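For anyone applying this to 2.0.0 by hand, here is a minimal, self-contained C sketch of the failure mode the patch appears to address. It assumes, as the backtrace suggests but the message does not spell out, that rc still held the success code from sending the request, so a failed reply read (NULL reply, error recorded in ch_errno) fell through to the copy at line 218. Apart from ch_errno, every name here (struct conn, struct reply, read_reply, query_available, ERR_NONE, ERR_PROTOCOL) is a hypothetical stand-in, not the real Libifl code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real TORQUE/Libifl definitions. */
#define ERR_NONE     0  /* plays the role of PBSE_NONE */
#define ERR_PROTOCOL 1  /* arbitrary nonzero error code for this sketch */

struct reply {
  int avail[4];         /* stands in for brp_un.brp_rescq.brq_avail */
};

struct conn {
  int ch_errno;         /* error recorded by the last reply read */
  int reply_ok;         /* sketch-only knob: does the simulated read succeed? */
  struct reply buf;     /* storage handed back on a successful read */
};

/* Simulated reply reader (assumed behavior): returns NULL on failure and
   records the reason in the connection's ch_errno. */
static struct reply *read_reply(struct conn *c)
{
  if (!c->reply_ok) {
    c->ch_errno = ERR_PROTOCOL;
    return NULL;
  }
  c->ch_errno = ERR_NONE;
  return &c->buf;
}

/* Patched pattern: refresh rc from the connection after reading the reply
   and only touch reply fields when that read succeeded. */
static int query_available(struct conn *c, int num_resc, int *available)
{
  struct reply *reply;
  int rc = ERR_NONE;    /* e.g. the request itself was sent successfully */
  int i;

  assert(c != NULL && available != NULL && num_resc >= 0 && num_resc <= 4);

  reply = read_reply(c);

  /* Pre-patch, this tested the stale rc, so a failed read still reached
     the loop below and dereferenced a NULL reply. */
  if ((rc = c->ch_errno) == ERR_NONE) {
    for (i = 0; i < num_resc; i++)
      available[i] = reply->avail[i];
  }

  return rc;
}
```

With reply_ok set to 0, query_available returns ERR_PROTOCOL and leaves the caller's array untouched; with the old `if (rc == ...)` test it would dereference a NULL pointer instead.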
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California