[torqueusers] pbs_sched crash
Alexander Saydakov
saydakov at yahoo-inc.com
Fri Apr 28 10:46:08 MDT 2006
Wonderful. Thanks. I will give it a try.
-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Garrick Staples
Sent: Thursday, April 27, 2006 8:48 PM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] pbs_sched crash
On Wed, Mar 22, 2006 at 11:04:08AM -0800, Alexander Saydakov alleged:
> Last night pbs_sched crashed leaving our 70+ nodes idle all night long :(
>
> #0 0x1013c8e in pbs_rescquery (c=0, resclist=0x9fbff484, num_resc=1,
>    available=0x9fbff498, allocated=0x9fbff494, reserved=0x9fbff490,
>    down=0x9fbff48c)
>    at ./../Libifl/pbsD_resc.c:218
> 218 *(available + i) = *(reply->brp_un.brp_rescq.brq_avail + i);
I just checked in this fix for 2.1.0; you can patch your 2.0.0 if you
want. It might even help with the memory leak.
Index: src/lib/Libifl/pbsD_resc.c
===================================================================
RCS file: /usr/local/nfs/src/cvs_repository/torque/src/lib/Libifl/pbsD_resc.c,v
retrieving revision 1.3
diff -u -r1.3 pbsD_resc.c
--- src/lib/Libifl/pbsD_resc.c 23 Mar 2006 02:01:50 -0000 1.3
+++ src/lib/Libifl/pbsD_resc.c 28 Apr 2006 03:44:23 -0000
@@ -209,7 +209,7 @@
reply = PBSD_rdrpy(c);
- if (rc == PBSE_NONE)
+ if (((rc = connection[c].ch_errno) == PBSE_NONE))
{
/* copy in available and allocated numbers */
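For readers patching by hand, here is a minimal, self-contained sketch of the pattern the patch above applies: re-read the connection's error code after the reply has been read, and only then copy values out of the reply. Presumably the old code tested a stale rc and went on to dereference a reply that was never filled in. Everything prefixed mock_ or MOCK_ is a hypothetical stand-in written for illustration and is not the TORQUE API; the extra NULL check on reply is likewise an assumption, not part of the checked-in fix.

```c
#include <stddef.h>

#define PBSE_NONE 0   /* success code, as used in the patch */
#define MOCK_ERR  1   /* hypothetical error code, for illustration only */

/* Hypothetical stand-ins for the reply and connection table. */
struct mock_reply { int avail[4]; };
struct mock_conn  { int ch_errno; };

static struct mock_conn mock_connection[1];

/* Stand-in for reading a reply: on failure it returns NULL and
 * records the error on the connection; on success it clears it. */
static struct mock_reply *mock_read_reply(int c, struct mock_reply *canned,
                                          int fail)
{
    if (fail) {
        mock_connection[c].ch_errno = MOCK_ERR;
        return NULL;
    }
    mock_connection[c].ch_errno = PBSE_NONE;
    return canned;
}

/* Copies num_resc values (at most 4 in this sketch) into available.
 * Returns the connection's error code as seen after the reply read. */
int mock_rescquery(int c, int num_resc, int *available,
                   struct mock_reply *canned, int fail)
{
    int rc;
    int i;
    struct mock_reply *reply = mock_read_reply(c, canned, fail);

    /* Refresh rc from the connection after the read, as the patch does.
     * The NULL check is an added safeguard assumed for this sketch. */
    if ((rc = mock_connection[c].ch_errno) == PBSE_NONE && reply != NULL) {
        for (i = 0; i < num_resc; i++)
            *(available + i) = reply->avail[i];
    }
    return rc;
}
```

With a failed read, mock_rescquery returns the error and leaves the caller's available array untouched instead of reading through a NULL reply.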
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California