[torqueusers] pbs_sched crash

Garrick Staples garrick at usc.edu
Wed Mar 22 12:29:05 MST 2006


On Wed, Mar 22, 2006 at 11:04:08AM -0800, Alexander Saydakov alleged:
> #0  0x1013c8e in pbs_rescquery (c=0, resclist=0x9fbff484, num_resc=1,
>     available=0x9fbff498, allocated=0x9fbff494, reserved=0x9fbff490,
>     down=0x9fbff48c) at ./../Libifl/pbsD_resc.c:218
> 218           *(available + i) = *(reply->brp_un.brp_rescq.brq_avail + i);

Can you check your server logs?  I bet pbs_server was hung on something
causing a timeout in the scheduler's pbs_rescquery() call.


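That code looks wrong to me.  By the time rc is tested it is always 0,
because a nonzero rc has already returned, so the check can never catch
a failed PBSD_rdrpy().  I think it should be
'if (pbs_errno == PBSE_NONE)':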

  if ((rc = PBS_resc(c,PBS_BATCH_Rescq,resclist,num_resc,(resource_t)0)) != 0)
    {
    return(rc);
    }

  /* read in reply */

  reply = PBSD_rdrpy(c);

  if (rc == PBSE_NONE)
    {
    /* copy in available and allocated numbers */

    for (i = 0;i < num_resc;++i)
      {
      *(available + i) = *(reply->brp_un.brp_rescq.brq_avail + i);
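
As a rough sketch of the kind of guard suggested above (not a patch
against the real tree): the struct below is a simplified, hypothetical
stand-in for the reply, only brq_avail is taken from the excerpt and the
other field names are assumed, and copy_rescq_counts() and
SKETCH_NO_REPLY are made-up names for illustration.  The point is only
that the copy loop runs when a reply exists and no error was recorded.

```c
#include <stddef.h>

#define PBSE_NONE        0     /* success code, as used in the excerpt */
#define SKETCH_NO_REPLY (-1)   /* hypothetical: no reply and no error code */

/* Simplified stand-in for the resource-query part of the reply.
 * Only brq_avail appears in the excerpt; the other names are assumed. */
struct rescq_reply {
    int *brq_avail;
    int *brq_alloc;
    int *brq_resvd;
    int *brq_down;
};

/* Copy the per-resource counts out of the reply.
 * 'err' plays the role of pbs_errno after the reply has been read.
 * Returns PBSE_NONE on success, otherwise an error code, and never
 * dereferences a missing reply. */
int copy_rescq_counts(const struct rescq_reply *reply, int err, int num_resc,
                      int *available, int *allocated,
                      int *reserved, int *down)
{
    int i;

    if (reply == NULL)
        return (err != PBSE_NONE) ? err : SKETCH_NO_REPLY;

    if (err != PBSE_NONE)
        return err;

    for (i = 0; i < num_resc; ++i) {
        available[i] = reply->brq_avail[i];
        allocated[i] = reply->brq_alloc[i];
        reserved[i]  = reply->brq_resvd[i];
        down[i]      = reply->brq_down[i];
    }

    return PBSE_NONE;
}
```

With a guard like this, a read that fails or times out comes back to
the scheduler as an error return instead of a crash at the copy loop.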

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California