[torquedev] Jobs remain in queue after process completion in Torque 2.2
Dave Jackson
jacksond at clusterresources.com
Wed Nov 7 15:25:42 MST 2007
Garrick,
The earliest check-in referenced, r1514, was written to provide a
workaround for this problem. The failure was introduced some time before
that. My initial review pointed to protocol changes made to improve obit
handling, possibly associated with the 'qrerun -f' feature.
Dave
On Wed, 2007-11-07 at 14:03 -0800, Garrick Staples wrote:
> On Wed, Nov 07, 2007 at 10:57:46AM -0700, Douglas Wightman alleged:
> > For my part, I fixed the problem I was seeing in my local testing by
> > changing scan_for_exiting() in src/resmom/catch_child.c.
> >
> > I changed ObitsAllowed to 1 and all the problems went away. It seems
> > that any obit after the first is ignored by pbs_server, so the job
> > never goes away without a "qdel -p".
>
> I think you guys will need to grab Dave for this.
>
>
> ------------------------------------------------------------------------
> r1539 | jacksond | 2007-08-31 13:14:38 -0700 (Fri, 31 Aug 2007) | 4 lines
>
> INCR: clean up qsub temp file clean-up
>
>
>
> ------------------------------------------------------------------------
> r1527 | jacksond | 2007-08-17 14:53:27 -0700 (Fri, 17 Aug 2007) | 4 lines
>
> INCR: disable tightly-integrated CPUSet code
>
>
>
> ------------------------------------------------------------------------
> r1515 | jacksond | 2007-08-10 10:55:50 -0700 (Fri, 10 Aug 2007) | 4 lines
>
> INCR: add logging for client timeout
>
>
>
> ------------------------------------------------------------------------
> r1514 | jacksond | 2007-08-10 10:18:03 -0700 (Fri, 10 Aug 2007) | 4 lines
>
> INCR: allow FORCEOBIT env
>
>
> _______________________________________________
> torquedev mailing list
> torquedev at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torquedev