[torquedev] Jobs remain in queue after process completion in Torque 2.2

Steve Snelgrove ssnelgrove at clusterresources.com
Mon Nov 19 14:05:24 MST 2007


We are proposing to incorporate this suggested workaround
into the code base until a long-term solution can be found.
Comments about this would be welcome.
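
To make the proposal concrete, here is a rough toy model of the behaviour
described in the quoted messages below. It is not Torque source. It assumes,
as Douglas reports, that pbs_server acts only on the first obit in a packet,
and that the mom does not resend an obit it has already packed. Every name in
it (job_state, MAX_JOBS, server_accept_packet, count_stuck_jobs) is
hypothetical; only the idea of an ObitsAllowed-style limit comes from this
thread.

```c
#include <assert.h>

/* Toy model only: none of these names exist in Torque. */
#define MAX_JOBS 8

enum job_state { JOB_EXITING, JOB_OBIT_SENT, JOB_DONE };

/* Assumed server behaviour (taken from the report in this thread, not
 * verified against pbs_server): only the first obit in a packet is
 * processed.  Returns how many obits from the packet were honoured. */
static int server_accept_packet(int obits_in_packet)
{
    return obits_in_packet > 0 ? 1 : 0;
}

/* Simulate repeated scan passes over njobs exiting jobs, packing at most
 * obits_allowed obits into each packet.  Returns the number of jobs that
 * never reach JOB_DONE, i.e. jobs that would linger in the queue. */
int count_stuck_jobs(int njobs, int obits_allowed)
{
    enum job_state jobs[MAX_JOBS];
    int i, stuck = 0;

    assert(njobs >= 0 && njobs <= MAX_JOBS && obits_allowed >= 1);

    for (i = 0; i < njobs; i++)
        jobs[i] = JOB_EXITING;

    for (;;) {
        int packed[MAX_JOBS];
        int npacked = 0, accepted;

        /* Pack up to obits_allowed exiting jobs into one packet. */
        for (i = 0; i < njobs && npacked < obits_allowed; i++) {
            if (jobs[i] == JOB_EXITING) {
                packed[npacked++] = i;
                jobs[i] = JOB_OBIT_SENT;   /* assumed: never resent */
            }
        }
        if (npacked == 0)
            break;                         /* nothing left to report */

        /* Only the obits the server honoured complete their jobs. */
        accepted = server_accept_packet(npacked);
        for (i = 0; i < accepted; i++)
            jobs[packed[i]] = JOB_DONE;
    }

    for (i = 0; i < njobs; i++)
        if (jobs[i] != JOB_DONE)
            stuck++;
    return stuck;
}
```

Under these assumptions, count_stuck_jobs(5, 1) leaves no job behind, while
any larger limit strands every obit after the first in each packet. That
matches what we see when the limit is forced to one, but it says nothing
about why the server drops the later obits.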

--Steve

Douglas Wightman wrote:
> For my part, I fixed the problem I was seeing in my local testing by
> changing src/resmom/catch_child.c in "scan_for_exiting".  
>
> I changed ObitsAllowed to 1 and all the problems went away.  It seems
> that any obits besides the first are ignored by the pbs_server and so
> the job will never go away without a "qdel -p".
>
> - Douglas
>
> On Wed, 2007-11-07 at 09:42 -0800, Garrick Staples wrote:
>   
>> On Wed, Nov 07, 2007 at 09:34:01AM -0700, Steve Snelgrove alleged:
>>     
>>> It has been noticed that jobs remain in the queue after the process has 
>>> finished.  This seems to be associated with recently checked-in code 
>>> having to do with the packing of multiple obits going to pbs_server.
>>>
>>> If we force the number of obits in a packet to one, everything seems to be 
>>> fine.
>>>
>>> Any suggestions about this would be appreciated.
>>>       
>> I'm not recalling anything that matches this description.  What exactly did you
>> change to make it work?
>>
>> _______________________________________________
>> torquedev mailing list
>> torquedev at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torquedev
>>     
>