[torqueusers] pbs_server not keeping up

Tony Schreiner anthony.schreiner at bc.edu
Wed Aug 29 10:13:32 MDT 2012


On Aug 29, 2012, at 12:07 PM, Gus Correa wrote:

> On 08/29/2012 11:35 AM, Tony Schreiner wrote:
>> On Aug 29, 2012, at 10:38 AM, Tony Schreiner wrote:
>> 
>>> On my smallish cluster with torque 2.5.7.
>>> 
>>> A user submitted about 8000 jobs to a routing queue, which feeds to an execution queue with 200 runnable slots.
>>> 
>>> At the moment, pbs_server is unable to handle it: pbsnodes returns no nodes found, and qstat -q takes a long time and shows nothing.
>>> This is the tail of the latest server_logs file:
>>> 
>>> 

…..

>>> 
>>> Is there anything I can change to help move things along?
>>> Thanks
>>> 
>>> Tony Schreiner
>> Addendum: the problem seems to have more to do with the number of entries in the server_priv/jobs directory. There were about 50,000 in there. When I deleted the older ones (about half), operation returned to normal. I'm going to reduce keep_completed, at least temporarily.
>> 
>> Tony
>> 
>> 
> Hi Tony
> 
> Have you tried setting the max_queuable or max_user_queuable attribute 
> of your execution queue?
> http://www.adaptivecomputing.com/resources/docs/torque/2-5-9/4.1queueconfig.php#attributes
> 
> I guess this will throttle the routing-to-execution queue job flux, and 
> reduce the clutter.
> 
> Gus Correa

I misspoke earlier. I had set max_user_queuable = 200 on the execution queue, not 200 runnable slots.

Tony Schreiner
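
For reference, a rough sketch of the two changes discussed in this thread: checking how many files have piled up in server_priv/jobs, then lowering keep_completed and capping per-user queued jobs. Several things here are assumptions, not taken from the thread: the /var/spool/torque default for TORQUE_HOME, the queue name "batch", and the value 60 (keep_completed is in seconds). Adjust all of them for your site.

```shell
#!/bin/sh
# Assumption: TORQUE_HOME defaults to /var/spool/torque; adjust for your install.
JOBS_DIR="${TORQUE_HOME:-/var/spool/torque}/server_priv/jobs"

# Print the number of entries directly inside a directory (0 if it is missing).
count_entries() {
    if [ -d "$1" ]; then
        find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
    else
        echo 0
    fi
}

echo "job files: $(count_entries "$JOBS_DIR")"

# Only touch the server configuration when qmgr is actually available.
if command -v qmgr >/dev/null 2>&1; then
    # Illustrative value: keep completed jobs for 60 seconds.
    qmgr -c "set server keep_completed = 60"
    # "batch" is a hypothetical execution queue name.
    qmgr -c "set queue batch max_user_queuable = 200"
fi
```

Running the count before and after the change should show whether the jobs directory is actually shrinking.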



