[torqueusers] torque not scaling well
meo at intrinsity.com
Wed Aug 1 22:04:02 MDT 2007
Garrick Staples said...
|On Tue, Jul 31, 2007 at 06:44:12PM -0500, Miles O'Neal alleged:
|> log_level = 9
|I'd start with turning down the log_level. Logging can really slow things down.
We only turn it up for a while (OK,
several times) to capture debug data
when the problem occurs (or when we
force it for testing). Then we turn
it back down.
|Overall, you are correct that having tons and tons of client connections (qsub)
|doesn't scale well. The authentication process with pbs_iff can be
|The real killer is the amount of data that has to be sent from pbs_server to
|maui. Each job is a significant chunk of data (especially when users use 'qsub
|Do things like limiting the number of jobs that can actually enter execution
|queues. For example, have a routing queue that initially gets the jobs and
|routes them to the execution queue, with max_user_queueable set to something
|a little more than the total number of nodes. This limits the number of jobs
|that maui can "see", which means less data is sent across the wire.
We'll look into this. Thanks.
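For the archives, a rough sketch of what the routing-queue setup described above might look like as qmgr input. The queue names (route, batch) and the limit of 70 are made up for illustration; pick a limit a little above your own node count. Note that the TORQUE documentation spells the queue attribute max_user_queuable, so check the spelling against your version before applying anything.

```
# Execution queue: cap how many jobs each user may have queued here.
# "batch" and 70 are placeholder values, not taken from this thread.
create queue batch queue_type=execution
set queue batch max_user_queuable = 70
set queue batch enabled = true
set queue batch started = true

# Routing queue: receives every qsub and feeds the execution queue.
create queue route queue_type=route
set queue route route_destinations = batch
set queue route enabled = true
set queue route started = true

# Send jobs submitted without -q to the routing queue.
set server default_queue = route
```

Jobs over the limit should wait in the routing queue rather than the execution queue, so the scheduler only has to be told about the ones that have been routed.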