[torqueusers] torque not scaling well

Miles O'Neal meo at intrinsity.com
Wed Aug 1 22:04:02 MDT 2007


Garrick Staples said...

|On Tue, Jul 31, 2007 at 06:44:12PM -0500, Miles O'Neal alleged:
|>         log_level = 9
|
|I'd start with turning down the log_level.  Logging can really slow things down.

We only turned it up for a while (OK,
several times) to capture debug data
when the problem occurred (or when we
forced it for testing).  Then we turned
it back down.

|Overall, you are correct that having tons and tons of client connections (qsub)
|doesn't scale well.  The authentication process with pbs_iff can be
|bad.
|
|The real killer is the amount of data that has to be sent from pbs_server to
|maui.  Each job is a significant chunk of data (especially when users use 'qsub
|-V').
|
|Do things like limiting the number of jobs that can actually enter execution
|queues.  Have a routing queue that initially gets the jobs and routes them to
|the execution queue, with max_user_queueable set to something that is a little
|more than the total number of nodes.  This will limit the number of jobs that
|maui can "see", which means less data is sent across the wire.

We'll look into this.  Thanks.
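
For reference, here is a rough sketch of what the routing-queue setup
described above might look like as qmgr directives.  The queue names
("feed" and "batch"), the node count, and the headroom of 20 are all
made-up placeholders, not values from our site.  As far as I know the
attribute is spelled max_user_queuable in TORQUE, and it is a per-user
limit, so check both against your version's docs before using this.
The script only prints the directives so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints qmgr directives for a routing queue that feeds
# a capped execution queue.  All names and numbers are hypothetical.

NODES=500                  # assumed total node count (placeholder)
LIMIT=$((NODES + 20))      # "a little more than the number of nodes"

routing_setup() {
    cat <<EOF
create queue batch
set queue batch queue_type = Execution
set queue batch max_user_queuable = $LIMIT
set queue batch enabled = True
set queue batch started = True
create queue feed
set queue feed queue_type = Route
set queue feed route_destinations = batch
set queue feed enabled = True
set queue feed started = True
set server default_queue = feed
EOF
}

# Print the directives for review.
routing_setup
```

Once the output looks right, it can be piped into qmgr on the server
host (routing_setup | qmgr).  Jobs then land in "feed" and only move
to "batch" while the user is under the limit, so the scheduler should
see just the jobs sitting in the execution queue.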

