[torqueusers] torque not scaling well
Garrick Staples
garrick at usc.edu
Wed Aug 1 21:55:09 MDT 2007
On Tue, Jul 31, 2007 at 06:44:12PM -0500, Miles O'Neal alleged:
> log_level = 9
I'd start with turning down the log_level. Logging can really slow things down.
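As a rough sketch, the level can be lowered through qmgr. The value 1 below is
only an illustrative assumption, not a recommendation from this thread; pick
whatever verbosity you still need for debugging.

```
# Feed to qmgr as a manager, e.g.: qmgr < this_file
# The value 1 is an assumed example; lower means less logging.
set server log_level = 1
```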
Overall, you are correct that having tons and tons of client connections (qsub)
doesn't scale well. The authentication process with pbs_iff can be a
bottleneck.
The real killer is the amount of data that has to be sent from pbs_server to
maui. Each job is a significant chunk of data (especially when users use
'qsub -V', which attaches the whole submission environment to the job).
Do things like limiting the number of jobs that can actually enter execution
queues. For example, have a routing queue that initially gets the jobs and
routes them to the execution queue, with max_user_queuable on the execution
queue set to something a little more than the total number of nodes. This will
limit the number of jobs that maui can "see", which means less data is sent
across the wire.
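A minimal sketch of that setup in qmgr terms follows. The queue names "route"
and "batch", the assumption that "batch" already exists as an execution queue,
the figure of 256 nodes, and the limit of 260 are all hypothetical; adjust them
for your site. Note that TORQUE spells the queue attribute max_user_queuable.

```
# Feed to qmgr as a manager, e.g.: qmgr < this_file
# Assumes an existing execution queue named "batch" and about 256 nodes.

# Routing queue that receives all submissions first.
create queue route
set queue route queue_type = Route
set queue route route_destinations = batch
set queue route enabled = True
set queue route started = True

# Cap how many jobs each user may have in the execution queue:
# a little more than the node count (assumed value).
set queue batch max_user_queuable = 260

# Send jobs submitted without a queue name to the routing queue.
set server default_queue = route
```

Jobs over the limit should then wait in the routing queue rather than in the
execution queue, so the scheduler has fewer jobs to pull from pbs_server.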