[torqueusers] torque not scaling well

Garrick Staples garrick at usc.edu
Wed Aug 1 21:55:09 MDT 2007


On Tue, Jul 31, 2007 at 06:44:12PM -0500, Miles O'Neal alleged:
>         log_level = 9

I'd start with turning down the log_level.  Logging can really slow things down.
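
As a rough illustration, and assuming the quoted setting is the pbs_server
log_level attribute, it could be lowered with qmgr along these lines (the
value 0 is only an example, pick whatever verbosity you actually need):

```
# Run as a TORQUE manager on the server host; 0 is an example value.
qmgr -c "set server log_level = 0"
```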

Overall, you are correct that having tons and tons of client connections (qsub)
doesn't scale well.  The authentication step, where each client command runs
pbs_iff, can be slow.

The real killer is the amount of data that has to be sent from pbs_server to
maui.  Each job is a significant chunk of data (especially when users use 'qsub
-V', which attaches the submitter's entire environment to the job).

Try limiting the number of jobs that can actually enter the execution queues.
For example, have a routing queue initially receive the jobs and route them to
the execution queue, with max_user_queuable on the execution queue set to a
little more than the total number of nodes.  This limits the number of jobs
that maui can "see", which means less data is sent across the wire.
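
A minimal sketch of that layout as qmgr directives, under some assumptions:
the queue names "batch" and "route" are made up for illustration, and the
limit of 70 assumes a cluster of roughly 64 nodes.  Adjust both to your site.

```
# Execution queue: the only queue whose jobs maui should need to consider.
create queue batch queue_type=execution
# Hypothetical limit: a little more than an assumed 64 nodes.
set queue batch max_user_queuable = 70
set queue batch enabled = true
set queue batch started = true

# Routing queue: accepts all submissions and feeds batch as room opens up.
create queue route queue_type=route
set queue route route_destinations = batch
set queue route enabled = true
set queue route started = true

# Send plain 'qsub' submissions to the routing queue.
set server default_queue = route
```

Saved to a file, these can be fed to qmgr on standard input by a TORQUE
manager.  Jobs over the per-user limit should then wait in the routing queue
instead of landing in the execution queue.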