[torqueusers] Large cluster considerations
Garrick Staples
garrick at usc.edu
Wed Feb 20 14:44:24 MST 2008
On Wed, Feb 20, 2008 at 01:42:50PM -0700, Jerry Smith alleged:
> set server scheduler_iteration = 90
> set server node_ping_rate = 180
> set server node_check_rate = 180
> set server tcp_timeout = 240
> set server job_stat_rate = 120
> set server poll_jobs = True
> set server log_level = 1
(Note: node_ping_rate is no longer used.)
Here's mine, from a cluster with ~2000 nodes:
set server scheduler_iteration = 60
set server node_check_rate = 200
set server tcp_timeout = 6
set server job_stat_rate = 120
set server poll_jobs = True
Personally, I think increasing tcp_timeout is overrated. If the system is
blocking for more than a few seconds, you have bigger problems. Let
connections fail and let the system recover if it can.
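
For anyone who wants to try the ~2000-node values listed above, here is a
minimal sketch of applying them with "qmgr -c". It assumes qmgr is on the
PATH and that you have manager privileges on the pbs_server host. The
function name apply_settings and the DRY_RUN variable are made up for this
illustration and are not part of TORQUE.

```shell
#!/bin/sh
# Sketch only: feed each directive to qmgr one at a time.
# apply_settings and DRY_RUN are hypothetical names for this example.
# With DRY_RUN=1 the commands are printed instead of executed.
apply_settings() {
    while IFS= read -r directive; do
        # Skip blank lines in the directive list.
        [ -z "$directive" ] && continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'qmgr -c "%s"\n' "$directive"
        else
            # Keep qmgr away from the loop's stdin; stop on first failure.
            qmgr -c "$directive" </dev/null || return 1
        fi
    done <<'EOF'
set server scheduler_iteration = 60
set server node_check_rate = 200
set server tcp_timeout = 6
set server job_stat_rate = 120
set server poll_jobs = True
EOF
}
```

Running "DRY_RUN=1 apply_settings" first shows what would be sent. After a
real run, "qmgr -c 'print server'" should list the new values. Treat the
numbers as a starting point for a cluster of similar size, not as
recommendations for every site.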