[torqueusers] Large cluster considerations

Garrick Staples garrick at usc.edu
Wed Feb 20 14:44:24 MST 2008


On Wed, Feb 20, 2008 at 01:42:50PM -0700, Jerry Smith alleged:
> set server scheduler_iteration = 90
> set server node_ping_rate = 180
> set server node_check_rate = 180
> set server tcp_timeout = 240
> set server job_stat_rate = 120
> set server poll_jobs = True
> set server log_level = 1

(note: node_ping_rate is no longer used)

Here's mine with ~2000 nodes.

set server scheduler_iteration = 60
set server node_check_rate = 200
set server tcp_timeout = 6
set server job_stat_rate = 120
set server poll_jobs = True


Personally, I think increasing tcp_timeout is overrated.  If the system is
blocking for more than a few seconds, you have bigger problems.  Let
connections fail and let the system recover if it can.
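
For anyone wanting to try the settings listed above, here is a minimal sketch of feeding them to qmgr one at a time with `qmgr -c`. The `apply` helper and the `APPLY` variable are hypothetical names introduced only for this example. By default the script just prints the commands it would run, so nothing changes until you opt in. The values are the ones from the ~2000 node list above and are a starting point, not a recommendation for every site.

```shell
#!/bin/sh
# Dry run by default: print each qmgr command instead of running it.
# Set APPLY=1 to actually invoke qmgr (assumes you are a TORQUE manager
# on the pbs_server host). "apply" and "APPLY" are illustrative names.
apply() {
    if [ "${APPLY:-0}" = "1" ]; then
        qmgr -c "$1"
    else
        printf 'qmgr -c "%s"\n' "$1"
    fi
}

# Server settings quoted from the ~2000 node configuration above.
apply "set server scheduler_iteration = 60"
apply "set server node_check_rate = 200"
apply "set server tcp_timeout = 6"
apply "set server job_stat_rate = 120"
apply "set server poll_jobs = True"
```

After applying, `qmgr -c "print server"` should show the resulting server attributes so you can confirm what actually took effect.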


