TCP uses the
Sliding Window algorithm to build a reliable connection using the unreliable
IP. One feature of the sliding window algorithm is the ability to control the amount of bandwidth used: you do this by varying the window size. A bigger window size allows more data to be transmitted per second, while a smaller window size will reduce the amount.
Choosing a sensible window size is important. If your window size is too small, you don't use the full amount of bandwidth available to you - some of it is wasted. However, having your window size too large is equally bad. If you do this, you can exceed the available bandwidth and end up wasting bandwidth retransmitting dropped packets.
How do you choose the correct window size? This is a harder question than it sounds. The window size should be set to match the amount of available bandwidth, but on a large network like the Internet, this depends on a number of factors:
- The bandwidth of your connection.
- The bandwidth of all other connections between you and your destination.
- How many other people are using your connection or any other connection between you and your destination.
- How many other connections that you, yourself are running.
Clearly this is difficult to estimate, and is made more difficult by the fact that network conditions can be constantly changing.
To deal with the window size problem, TCP stacks include
congestion control algorithms which attempt to determine an appropriate window size for connections.
On the Internet, the bandwidth to different machines can vary drastically (think of talking to someone on a 10Gbps connection and someone else on a
dialup), so these must be calculated separately for different TCP connections.
TCP Reno
The majority of TCP stacks work like this:
- Start with a very small window size
- Increase the window size until packets start being lost
- Drastically reduce the window size and go back to (2).
This is called
TCP Reno and was the first congestion control system to be developed back in the early 90s. The majority of
Operating Systems still use this or some variant. It works fairly well and we haven't had much reason to change it yet (only now, as connections are becoming really fast, are we having problems scaling it up).
Reno is actually two algorithms. The first is the "slow start" algorithm. What this does is to add one packet length to the window size for every packet acknowledged correctly. The effect of this is to double the window size every round trip: as a result, the connection reaches the limit quickly.
The second stage is the "congestion avoidance" algorithm. Once the connection starts dropping packets, this kicks in. The window size at the time is divided in half and saved to a variable called ssthresh. Then slow start is restarted. This time, once the window size reaches ssthresh, it slows down: now the window only increases by one packet per round trip time. The idea here is that the window size approaches the limit slowly.
TCP Vegas
The "TCP
vegas" algorithm is a bit cleverer. It basically models the behaviour of
routers in an attempt to detect congestion better.
Most routers work like this: the router receives some packets. It then puts them on a queue for retransmission. Packets are read off the queue one by one and retransmitted. If the queue becomes full, packets get dropped. Therefore, Reno only detects network congestion when usage has gone well beyond the bandwidth limit.
TCP vegas samples the round trip time for every packet to be acknowledged. If the round trip time is higher than average, it can be assumed that this is probably due to queuing (and hence congestion) at a router along the route to the destination. It throttles the window size back in response to this. Vegas can be seen as a less "drastic" algorithm.
TCP vegas is included in the 2.6 Linux kernel. You may be able to enable it on your system with:
echo 1 > /proc/sys/net/ipv4/tcp_vegas_cong_avoid