The exponential increase in internet use over the last decade has meant that ever-increasing amounts of traffic are generated for busy websites. Deciding how much capital to invest in website infrastructure is a complex problem, as the network capacity must be able to cope with peaks that may occasionally be far in excess of those experienced on a daily basis. An example of this is the surge in traffic to Microsoft when the Blaster worm started spreading and every Windows user wanted the security patch. Adding to this uncertainty over the amount of required infrastructure is a ‘second-wave’ of internet growth caused by broadband technologies becoming widespread in Western nations. These technologies will eliminate the so-called ‘last mile’ bottlenecks on the internet, which are caused by the user’s connection to the ISP being much slower than the rest of the internet’s infrastructure.

All of this means that bottlenecks will be more likely to exist along internet backbones and between the various networks that comprise the internet. One solution is simply to invest in more hardware, but this is problematic because the rate at which network hardware can handle data is growing much more slowly than users’ requirements. Some sources estimate that 90% of web traffic is generated by rich media content such as images, video and audio. An innovative solution to reducing bandwidth requirements is presented by Content Distribution Network (CDN, also called Content Delivery Network) technologies.

These technologies rely on a widely distributed network of content servers to which a company can redirect traffic for some or all of the content of its website. These content servers are not owned by the companies for which they serve content: one company pays another for their use. The focus of a CDN is to optimise the process of directing traffic to an appropriate server. The market-leading CDN is provided by Akamai. Their technology works by having a script on the company’s website rewrite URLs in a special form so that the DNS lookup directs the user’s browser to an optimal server. This server is selected by a combination of network and geographical proximity to the user and measurements of the server’s load and latency.
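
As a rough sketch of this URL-rewriting step, the following Python fragment rewrites the static-asset references in a page so that their DNS lookups resolve through a hypothetical CDN hostname. The hostname and the rewriting scheme are invented for illustration and are not Akamai’s actual format.

```python
# Minimal sketch of CDN-style URL rewriting (illustrative only; the
# hostname format below is hypothetical, not Akamai's actual scheme).
import re

CDN_HOST = "static.example-cdn.net"   # hypothetical CDN domain


def rewrite_url(url: str) -> str:
    """Rewrite an origin URL so its DNS lookup resolves to the CDN."""
    # Strip the scheme and prepend the CDN hostname, keeping the original
    # host and path so the content server knows what to fetch.
    bare = re.sub(r"^https?://", "", url)
    return f"http://{CDN_HOST}/{bare}"


def rewrite_page(html: str) -> str:
    """Rewrite every absolute src="..." attribute in a page."""
    return re.sub(
        r'src="(https?://[^"]+)"',
        lambda m: f'src="{rewrite_url(m.group(1))}"',
        html,
    )


if __name__ == "__main__":
    page = '<img src="http://www.example.com/images/logo.gif">'
    print(rewrite_page(page))
    # <img src="http://static.example-cdn.net/www.example.com/images/logo.gif">
```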

This DNS-based redirection is achieved by a hierarchy of DNS servers. For this write-up, we will use the Akamai approach as an example.

When the client’s local DNS does not have a cached address for an Akamai URL, it will ask a top-level .net DNS for an address. This will direct the lookup to a high-level Akamai DNS, which will in turn redirect the user to a lower-level Akamai DNS. The choice of which lower-level DNS to redirect to is based on a metric recalculated every 20 minutes. The low-level DNS will redirect to a content server close to the user’s ISP, from which the actual content will be served. The low-level DNS recalculates its mapping every 20 seconds to ensure that individual content servers are available and not overloaded.
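
The Python sketch below illustrates this two-level selection in the abstract: a high-level mapping from client regions to nearby server clusters, refreshed infrequently, and a low-level choice of the least-loaded server in the cluster, refreshed often. The region map, load figures and function names are invented for illustration and are not Akamai’s real metrics.

```python
# Conceptual sketch of two-level CDN server selection (all data invented).
import random

# High-level map: which cluster of content servers serves which region.
# In a real CDN a map like this is rebuilt periodically (~20 minutes here).
REGION_MAP = {
    "eu": ["eu-server-1", "eu-server-2"],
    "us": ["us-server-1", "us-server-2", "us-server-3"],
}

# Low-level state: current load on each content server, refreshed much more
# often (~20 seconds here) so overloaded or dead servers can be avoided.
SERVER_LOAD = {name: random.random()
               for servers in REGION_MAP.values() for name in servers}


def high_level_lookup(client_region):
    """High-level DNS: map the client's region to a nearby server cluster."""
    return REGION_MAP.get(client_region, REGION_MAP["us"])


def low_level_lookup(cluster):
    """Low-level DNS: pick the least-loaded server in the chosen cluster."""
    return min(cluster, key=lambda server: SERVER_LOAD[server])


if __name__ == "__main__":
    cluster = high_level_lookup("eu")
    print("Resolved content server:", low_level_lookup(cluster))
```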

The user’s ISP will have a DNS server that will try to cache this IP address for the URL, but Akamai gives the addresses very short Time To Live (TTL) values of about 20 seconds. This prevents one ISP from continuously passing all of its users to a single content server.
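
One way to observe these short TTLs is to query a CDN-served hostname and inspect the TTL on the returned records. The sketch below uses the third-party dnspython package; the hostname is only an example and the exact TTL seen will vary.

```python
# Observe the TTL on a CDN-served hostname.
# Requires the dnspython package: pip install dnspython
import dns.resolver


def lookup_ttl(hostname):
    answer = dns.resolver.resolve(hostname, "A")
    for record in answer:
        print(f"{hostname} -> {record.address}")
    # TTL of the answer rrset; CDN-served names typically carry values of
    # tens of seconds rather than hours.
    print("TTL:", answer.rrset.ttl, "seconds")


if __name__ == "__main__":
    lookup_ttl("www.akamai.com")
```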

The approach used by Akamai is very similar to that used by other CDNs such as Speedera and Adero. However, research carried out by AT&T Labs in collaboration with Worcester Polytechnic Institute in Massachusetts suggests that this technique of DNS redirection is just as likely to reduce download speed as to increase it, and that the overhead of the extra DNS lookups means a low IP cache time is actually counterproductive. In light of this research, Akamai may have altered the configuration of their networks. However, even with these findings, using a CDN does increase the apparent speed of the website at the user end, and the pooling of hardware resources engendered by this approach means reduced costs for websites.

This research also supports the assumption that a larger number of servers distributed over a wide geographical area is beneficial to the performance of the CDN as a whole. Content is sent to each content server only once and then has a shorter distance to travel from there to the many users who access that content server. Many companies refer to this process as moving content to the ‘edge’ of the internet and thus conserving bandwidth at the centre, on the backbones.

The structure of the Akamai CDN is such that only those files referenced by ‘src’ attributes within HTML tags will be cached by the content servers. The thinking behind this is that it allows websites to generate the text of a page dynamically while having the static content, such as images, cached close to the user.
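
As a small illustration of which files fall into this cacheable category, the sketch below uses Python’s standard html.parser to list the ‘src’-referenced resources of a page; these are the files a content server would cache, while the surrounding HTML would still be generated by the origin site. The example page is invented.

```python
# List the src-referenced resources of a page, i.e. the files a CDN
# content server would cache while the HTML itself stays dynamic.
from html.parser import HTMLParser


class SrcCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value:
                self.resources.append((tag, value))


if __name__ == "__main__":
    page = """
    <html><body>
      <img src="/images/logo.gif">
      <script src="/js/menu.js"></script>
      <p>Dynamically generated text stays on the origin server.</p>
    </body></html>
    """
    collector = SrcCollector()
    collector.feed(page)
    for tag, url in collector.resources:
        print(f"cacheable via CDN: <{tag}> {url}")
```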

However, the increasing use of dynamic page generation technologies such as Java Server Pages (JSP) and Active Server Pages (ASP) means that companies must invest in faster servers to generate large numbers of these pages. A CDN-based solution to this problem is the technology of Edge Side Includes (ESI).

ESI allows a web page to be defined in terms of ‘fragments’ of code that can be tagged with individual cache times. These fragments can then be assembled by content servers into coherent pages. This approach means that static content, such as logos and menus, can be cached for a long time, while dynamically generated content, such as stock figures or inventory information, is kept for a short (possibly zero) time in the content server’s cache. Mechanisms also exist for the website’s central server to send invalidation messages to content servers, causing them to re-fetch certain fragments.
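
The sketch below imitates this fragment model in Python: each fragment carries its own cache time, expired or invalidated fragments are re-fetched from the origin, and the page is assembled from whatever is currently cached. The fragment names, cache times and helper functions are invented for illustration and do not follow the ESI markup itself.

```python
# Conceptual sketch of ESI-style fragment caching (names and TTLs invented).
import time

# Per-fragment cache times: static fragments live long, dynamic ones barely at all.
FRAGMENT_TTL = {"header": 3600, "menu": 3600, "stock_ticker": 0}

cache = {}  # fragment name -> (content, time fetched)


def fetch_from_origin(name):
    """Stand-in for a request back to the site's central servers."""
    return f"<!-- fresh copy of {name} generated at {time.time():.0f} -->"


def get_fragment(name):
    """Return a cached fragment, re-fetching it if its cache time has expired."""
    content, fetched_at = cache.get(name, (None, 0.0))
    if content is None or time.time() - fetched_at > FRAGMENT_TTL[name]:
        content = fetch_from_origin(name)
        cache[name] = (content, time.time())
    return content


def invalidate(name):
    """Handle an invalidation message from the origin: drop the fragment."""
    cache.pop(name, None)


def assemble_page():
    """Stitch the fragments into a full page, as a content server would."""
    return "\n".join(get_fragment(n) for n in ("header", "menu", "stock_ticker"))


if __name__ == "__main__":
    print(assemble_page())
    invalidate("menu")        # origin pushed out new navigation
    print(assemble_page())    # menu is re-fetched, header is still cached
```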

This technology also introduces improved fault tolerance into the network, as pages can be defined with a default fragment to be included should the central page generation facility be unavailable. ESI is being developed as an open standard by many technology companies and is currently available in software from Akamai and Oracle.
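
A minimal sketch of that fallback behaviour, again with invented names: if the origin cannot be reached, a pre-stored default fragment is substituted so the page can still be assembled.

```python
# Conceptual sketch of serving a default fragment when the origin is down
# (fragment names and messages are invented for illustration).
DEFAULT_FRAGMENTS = {"stock_ticker": "<p>Live prices temporarily unavailable.</p>"}


def fetch_from_origin(name):
    """Stand-in for a request to the central page-generation servers."""
    raise ConnectionError("origin unreachable")  # simulate an outage


def get_fragment(name):
    try:
        return fetch_from_origin(name)
    except ConnectionError:
        # Origin down: fall back to the pre-stored default fragment so the
        # page can still be assembled and served to the user.
        return DEFAULT_FRAGMENTS.get(name, "")


if __name__ == "__main__":
    print(get_fragment("stock_ticker"))
```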

The ideas behind CDNs reflect a growing trend in internet technologies towards a more distributed paradigm, which can also be seen in peer-to-peer file sharing and grid computing. Overall, the Content Distribution Network approach has improved website speed and reliability for many companies, and the outsourcing of bandwidth and hardware requirements it allows means reduced hardware costs for customer companies.

References
  • Fast Internet Content Delivery with Freeflow, http://www.cs.washington.edu/homes/ratul/akamai/freeflow.pdf
  • Turning Web Data Into Intelligence – What do you know about your website visitors?, Top 10 Critical Web Site Analysis Reports, http://www.akamai.com/en/resources/pdf/10_reports.pdf
  • Internet Bottlenecks, The Case for Edge Delivery Services, http://www.akamai.com/en/resources/pdf/BottlenecksWhitepaper1.pdf
  • Why Performance Matters, http://www.akamai.com/en/resources/pdf/performance_WP_mar2002.pdf
  • Akamai Streaming – When Performance Matters, http://www.akamai.com/en/resources/pdf/Streaming_Akamai.pdf
  • Sheikh, A., Content Distribution Networks, http://www.balmelli.net/seminars/seminar2.html
  • Krishnamurthy B., Wills C. & Zhang Y., On the Use and Performance of Content Distribution Networks, http://www.icir.org/vern/imw-2001/imw2001-papers/10.pdf
  • ESI – Accelerating E-business Applications, http://www.esi.org
  • Kontiki, http://www.kontiki.com
  • Akamai, http://www.akamai.com