Overview

Onion routing is the most fascinating internet innovation I have encountered in months. It is essentially a protocol for anonymizing TCP connections, run over a distributed network of computers (called onion routers, or ORs) running a special piece of open source software called Tor. Tor implements the second generation of the onion routing scheme: a circuit-based, low-latency anonymous communication service that gives the user anonymity while accessing internet services. Let's see what it offers and how it works.


Onion Routing Goal

The main goal of the protocol is to make your use of internet services very hard to trace. I'll first give a vivid example before I proceed to the technical details. I open my favorite browser and point it to http://whatismyip.com, a site that does nothing but tell you your IP address. What I get is: Your IP is 130.37.193.63. I wait for 1 minute and press the page reload button. What I get is: Your IP is 65.24.109.190. Needless to say, neither of these IPs is mine.

Quite impressive? Imagine the implications of accessing web services using a different source IP for every request! It would take a great deal of forensic effort for the sites being accessed to link all the requests back to you, since every request seems to be (and actually is) coming from a different host on a completely different network. And due to the design of the anonymizing protocol (described below), it would also be extremely difficult even for your network administrator to find out which websites you are accessing. Finally, as will be shown below, the protocol is generic enough (it uses the SOCKS proxy interface) to conceal your identity while using the majority (possibly all) of the available internet services (IRC, ftp, https, ssh, etc).


How Onion Routing works

Let's take a simple case into consideration. I have a Linux box with a good proxy installed. I simply tell my browser to use localhost, at whatever port the proxy listens on, as the proxy for all protocols, including SOCKS 5. If I try to access a website, my browser sends the request to the listening proxy, and the proxy contacts the website and passes the results it receives back to my browser. So far, nothing new.

The application layer

Now comes the tricky part: we install Tor on our Linux box. Tor connects to two central directory servers and reports its IP and the port on which it accepts connections. The directory servers then send Tor the list of the other hosts running Tor (so that every host running Tor knows all the other hosts running Tor). Now our Linux box is part of the anonymizer network and can relay data to other Tor hosts. Next, we add a directive to our proxy server which makes it pass all web requests to Tor instead of connecting to the web sites itself. Using privoxy as the proxy, this directive would be something like this: "forward-socks4a / localhost:9050 ." In this case, 9050 is the port at which Tor is listening for connections.
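
To give an idea of what the proxy actually hands to Tor on port 9050, here is a minimal sketch of a SOCKS4a CONNECT request, built by hand in Python. The function name build_socks4a_connect is my own invention for illustration. The byte layout follows the SOCKS4a convention of sending a deliberately invalid IP (0.0.0.1) followed by the hostname, so that the name is resolved on the far side rather than by my own machine.

```python
import struct

def build_socks4a_connect(hostname: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4a CONNECT request that leaves DNS resolution to the proxy."""
    header = struct.pack(
        ">BBH4s",
        4,                     # SOCKS protocol version 4
        1,                     # command 1 = CONNECT
        port,                  # destination port, big-endian
        bytes([0, 0, 0, 1]),   # invalid IP 0.0.0.1 means "a hostname follows"
    )
    # The user id and the hostname are each terminated by a NUL byte.
    return header + user_id + b"\x00" + hostname.encode("ascii") + b"\x00"

# Example request for the site used later in this writeup.
request = build_socks4a_connect("www.bossdontlikethissite.com", 80)
print(request.hex())
```

The point of the "4a" variant is that the hostname itself travels to Tor, so not even the DNS lookup is made from my own box.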

The network layer

And now, to the inner workings of Tor. Suppose my Tor server has just received a request from me, sent by my browser via the proxy server: I want to visit http://www.bossdontlikethissite.com. We must note here that Tor, in order to reduce link establishment overhead, establishes a few TLS links with various other randomly chosen Tor nodes when it is first started.

So, my Tor server chooses one of its TLS-connected Tor servers, say node A, and they perform a Diffie-Hellman handshake, agreeing on a symmetric encryption key. Now there is a communication circuit between my Tor server and node A. The circuit is then extended: my Tor server sends node A a message (containing the first half of a new Diffie-Hellman handshake) asking it to extend the circuit to another Tor node, say node B, by specifying its address. Node A then TLS-connects to node B and passes the handshake half on to it. So now my Tor server can communicate with node B without the intermediate node A being able to read what they exchange and, most importantly, without node B knowing which node it is actually communicating with through node A.

Surely, a topology diagram will help clarify things a little:

__________   TLS    _________   TLS    __________
|        |<---IP--->|       |<---IP--->|        |
| My Tor |          | Node  |          |  Node  |
| Server | Symm. K1 |  A    |          |   B    |
|        |<--TOR--->|_______|          |        |
|        |                             |        |
|        |     Symmetric key K2        |        |
|________|<-----------TOR------------->|________|
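
For the curious, the two key agreements in the diagram (K1 with node A, K2 with node B) can be sketched with textbook Diffie-Hellman in a few lines of Python. This is only a toy under loud assumptions: the prime P and base G are picked by me for readability, the function names are made up, and the sketch leaves out everything Tor does to authenticate the nodes. It only shows why both ends arrive at the same key while a relay in the middle, who sees just the public halves, does not.

```python
import hashlib
import secrets

# Toy parameters chosen only for illustration (a Mersenne prime and a small base).
# Real deployments use much larger, carefully chosen groups.
P = 2**127 - 1
G = 3

def dh_keypair():
    """Return (private exponent, public half) for one side of the handshake."""
    private = secrets.randbelow(P - 3) + 2      # secret exponent in [2, P-2]
    public = pow(G, private, P)                 # this half may travel in the open
    return private, public

def dh_shared_key(my_private, their_public):
    """Combine my secret with the other side's public half into a symmetric key."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

# Handshake 1: my Tor server <-> node A, giving K1.
me_priv1, me_pub1 = dh_keypair()
a_priv, a_pub = dh_keypair()
k1_mine = dh_shared_key(me_priv1, a_pub)
k1_node_a = dh_shared_key(a_priv, me_pub1)

# Handshake 2: my Tor server <-> node B, giving K2. Node A only relays the
# public halves (me_pub2 and b_pub), which are not enough to compute K2.
me_priv2, me_pub2 = dh_keypair()
b_priv, b_pub = dh_keypair()
k2_mine = dh_shared_key(me_priv2, b_pub)
k2_node_b = dh_shared_key(b_priv, me_pub2)

print(k1_mine == k1_node_a, k2_mine == k2_node_b)
```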

My Tor server can then extend the circuit further in the same way, through nodes A and B, and so on. When my Tor server decides that a circuit of, say, 5 nodes is enough to preserve anonymity, it sends the actual web (or other internet service) request for http://www.bossdontlikethissite.com to the last node (the exit node, as it is called). The exit node then accesses the wanted website and sends the results back to my Tor server through all the nodes of the encrypted circuit. My Tor server passes them to the proxy server, and the proxy server presents them to my browser.
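
The "onion" in the name comes from the way the request is wrapped once per hop. Here is a toy sketch of that layering in Python, assuming a three-hop circuit and a home-made XOR stream cipher built from SHA-256. Both the cipher and the key labels are my own stand-ins for illustration and are not what Tor really uses. Each node strips exactly one layer with the key it shares with my Tor server, and only the last one ends up with the readable request.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Stretch a key into `length` pseudo-random bytes (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

def wrap(hop_keys, payload: bytes) -> bytes:
    """Client side: add one layer per hop, the exit node's layer innermost."""
    for key in reversed(hop_keys):
        payload = xor_layer(key, payload)
    return payload

# Hypothetical per-hop keys, standing in for K1, K2, ... from the handshakes.
hop_keys = [b"K1 shared with node A", b"K2 shared with node B", b"K3 shared with exit node"]
request = b"GET http://www.bossdontlikethissite.com/"

cell = wrap(hop_keys, request)
seen_by_hop = []
for key in hop_keys:                 # the cell travels A -> B -> exit
    cell = xor_layer(key, cell)      # each node strips only its own layer
    seen_by_hop.append(cell)

print(seen_by_hop[-1])               # only the last hop sees the plaintext
```

Nodes A and B each see only bytes that still carry at least one more layer, which is why an observer in the middle of the circuit learns nothing about the request itself.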

In order to reduce the overhead of constructing new circuits all the time, the circuits at each node are re-used to multiplex streams coming from different nodes and expire when they no longer have any open streams, while new ones are created in the background.


Why all this fuss?

Let's see. First of all, all your network administrator can see is encrypted connections from your PC to various IP addresses (the other Tor nodes) having nothing to do with the web sites (or other internet services) you are actually accessing.

The network administrators on the side of the sites you are accessing know nothing about you, since the actual connections are made by the last Tor servers of the circuits (the exit nodes; note that every Tor server can randomly become an exit node).

And the best part is that anybody sitting on the network of one of the circuit's nodes sees nothing but encrypted data being relayed from various seemingly random IPs to various other seemingly random IPs! In addition, supposing that one of the nodes of the circuit is operated by a malicious CIA agent, still nothing useful can be observed, because the circuit is built in such a way that every node only knows about the previous hop and the next hop. Even the first hop my Tor server talks to cannot tell whether my Tor server originated the request or was just relaying it from yet another node. And of course, although the actual request (an HTTP GET for http://www.bossdontlikethissite.com) passes through all of the nodes of the circuit, only the exit node can decrypt it, see it, and serve it.


Other neat features

First of all, I would like to make it clear that Tor can be used with every application which supports SOCKS proxying, not just web browsers. Even applications which do not support it can be made compatible with little effort: for example, I use Tor to hide my IP address while accessing remote public machines on the other side of the Atlantic over SSH. Full information can be found at the end of the writeup.

But what is really cool are the so-called hidden services. First an example: I visit this URL with my browser: http://ohdktyycimt33ntp.onion (a rather strange URL, if you consider that the .onion domain does not exist). However, a few seconds later, I am presented with a web page! Location-hidden services allow someone to offer TCP services (say, a web server) without revealing his IP address, and thus his identity. This works as follows (say Bob wants to publish the user manual of the Bomb):

  • First, Bob generates a long-term public key pair to identify his service.
  • Then, Bob selects a couple of Tor servers (called introduction points), and advertises them (not himself!) at the lookup service, signing the advertisement with the private half of his key pair so that anyone can verify it against his public key.
  • Third, Bob builds a circuit to each of his introduction points, and tells them to wait for requests.
  • When Alice learns somehow (by mail, by mouth, by graffiti, etc) about Bob's service (http://ohdktyycimt33ntp.onion), she retrieves Bob's service details from the lookup service and...
  • ...she chooses a random Tor server (to which she builds a circuit as described above) as the rendezvous point (RP) for her connection to Bob's service and gives it a randomly chosen "rendezvous cookie" by which to recognize Bob.
  • Then Alice builds a circuit to one of Bob's service introduction points and gives it a message (encrypted with Bob's public key) telling it about herself, her RP and rendezvous cookie, and the start of a DH handshake. The introduction point sends the message to Bob.
  • If Bob wants to talk to Alice, he builds a circuit to Alice's RP and sends the rendezvous cookie, the second half of the DH handshake, and a hash of the session key they now share.
  • The RP connects Alice's circuit to Bob's. Note that RP can't recognize Alice, Bob, or the data they transmit.
  • Now Alice can access Bob's hidden service without knowing whose it is or where it resides.
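
The rendezvous point's part in the steps above is small enough to sketch in Python. This is a simplified model of my own, not Tor's code: the class RendezvousPoint, its method names and the circuit labels are all hypothetical, and the encryption and the DH handshake are left out. It only shows the bookkeeping: the RP remembers Alice's cookie, and joins two circuits when someone shows up with the same cookie.

```python
import secrets

class RendezvousPoint:
    """Toy model of an RP: it matches cookies and knows nothing else."""

    def __init__(self):
        self._waiting = {}   # cookie -> label of the circuit Alice arrived on
        self.joined = []     # pairs of circuits that have been spliced together

    def establish(self, cookie: bytes, alice_circuit: str) -> None:
        """Alice registers her cookie and then waits."""
        self._waiting[cookie] = alice_circuit

    def rendezvous(self, cookie: bytes, bob_circuit: str) -> bool:
        """Bob presents a cookie; join the circuits only if it matches."""
        alice_circuit = self._waiting.pop(cookie, None)
        if alice_circuit is None:
            return False
        self.joined.append((alice_circuit, bob_circuit))
        return True

rp = RendezvousPoint()

# Alice picks a random cookie and leaves it at the RP.
cookie = secrets.token_bytes(20)
rp.establish(cookie, "circuit-from-alice")

# Bob learned the cookie through his introduction point and uses it once.
wrong_guess = rp.rendezvous(secrets.token_bytes(20), "circuit-from-stranger")
bob_joined = rp.rendezvous(cookie, "circuit-from-bob")
print(wrong_guess, bob_joined, rp.joined)
```

Notice that the RP only ever handles opaque circuit labels and a random cookie, which matches the remark above that it can recognize neither Alice nor Bob.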

It seems to me that the protocol is secure and anonymous to a paranoid level...

Downsides?

Well, yes, just the usual downsides of possible abuse, which every attempt at anonymity inherently carries... Namely, it could be used to send spam (although the default configuration does not let it connect to port 25), and generally, since it is an all-purpose anonymizing protocol, it could be used to hide one's identity for any kind of mischief.


General information to get an idea: http://tor.freehaven.net/documentation.html
The full design: http://tor.freehaven.net/cvs/tor/doc/design-paper/tor-design.html