The problem

Given a set of cities, and a cost to travel between each pair of cities, determine whether there is a path that visits every city once and returns to the first city, such that the cost travelled is less than some number C.

This problem is often stated such that we need to find the minimum such C possible, but many common formulations of NP-completeness require the problems to be decision problems. In general, if an efficient solution exists for a decision problem, an efficient solution also exists for the corresponding optimization problem, and decision problems are a little bit easier to analyze.

The problem is also often stated such that we need to find the path rather than find the existence of the path. In practice, it's likely that an algorithm that solved this problem would answer yes by actually finding a path, but it's not strictly necessary for the the problem to be considered NP-complete. I'd like to thank jam-master jim for pointing out to me that I made this mistake. Interestingly, this doesn't really change the problem into an optimization problem, since the optimization version of the problem above is to find the minimum C that works. Still, if I wanted to actually use that version of the problem to talk about NP-completeness, I'd have to use a more complicated (but equivalent) definition of NP-complete than is generally used, and it's better to just use the same definition everyone else in the planet agrees on.

NP-Completeness

This problem turns out to be NP-complete, by a polynomial time transformation from the Hamiltonian Circuit (HC) problem. It's a very interesting result, because it's actually slightly stronger than just a result about the NP-completeness of TSP, it also suggests that finding a general polynomial time approximation algorithm for TSP implies that P=NP.

The transformation is pretty simple. In HC, we're given a graph. To transform it into an instance of TSP, let each vertex in the graph be a city. The cost of traveling between two cities will be 1 if there's an edge between those cities in the graph. If no such edge exists, the cost to travel will be m=2n (where n is the number of cities). The question: does there exist a traveling salesman tour with cost n?

If there's a hamiltonian circuit in the original graph, there will be at least one traveling salesman path with cost n (the hamiltonian circuit is one such path.)

Inapproximability of the General Problem

I'll prove that there's no polynomial time approximation algorithm possible (unless P=NP) by contradiction. Assume there's an approximation algorithm for TSP. The best possible approximation algorithm would find a solution with cost at no worse than one more than the optimal solution. In the instance of TSP we've just created, this would be asking if there's a traveling salesman path with cost n+1. Any such path turns out to be the path of cost n (the possible costs of paths in this graph are k+m(n-k), where k is the number of edges that cost 1 in the path. The only way this number can be less than or equal to n+1 is if k=n (otherwise n-k is an integer greater than 0, so the path costs at least m, which is greater than n+1.)

Therefore, if we find an approximation algorithm for TSP, we've found a polynomial time algorithm for solving HC, and P=NP. (It's possible to construct a similar proof about approximation algorithms that find an answer no worse than k*optimal, for any value of k. This is left as an exercise for the reader.)

Approximability of a Special Case

Wait a minute... I just said it's impossible to find an approximation algorithm, and now I'm about to go back on myself. It turns out that if you add more constraints to the problem, it gets easier. For example, if we assume that the triangle inequality holds for the cost function, it's possible to find an algorithm that can find an answer that's less than twice the optimal answer.

For the triangle inequality to hold, that means that if we have three cities a, b, and c, the following equation is true:

cost(a,c) ≤ cost(a,b) + cost(b,c)

In particular, this is true if the costs are the distances between cities (just an interesting fact about Euclidean geometry...) It's also not true about the graphs constructed in the example above, so it's possible that those results don't apply...

It turns out that there are polynomial time algorithms to come up with minimum spanning trees. A minimum spanning tree spans the whole graph (i.e. it contains every vertex in the graph). A traveling salesman circuit can't possibly cost less than the minimum spanning tree on a graph (it has to contain every node, too.)

The algorithm to find a good traveling salesman path is as follows: start with the minimum spanning tree. Start at the root of the tree, and do a depth-first traversal of the tree. If we just left it here, we wouldn't have a possible solution to TSP, because we use each edge in the tree twice. Nevertheless, if this was a good solution, it would cost at most twice the optimal cost (since we're using edges twice, and the cost of these edges is the minimal amount for any tree that connects all vertices of the graph.)

So any time we're about to repeat visiting a node in our traversal, we skip that node, and go to the next node we would have visited. Because of the triangle inequality, the distance we travel by skipping nodes in a path must be less than taking all the nodes. So we can keep taking these shortcuts, and end up with a hamiltonian circuit that costs less than or equal to twice the optimal possible path.

Better approximation algorithms exist, but they're more complicated. Many real-world examples where solving TSP would be nice satisfy the triangle inequality, so this is a useful problem to look at. Unfortunately, many real-world problems don't satisfy the triangle inequality. As one example, it's sometimes possible to fly from city a to city b and then on to city c cheaper than it would have been possible to fly directly from a to c.


References: