If you are given a list of cities and the times it takes to get between any two of them, what is the quickest way to visit every one of them? Solving this is a lot harder than it seems.
Stated mathematically, in a complete weighted graph, what's the tour of least cost?
This problem has been proven to be NP-HARD. That means there's no shortcut to solving it, the only way you can do it is by thinking about all possible routes and picking the shortest one. A problem like "what is the shortest way to get from city A to city B, via another city", given the data above, is much easier.

This problem comes up everywhere - for example, in computer science, if you have many programs to run in any order, all with different setup costs that vary depending on what program was previously running, you have to find the best route between them.
Or if you have a workshop and lots of different jobs to do, but each job takes a different set of tools - you have to decide what order to do the jobs, so you use the least amount of time getting the tools out and prepping the machines.

If we imagine a salesman who is planning to travel to a number of different cities to sell his goods, starting out from, and ending at, his home city, we would also imagine that he would select a route which would minimise the distance travelled. Assuming he knows the distance between each pair of cities on his tour, we can state that he has all the data necessary to plan such a route. However, it is certainly not obvious as to how to use this data to plan said route. This is commonly known as the travelling salesman problem (TSP), and it is the most prominent of the NP-complete optimisation problems.

The first mention of the TSP in literature was made in 1832 in a German book entitled “Der Handlungsreisende, wie er sien soll und was er zu thun hat, um Auftrage zu erhalten und eines glucklichen Erfolgs in seinen Geschaften gewiss zu sein. Von einem alten Commis-Voyageur” (“The Travelling Salesman, how he should be and what he should do to get Commissions and to be Successful in his Business”). The last chapter makes the first explicit reference to the TSP: ‘By a proper choice and scheduling of the tour, one can often gain so much time that we have to make some suggestions... The most important aspect is to cover as many locations as possible without visiting a location twice...’ (Voigt, 1831; Muller-Merbach, 1983).

Although it is not certain who brought the TSP into mathematical scrutiny, it is supposed that Merrill Flood is responsible for bringing it into focus having heard about it from A. W. Tucker in 1937. Flood writes that Tucker heard about the TSP from Hassler Whitney at Princeton University. This seems to make Whitney the founder of the mathematical TSP. The study of the problem, at time of writing, seems to have been just over 60 years; there is still no polynomial-time solution.

Flood was urged in 1948 to popularise the TSP at the RAND Corporation, partly motivated by the purpose of creating intellectual challenges for models outside the theory of games.

The problem

Given a set of cities, and a cost to travel between each pair of cities, determine whether there is a path that visits every city once and returns to the first city, such that the cost travelled is less than some number C.

This problem is often stated such that we need to find the minimum such C possible, but many common formulations of NP-completeness require the problems to be decision problems. In general, if an efficient solution exists for a decision problem, an efficient solution also exists for the corresponding optimization problem, and decision problems are a little bit easier to analyze.

The problem is also often stated such that we need to find the path rather than find the existence of the path. In practice, it's likely that an algorithm that solved this problem would answer yes by actually finding a path, but it's not strictly necessary for the the problem to be considered NP-complete. I'd like to thank jam-master jim for pointing out to me that I made this mistake. Interestingly, this doesn't really change the problem into an optimization problem, since the optimization version of the problem above is to find the minimum C that works. Still, if I wanted to actually use that version of the problem to talk about NP-completeness, I'd have to use a more complicated (but equivalent) definition of NP-complete than is generally used, and it's better to just use the same definition everyone else in the planet agrees on.

NP-Completeness

This problem turns out to be NP-complete, by a polynomial time transformation from the Hamiltonian Circuit (HC) problem. It's a very interesting result, because it's actually slightly stronger than just a result about the NP-completeness of TSP, it also suggests that finding a general polynomial time approximation algorithm for TSP implies that P=NP.

The transformation is pretty simple. In HC, we're given a graph. To transform it into an instance of TSP, let each vertex in the graph be a city. The cost of traveling between two cities will be 1 if there's an edge between those cities in the graph. If no such edge exists, the cost to travel will be m=2n (where n is the number of cities). The question: does there exist a traveling salesman tour with cost n?

If there's a hamiltonian circuit in the original graph, there will be at least one traveling salesman path with cost n (the hamiltonian circuit is one such path.)

Inapproximability of the General Problem

I'll prove that there's no polynomial time approximation algorithm possible (unless P=NP) by contradiction. Assume there's an approximation algorithm for TSP. The best possible approximation algorithm would find a solution with cost at no worse than one more than the optimal solution. In the instance of TSP we've just created, this would be asking if there's a traveling salesman path with cost n+1. Any such path turns out to be the path of cost n (the possible costs of paths in this graph are k+m(n-k), where k is the number of edges that cost 1 in the path. The only way this number can be less than or equal to n+1 is if k=n (otherwise n-k is an integer greater than 0, so the path costs at least m, which is greater than n+1.)

Therefore, if we find an approximation algorithm for TSP, we've found a polynomial time algorithm for solving HC, and P=NP. (It's possible to construct a similar proof about approximation algorithms that find an answer no worse than k*optimal, for any value of k. This is left as an exercise for the reader.)

Approximability of a Special Case

Wait a minute... I just said it's impossible to find an approximation algorithm, and now I'm about to go back on myself. It turns out that if you add more constraints to the problem, it gets easier. For example, if we assume that the triangle inequality holds for the cost function, it's possible to find an algorithm that can find an answer that's less than twice the optimal answer.

For the triangle inequality to hold, that means that if we have three cities a, b, and c, the following equation is true:

cost(a,c) ≤ cost(a,b) + cost(b,c)

In particular, this is true if the costs are the distances between cities (just an interesting fact about Euclidean geometry...) It's also not true about the graphs constructed in the example above, so it's possible that those results don't apply...

It turns out that there are polynomial time algorithms to come up with minimum spanning trees. A minimum spanning tree spans the whole graph (i.e. it contains every vertex in the graph). A traveling salesman circuit can't possibly cost less than the minimum spanning tree on a graph (it has to contain every node, too.)

The algorithm to find a good traveling salesman path is as follows: start with the minimum spanning tree. Start at the root of the tree, and do a depth-first traversal of the tree. If we just left it here, we wouldn't have a possible solution to TSP, because we use each edge in the tree twice. Nevertheless, if this was a good solution, it would cost at most twice the optimal cost (since we're using edges twice, and the cost of these edges is the minimal amount for any tree that connects all vertices of the graph.)

So any time we're about to repeat visiting a node in our traversal, we skip that node, and go to the next node we would have visited. Because of the triangle inequality, the distance we travel by skipping nodes in a path must be less than taking all the nodes. So we can keep taking these shortcuts, and end up with a hamiltonian circuit that costs less than or equal to twice the optimal possible path.

Better approximation algorithms exist, but they're more complicated. Many real-world examples where solving TSP would be nice satisfy the triangle inequality, so this is a useful problem to look at. Unfortunately, many real-world problems don't satisfy the triangle inequality. As one example, it's sometimes possible to fly from city a to city b and then on to city c cheaper than it would have been possible to fly directly from a to c.


References:

Log in or register to write something here or to contact authors.