As ApoxyButt says, NP-completeness is one of the toughest concepts to learn in computer science. Unfortunately this means that few people bother to learn about it and that many software analysts lack an important tool in their toolboxes.

If you're not interested in the theory behind NP-completeness, you can skip down to the section headed "In Practice" below. If you're not interested in how NP-completeness helps you design better software you can go back to shaking a dead rooster over your servers...

The Theory

What does "NP-complete" really mean, then? Let's split it up into two parts, "NP" and "complete".

Definition 1: A problem is in NP if it is solvable by a nondeterministic Turing machine in polynomial time.

The term "nondeterministic" comes from the theory of Turing machines and "polynomial" from Ordo calculus, better known as "Big-Oh notation". Nondeterministic Turing machines are strange beasts, so for now we can rephrase the definition as "verifiable in polynomial time", i.e. we don't necessarily know a good algorithm to calculate the result, but supplied with an answer we can verify it in polynomial time. Another way to look at NP is to compare it to its subset P, the class of problems that are solvable (not just verifiable) in polynomial time. Membership in P is the stricter requirement, so every problem in P is also in NP, but the problems we're most interested in here are those that are in NP but (as far as anyone knows) not in P. More on that later.
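To make "verifiable in polynomial time" concrete, here is a small sketch (names and representation are my own, not from any standard library) using subset sum, a well-known NP-complete problem: given a claimed solution, checking it is cheap even though finding one may not be.

```python
# Subset sum is in NP: a proposed answer (a "certificate") can be
# checked in polynomial time, even though no polynomial-time algorithm
# for *finding* one is known.

def verify_subset_sum(numbers, target, certificate):
    """Check a claimed solution in O(n) time. The certificate is a
    list of distinct indices into `numbers` whose elements should
    sum to `target`."""
    return (len(set(certificate)) == len(certificate)            # no index reused
            and all(0 <= i < len(numbers) for i in certificate)  # indices valid
            and sum(numbers[i] for i in certificate) == target)

# Verifying is easy: indices 2 and 4 pick out 4 + 5 = 9.
print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [2, 4]))  # True
# ...but finding a certificate may mean trying all 2^n subsets.
```

The asymmetry shown here, cheap checking versus (apparently) expensive searching, is the whole point of the class NP.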

Now the "Complete" part: The following definition comes from set theory:

Definition 2: The set A is complete for a class of sets L iff A is a member of L and A is hard for L.

That didn't help much, did it? What is meant by "hard"?

Definition 3: The set A is hard for a class of sets L iff every set in L is many-to-one reducible to A.

The reduction function is explained in the many-to-one polynomial-time reducible node and from now on I will use the notation A≤pB to say "A is many-to-one polynomial-time reducible to B". In practice the definition means that for every problem in L there is a polynomial-time function reducing it to A. More about this can be found in the NP-hard node.

To prove that problem A is NP-complete we have to do two subproofs according to definition 2: We have to prove that A is NP-Hard and that it is in NP, i.e. verifiable in polynomial time.

Finding these proofs may seem to be an almost impossible task at first, but fortunately there are a few theorems that help us out:

Theorem 1: If A≤pB and B is in NP, then A is also in NP.

And the most important one:

Theorem 2: If A≤pB, B is in NP and A is NP-complete, then B is also NP-complete.

Using theorem 2, the hardest part of proving that a problem is NP-Complete is finding another NP-complete problem (A) and a polynomial time reduction from it to your problem. There are quite a few problems that are proven to be NP-Complete so you should be able to find one that looks similar to your problem and therefore may be easily reduced to it.

In Practice

So, is there any useful application of all this besides driving CS students insane? Here's an example: A customer wants you to build a software system that will be big and expensive both in development and production. You say "Let me have a look at what the system is supposed to do and your performance requirements. I will be able to tell you if the system will be feasible on your platform before we write a single line of code". If that doesn't get the customer's attention I don't know what will.

Saying that a problem is NP-complete is about as close as theory gets to saying that there is no good algorithm for it: no polynomial-time algorithm is known, and none exists unless P = NP. The steps through reduction, hardness and completeness described above single out the problems in NP that are the hardest to solve in terms of complexity. For an NP-complete problem, the best known general approach is often an "exhaustive search", i.e. trying every candidate solution: all 2^n subsets of the input, or all n! orderings of it. Either way the cost grows exponentially, which in practice means that the system will quickly reach a performance limit no matter how many hardware upgrades you do. Compare with a problem in P, where you know that the required system resources will grow as a polynomial function of the amount of data processed: if you have to crunch more data you can do that by buying more hardware, and you will know roughly how much more you need as well.
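The scaling argument can be made concrete with a toy comparison (the step counts below are illustrative only, not measurements of any real system):

```python
# Polynomial vs. exponential growth in the number of "steps" an
# algorithm takes, as the input size n grows.

def steps_polynomial(n):
    """A P-class algorithm running in O(n^2), e.g. a simple sort."""
    return n * n

def steps_exhaustive(n):
    """Brute force over all 2^n subsets of the input."""
    return 2 ** n

for n in (10, 20, 40, 60):
    print(n, steps_polynomial(n), steps_exhaustive(n))
# At n = 60 the polynomial algorithm needs 3600 steps, while the
# exhaustive search needs 2^60 (about 10^18): no affordable hardware
# upgrade closes a gap that doubles with every extra input item.
```

This is why knowing a problem is in P lets you budget hardware for growth, while an NP-complete problem hits a wall at modest input sizes.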


The first version of this write-up contained a section that tried to explain why NP-complete problems are special and how they relate to each other. Unfortunately just about everything in that section was entered backwards, flawed and just plain wrong. I'm really sorry if I have muddied the waters for anyone trying to understand what this is all about. It's difficult enough as it is! I'll try to add a new section that hopefully will be correct, if not crystal clear. Many thanks to JerboaKolinowski for pointing out my errors!