Complexity classes P and NP

From Wikipedia

HomePage | Recent changes | View source | Discuss this page | Page history | Log in |

Printable version | Disclaimers | Privacy policy

In complexity theory, the class P consists of all those decision problems that can be solved on a deterministic sequential machine in an amount of time that is polynomial in the size of the input; the class NP consists of all those decision problems whose positive solutions can be verified in polynomial time given the right information, or equivalently, whose solution can be found in polynomial time on a non-deterministic machine. The biggest open question in theoretical computer science concerns the relationship between those two classes:

Is P = NP ?

Most people think that the answer is probably "no"; some people believe the question may be undecidable from the currently accepted axioms. A $1,000,000 price has been offered for a correct solution.

In essence, the P = NP question asks: if positive solutions to a YES/NO problem can be verified quickly, can the answers also be computed quickly? Here is an example to get a feeling for the question. Suppose the problem is to decide whether a given natural number with n digits is composite or prime. If the number is composite, the output should be YES, otherwise it should be NO. No fast algorithms are known for this problem, where "fast" means "finishing in a number of steps that is at most a fixed power of n". However, positive answers can be verified quickly given the right information: checking that 69799 = 223 * 313 is much easier than finding the factorization of 69799 in the first place. The information needed to verify a positive answer is also called a certificate. So we conclude that given the right certificates, positive answers to our problem can be verified quickly (i.e. in polynomial time) and that's why this problem is in NP. It is not known whether the problem is in P.

We will now define the two classes formally.

A decision problem is a problem that takes as input some string and requires as output either YES or NO. If there is an algorithm (say a Turing machine, or a LISP or Pascal program with unbounded memory) which is able to produce the correct answer for any input string of length n in at most nk steps, where k is some constant independent of the input string, then we say that the problem can be solved in polynomial time and we place it in the class P. Intuitively, we think of the problems in P as those that can be solved reasonably fast.

Now suppose there is an algorithm A(w,C) which takes two arguments, a string w which is an input string to our decision problem, and a string C which is a "proposed certificate", and such that A produces a YES/NO answer in at most nk steps (where n is the length of w and k doesn't depend on w). Suppose furthermore that

w is a YES instance of the decision problem if and only if there exists C such that A(w,C) returns YES.

Then we say that the problem can be solved in non-deterministic polynomial time and we place it in the class NP. We think of the algorithm A as a verifier of proposed certificates which runs reasonably fast. (Note that the abbreviation NP stands for "Non-deterministic Polynomial" and not for "Non-Polynomial".)

To attack the P = NP question, the concept of NP-completeness is very useful. Informally, the NP-complete problems are the "toughest" problems in NP in the sense that they are the ones most likely not to be in P. This means that if a single NP-complete problem could be shown to be in P, then it would follow that P = NP. Unfortunately, many important problems have been shown to be NP-complete and not a single fast algorithm for any of them is known.

The P = NP question has also been addressed using oracles.

All of the above discussion has assumed that P means "easy" and "not in P" means "hard". While this is a common and reasonably accurate assumption in complexity theory, it is not always true in practice, for several reasons:

  • It ignores constant factors. A problem that takes time 101000n is P (in fact, it's linear time), but is completely intractable in practice. A problem that takes time 10-100002n is not P (in fact, it's exponential time), but is very tractable for values of n up into the thousands.
  • It ignores the size of the exponents. A problem with time n1000 is P, yet intractable. A problem with time 2n/1000 is not P, yet is tractable for n up into the thousands.
  • It only considers worst-case times. There might be a problem that arises in the real world. Most of the time, it can be solved in time n, but on very rare occasions you'll see an instance of the problem that takes time 2n. This problem might have an average time that is polynomial, but the worst case is exponential, so the problem wouldn't be in P.
  • It only considers deterministic solutions. There might be a problem that you can solve quickly if you accept a tiny error probability, but a guaranteed correct answer is much harder to get. The problem would not belong to P even though in practice it can be solved fast. This is in fact a common approach to attack NP-complete problems.
  • New computing models such as quantum computers, which also work probabilistically, may be able to quickly solve some problems not known to be in P.

No one knows whether polynomial-time algorithms exist for NP-complete problems. But if such algorithms do exist, we already know some of them! For example, the following algorithm correctly solves an NP-complete problem, but no one knows how long it takes in general. This is a polynomial-time algorithm if and only if P=NP.

// Algorithm that solves the NP-complete problem SUBSET-SUM.
//
// This algorithm yields positive answers in polynomial time 
// if and only if P=NP.
//
// Input:  S = a finite set of integers
// Output: "YES" if any subset of S adds up to 0.
//         Otherwise, it runs forever with no output.
// Note:  "Program number P" is the program you get by
//         writing the integer P in binary, then
//         considering that string of bits to be a 
//         program.  Every possible program can be
//         generated this way, though most do nothing
//         because of syntax errors.  

FOR N = 1...infinity
    FOR P = 1...N
        Run program number P for N steps with input S
        IF the program outputs a list of distinct integers
            AND the integers are all in S
            AND the integers sum to 0
        THEN
            OUTPUT "YES" and HALT

See also:


/Talk