P versus NP problem
P versus NP is one of the Millennium Prize Problems, and it is of great interest to people who work with computers and in mathematics. One way of asking it is: "Can every problem whose answer can be checked quickly by a computer also be solved quickly by a computer?" Problems are put into classes called P and NP, depending on how the time needed to handle them grows as the problem gets bigger. P problems can be solved in a time that is bounded by a polynomial in the size of the problem, so they are relatively fast for computers to solve and are considered "easy". NP problems are fast (and so "easy") for a computer to check, but are not necessarily easy to solve.
In 1956, Kurt Gödel wrote a letter to John von Neumann. In this letter, Gödel asked whether a certain NP-complete problem could be solved in quadratic or linear time.[1] In 1971, Stephen Cook introduced the precise statement of the P versus NP problem in his article "The complexity of theorem proving procedures".[2]
Today, many people consider this problem to be the most important open problem in computer science.[3] It is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute, which offers a US$1,000,000 prize for a solution. A solution would need to be published and recognized by the Clay Institute, and it would likely have a large effect on the whole of mathematics.
Because several of the Millennium Problems touch on related issues, and because many mathematicians dream of finding unifying theories, some hope that the problems are connected to each other.
Clarifications
A computer may be able to tell if an answer is right, but take much longer to find the answer. For some interesting, practical questions of this kind, answers that are difficult to find are possible to check quickly. So NP problems may be thought of as being like riddles: it may be hard to come up with an answer to a riddle, but once one hears the answer, the answer seems obvious. In this comparison (analogy), the basic question is: are riddles really as hard as we think they are, or are we missing something? Is there a secret to always finding an answer?
Because these kinds of P versus NP questions are so practically important, many mathematicians, scientists, and computer programmers want to know whether the general proposition is true: that every problem that can be checked quickly can also be solved quickly. This question is important enough that the Clay Mathematics Institute will give $1,000,000 to anyone who provides a valid proof or disproof of it.
Digging a little deeper, we see that all P problems are NP problems: it is easy to check that a solution is correct by solving the problem and comparing the two solutions. However, people want to know about the opposite: Are there any NP problems other than P problems, or are all NP problems just P problems? If NP problems are really not the same as P problems (P ≠ NP), it would mean that no general, fast ways to solve those NP problems can exist, no matter how hard we look. However, if all NP problems are P problems (P = NP), it would mean that new, very fast problem-solving methods do exist. We just have not found them yet.
Since the best efforts of scientists and mathematicians have not yet found general, easy methods for solving NP problems, most of them believe that there are NP problems other than P problems (that is, that P ≠ NP is true). However, no one has yet proven this by rigorous mathematical analysis. If it could be proven that NP and P are the same (P = NP is true), it would have a huge impact on many aspects of day-to-day life. For this reason, the question of P versus NP is an important and widely studied topic.
Example
Suppose someone wants to build two towers by stacking rocks of different mass, and wants each tower to have exactly the same mass. That means the rocks have to be put into two piles that have the same mass. If one guesses a division of the rocks that one thinks will work, it is easy to check whether the guess is right. (To check the answer, one can divide the rocks into the two piles, then use a balance to see if they have the same mass.) Because this problem, called 'Partition' by computer scientists, is easy to check, it is an NP problem. As we will see, no one knows a way to solve it outright that is as easy as checking it, so it is not known to be a P problem.
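Checking a proposed division can be sketched in a few lines of Python. This is only an illustration: the function name `is_valid_partition` and the example masses are made up for this sketch, and the masses are assumed to be whole numbers.

```python
from collections import Counter

def is_valid_partition(rocks, pile_a, pile_b):
    """Check a proposed answer: do the two piles use every rock exactly once
    and have the same total mass? The work grows only in proportion to the
    number of rocks."""
    uses_all_rocks = Counter(pile_a) + Counter(pile_b) == Counter(rocks)
    return uses_all_rocks and sum(pile_a) == sum(pile_b)

rocks = [3, 1, 1, 2, 2, 1]  # hypothetical masses in kilograms
print(is_valid_partition(rocks, [3, 2], [1, 1, 2, 1]))  # True: both piles weigh 5
print(is_valid_partition(rocks, [3, 1], [1, 2, 2, 1]))  # False: 4 versus 6
```

The check looks at each rock a fixed number of times, which is why checking counts as "easy" even when finding the division is not.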
How hard is it to solve, outright? If one starts with just 100 rocks, there are 2^{100-1}-1 = 633,825,300,114,114,700,748,351,602,687, or about 6.3 x 10^{29}, possible ways (combinations) to divide these rocks into two piles. If one could check one unique combination of rocks every second, it would take about 2 x 10^{22} or 20,000,000,000,000,000,000,000 years of effort. For comparison, physicists believe that the universe is about 1.4 x 10^{10} years old (about 4.4 x 10^{17} or 440,000,000,000,000,000 seconds), which is less than one trillionth of the time it would take for our rock piling effort. That means that if one takes all of the time that has passed since the beginning of the universe, one would need to check more than one trillion (1,000,000,000,000) different ways of dividing the rocks every second, in order to check all of the different ways.
If one programmed a powerful computer to test all of these ways to divide the rocks, one might be able to check [math]\displaystyle{ 1,000,000 }[/math] combinations per second using current systems. This means one would still need more than [math]\displaystyle{ 1,000,000 }[/math] very powerful computers, working since the origin of the universe, to test all the ways of dividing the rocks.
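The size of this search can be estimated with a short calculation. The sketch below is only a rough check under stated assumptions: a year of 365.25 days, a universe that is 1.4 x 10^{10} years old, one check per second for the first estimate, and 1,000,000 checks per second per computer for the last one. The variable names are chosen for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed length of a year
UNIVERSE_AGE_YEARS = 1.4e10             # assumed age of the universe
CHECKS_PER_COMPUTER = 1_000_000         # assumed checks per second per computer

ways = 2**99 - 1  # ways to split 100 rocks into two non-empty piles

years_at_one_per_second = ways / SECONDS_PER_YEAR
checks_per_second_needed = ways / (UNIVERSE_AGE_YEARS * SECONDS_PER_YEAR)
computers_needed = checks_per_second_needed / CHECKS_PER_COMPUTER

print(ways)                                # 633825300114114700748351602687
print(f"{years_at_one_per_second:.1e}")    # 2.0e+22 years
print(f"{checks_per_second_needed:.1e}")   # 1.4e+12 checks per second
print(f"{computers_needed:.1e}")           # 1.4e+06 computers
```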
However, it may be possible to find a method of dividing the rocks into two equal piles without checking all combinations. The question "Does P equal NP?" is a shorthand for asking if any method like that can exist.
One algorithm purported to be nearly sufficient involves the following steps (for 120 rocks):
1) Put 2 equal rocks together in each of 5 separate groups.
2) Put 5 unequal rocks together beside the 2-piles in each group.
3) Put 5 unequal rocks together beside the 7-piles in each group.
4) Combine the groups in the 12-piles into a single pile.
5) Pile up the remaining rocks into a single pile.
6) Weigh the piles. (Otherwise, the program runs forever with no output.)
7) If the variance is not acceptable, the giver takes or the taker gives a rock.
8) Repeat steps 6 and 7 if necessary.
Why it matters
There are many important NP problems that people don't know how to solve in a way that is faster than testing every possible answer. Here are some examples:
- A travelling salesman wants to visit 100 cities by driving, starting and ending his trip at home. He has a limited supply of gasoline, so he can only drive a total of 10,000 kilometers. He wants to know if he can visit all of the cities without running out of gasoline.
- A school offers 100 different classes, and a teacher needs to choose one hour for each class's final exam. To prevent cheating, all of the students who take a class must take the exam for that class at the same time. If a student takes more than one class, then all of those exams must be at different times. The teacher wants to know if she can schedule all of the exams in the same day so that every student is able to take the exam for each of their classes.
- A farmer wants to take 100 watermelons of different masses to the market. She needs to pack the watermelons into boxes. Each box can only hold 20 kilograms without breaking. The farmer needs to know if 10 boxes will be enough for her to carry all 100 watermelons to market. (Some special cases are easy: for example, if no watermelon weighs more than 2 kg, then any 10 watermelons can be placed in each box. Observations like this can give a fast answer in particular cases, but no fast method is known that works for every set of masses.)
- A large art gallery has many rooms, and each wall is covered with many expensive paintings. The owner of the gallery wants to buy cameras to watch these paintings, in case a thief tries to steal any of them. He wants to know if 100 cameras will be enough for him to make sure that each painting can be seen by at least one camera.
- The clique problem: The principal of a school has a list of which students are friends with each other. She wants to find a group of 10% of the students that are all friends with each other.
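The difference between checking and solving can be sketched for the last example in the list. In this illustration the student names, the `friendships` data, and both function names are made up. Checking one proposed group only looks at the pairs inside it, while the simple search shown here may have to try every possible group.

```python
from itertools import combinations

def is_clique(friendships, group):
    """Fast check: is every pair of students in `group` a pair of friends?"""
    return all(frozenset(pair) in friendships for pair in combinations(group, 2))

def find_clique(students, friendships, size):
    """Slow search: try every group of the given size until one passes the check.
    The number of groups can grow exponentially with the number of students."""
    for group in combinations(students, size):
        if is_clique(friendships, group):
            return group
    return None

students = ["Ann", "Bob", "Cy", "Dee"]  # hypothetical class list
friendships = {frozenset(p) for p in
               [("Ann", "Bob"), ("Ann", "Cy"), ("Bob", "Cy"), ("Cy", "Dee")]}

print(is_clique(friendships, ["Ann", "Bob", "Cy"]))   # True
print(is_clique(friendships, ["Ann", "Bob", "Dee"]))  # False
print(find_clique(students, friendships, 3))          # ('Ann', 'Bob', 'Cy')
```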
Exponential time
In the example above, we see that with [math]\displaystyle{ 100 }[/math] rocks, there are on the order of [math]\displaystyle{ 2^{100} }[/math] ways to partition the set of rocks. With [math]\displaystyle{ n }[/math] rocks, there are on the order of [math]\displaystyle{ 2^n }[/math] combinations. The function [math]\displaystyle{ f(n) = 2^n }[/math] is an exponential function. It is important to NP because it models the worst-case number of computations that are needed to solve a problem by checking every possible answer and, thus, the worst-case amount of time required.
And so far, for the hard problems, the solutions have required on the order of [math]\displaystyle{ 2^n }[/math] computations. For any particular problem, people have found ways to reduce the number of computations needed. One might figure out a way to do just 1% of the worst-case number of computations, and that saves a lot of computing, but that is still [math]\displaystyle{ 0.01 \times (2^n) }[/math] computations. And every extra rock still doubles the number of computations needed to solve the problem. There are insights that can produce methods that do even fewer computations, giving variations of the model such as [math]\displaystyle{ 2^n / n^3 }[/math]. But the exponential function still dominates as [math]\displaystyle{ n }[/math] grows.
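A small table of numbers makes the point concrete. The sketch below compares the three exponential models mentioned above with a polynomial, [math]\displaystyle{ n^3 }[/math], which is included only for contrast. The function name and the chosen values of n are assumptions made for this example.

```python
def work_estimates(n):
    """Number of computations for n rocks under four growth models."""
    return {
        "2^n": 2**n,
        "0.01 * 2^n": 0.01 * 2**n,
        "2^n / n^3": 2**n / n**3,
        "n^3 (polynomial)": n**3,
    }

for n in (20, 40, 80):
    row = work_estimates(n)
    print(n, {name: f"{value:.1e}" for name, value in row.items()})
```

Even after dividing by 100 or by [math]\displaystyle{ n^3 }[/math], the exponential models pull far ahead of the polynomial as n grows.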
Consider the problem of scheduling exams (described above). But suppose, next, that there are 15000 students. There's a computer program that takes the schedules of all 15000 students. It runs in an hour and outputs an exam schedule so that all students can do their exams in one week. It satisfies lots of rules (no back-to-back exams, no more than 2 exams in any 28 hour period, ...) to limit the stress of exam week. The program runs for one hour at mid-term break and everyone knows his/her exam schedule with plenty of time to prepare.
The next year, though, there are 10 more students. If the same program runs on the same computer then that one hour is going to turn into [math]\displaystyle{ 2^{10} }[/math] hours, because every additional student doubles the computations. That's [math]\displaystyle{ 6 }[/math] weeks! If there were 20 more students, then
- [math]\displaystyle{ 2^{20} }[/math] hours = [math]\displaystyle{ 1048576 }[/math] hours ~ [math]\displaystyle{ 43691 }[/math] days ~ [math]\displaystyle{ 120 }[/math] years
Thus, for [math]\displaystyle{ 15000 }[/math] students, it takes one hour. For [math]\displaystyle{ 15020 }[/math] students, it takes about [math]\displaystyle{ 120 }[/math] years.
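The unit conversions can be reproduced with a few lines of Python. This sketch simply follows the assumption made in the text that every additional student doubles the running time. The function name and the 365.25-day year are choices made for this example.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25  # assumed length of a year

def running_time_hours(extra_students, base_hours=1):
    """Running time if each additional student doubles the work."""
    return base_hours * 2**extra_students

for extra in (0, 10, 20):
    hours = running_time_hours(extra)
    days = hours / HOURS_PER_DAY
    years = days / DAYS_PER_YEAR
    print(f"{extra} extra students: {hours} hours = {days:.0f} days = {years:.1f} years")
```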
As you can see, exponential functions grow really fast. Most mathematicians believe that the hardest NP problems require exponential time to solve.
NP-complete problems
Mathematicians can show that there are some NP problems that are NP-complete. An NP-complete problem is at least as difficult to solve as any other NP problem. This means that if someone found a method to solve any NP-complete problem quickly, they could use that same method to solve every NP problem quickly, by first translating the other problem into the one they know how to solve. All of the problems listed above are NP-complete, so if the salesman found a way to plan his trip quickly, he could tell the teacher, and she could use that same method to schedule the exams. The farmer could use the same method to determine how many boxes she needs, and the tower builder could use the same method to find a way to build the two towers.
Because a method that quickly solves one of these problems can solve them all, there are many people who want to find one. However, because there are so many different NP-complete problems and nobody so far has found a way to solve even one of them quickly, most experts believe that solving NP-complete problems quickly is not possible.
Basic properties
In computational complexity theory, the complexity class NP-complete (abbreviated NP-C or NPC) is the class of problems that have two properties:
- The problem is in the set of NP (non-deterministic polynomial time) problems: any given solution to the problem can be verified quickly (in polynomial time).
- The problem is also in the set of NP-hard problems: those which are at least as hard as the hardest problems in NP. Problems that are NP-hard do not have to be elements of NP; indeed, they may not even be decidable.
Formal overview
NP-complete is a subset of NP, the set of all decision problems whose solutions can be verified in polynomial time; NP may be equivalently defined as the set of decision problems that can be solved in polynomial time on a non-deterministic Turing machine. A problem p in NP is also in NPC if and only if every other problem in NP can be transformed into p in polynomial time. NP-complete is meant to be used as an adjective: problems in the class NP-complete are known as NP-complete problems.
NP-complete problems are studied because the ability to quickly verify solutions to a problem (NP) seems to correlate with the ability to quickly solve that problem (P). It is not known whether every problem in NP can be quickly solved; this is called the P = NP problem. But if any single problem in NP-complete can be solved quickly, then every problem in NP can also be solved quickly, because the definition of an NP-complete problem states that every problem in NP must be quickly reducible to every problem in NP-complete (that is, it can be reduced in polynomial time). [1]
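A polynomial-time transformation can be illustrated with a small sketch. The example below turns an instance of Partition (the rock problem above) into an instance of Subset Sum, a related problem that asks whether some of the given numbers add up to a target. Subset Sum, all function names, and the brute-force solver are assumptions introduced only for this illustration, and the masses are assumed to be positive whole numbers.

```python
from itertools import combinations

def partition_to_subset_sum(rocks):
    """Transform a Partition instance into a Subset Sum instance (numbers, target).
    The transformation itself only adds up the masses, so it is fast."""
    total = sum(rocks)
    if total % 2 == 1:
        # An odd total can never be split evenly, so return a target that
        # no subset of positive masses can reach.
        return rocks, total + 1
    return rocks, total // 2

def subset_sum_brute_force(numbers, target):
    """Stand-in solver: tries every subset, so it is exponential, not fast."""
    return any(sum(subset) == target
               for size in range(len(numbers) + 1)
               for subset in combinations(numbers, size))

def can_partition(rocks):
    """Answer Partition by transforming it and asking the Subset Sum solver."""
    numbers, target = partition_to_subset_sum(rocks)
    return subset_sum_brute_force(numbers, target)

print(can_partition([3, 1, 1, 2, 2, 1]))  # True: 3 + 2 = 1 + 1 + 2 + 1
print(can_partition([1, 1, 4]))           # False
```

If the stand-in solver were ever replaced by a fast one, `can_partition` would become fast as well, which is the idea behind reducibility.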
Examples
The Boolean satisfiability problem is known to be NP-complete. In 1972, Richard Karp formulated 21 problems that are known to be NP-complete.[4] These are known as Karp's 21 NP-complete problems. They include problems such as the integer programming problem, which applies linear programming techniques to the integers, the knapsack problem, and the vertex cover problem.
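For the Boolean satisfiability problem, checking a proposed answer is simple to sketch. In the illustration below, a formula is written as a list of clauses, and each clause is a list of whole numbers: `2` means "variable 2 is true" and `-2` means "variable 2 is false". This encoding, the function name, and the example formula are choices made for this sketch.

```python
def satisfies(clauses, assignment):
    """True if every clause contains at least one literal that the
    assignment makes true. The work is proportional to the formula size."""
    return all(
        any(assignment[abs(literal)] == (literal > 0) for literal in clause)
        for clause in clauses
    )

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
clauses = [[1, -2], [2, 3], [-1, -3]]

print(satisfies(clauses, {1: True, 2: True, 3: False}))  # True
print(satisfies(clauses, {1: True, 2: False, 3: True}))  # False
```

Finding a satisfying assignment is the hard part: with n variables there are [math]\displaystyle{ 2^n }[/math] assignments to try in the worst case.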
P Versus NP Problem Media
Euler diagram for P, NP, NP-complete, and NP-hard set of problems (excluding the empty language and its complement, which belong to P but are not NP-complete)
The graph shows the running time vs. problem size for a knapsack problem of a state-of-the-art, specialized algorithm. The quadratic fit suggests that the algorithmic complexity of the problem is O((log(n))^2).
Diagram of complexity classes provided that P ≠ NP. The existence of problems within NP but outside both P and NP-complete, under that assumption, was established by Ladner's theorem.
References
- [1] Hartmanis, Juris (1989). "Gödel, von Neumann, and the P = NP problem". Bulletin of the European Association for Theoretical Computer Science. 38: 101–107.
- [2] Cook, Stephen (1971). "The complexity of theorem proving procedures". Proceedings of the Third Annual ACM Symposium on Theory of Computing. pp. 151–158.
- [3] Fortnow, Lance (2009). "The status of the P versus NP problem". Communications of the ACM. 52 (9): 78–86.
- [4] Karp, Richard M. (1972). "Reducibility Among Combinatorial Problems". In R. E. Miller; J. W. Thatcher (eds.). Complexity of Computer Computations. New York: Plenum. pp. 85–103.