If we are prepared to accept that every number greater than one has a prime divisor, then Euclid’s famous proof of the infinitude of prime numbers is one sentence long: Given any prime number p, if I multiply together all the prime numbers less than or equal to p and then add one, I get a number which leaves a remainder of one when divided by any of the prime numbers less than or equal to p and whence has a prime divisor greater than p. I try to explain why every number greater than one has a prime divisor in this post.

Today I would like to think a little bit more about prime numbers. Specifically, we will spend some time thinking about the number of prime numbers less than a given a number. We will start by seeing if we can get any quantitative information out of Euclid’s proof itself, before moving on to cleverer ways of achieving this.


We will write p_n for the n^{th} prime number. We will write q_n for the number which is constructed in Euclid’s proof, i.e. q_n is one more than the product of the first n prime numbers. Given p_n, Euclid’s proof not only shows the existence of a larger prime p_{n+1}, but it gives an upper bound on p_{n+1}; it gives an estimate as to how large the next prime can possibly be: It says that p_{n+1} < q_n , since p_{n+1} is some non-trivial divisor of q_n.

Since q_n is one more than the product of the first n primes, each of which is at most p_n, it is true that q_n \leq p_n^n + 1, and so p_{n+1} < p_n^n + 1. This is an estimate on the rate of growth of the function n \mapsto p_n. It is actually a very poor estimate. For example, it says that the third prime number, which is five, is less than twenty-eight. It says that the fourth prime number, which is seven, is less than two hundred and fifty-seven. Obviously this estimate is a long way off. We ought to have expected this estimate to be bad, though. Firstly, Euclid’s proof is rather crude: The proof exhibits a prime larger than p_n by finding a number which is not divisible by any of p_1, p_2,... p_n, it makes no attempt to find the first number after p_n which is not divisible by any of p_1, p_2,... p_n . Secondly, we threw away a lot of information when we made the bound at the begining of this paragraph: We said, “all of the first n primes are less than or equal to p_n“, but if n is very large, this estimate is terrible. It replaces small numbers like two, three and five by a very large number. So, in conclusion, this approach is pretty bad.

Any Better?

Let’s try again with a slighly different approach. The key observation here is that `Euclid’s bound’ p_{n+1} < q_n lends itself to an inductive argument because q_n depends only on the first n primes. To be more explicit, suppose there is some function b(n), for which the following holds: Firstly, $latex  b(1)$ is greater than or equal to two. This would allow us to start the induction. Secondly, if p_n \leq b(n) for all n \leq N, then p_{N + 1} \leq b(N+1). This is the inductive step. Euclid’s bound would come into play in the inductive step: Since p_{N+1} \leq q_N, in order to show that p_{N + 1} \leq b(N+1), it suffices to show that q_N \leq b(N+1). The result of completing the induction would be that p_n \leq b(n) for all n. Let’s look at the inductive step in more detail to see what kind of functions we can get away with for b: If all we know is that p_n \leq b(n) for all n \leq N, then we have that

q_N = \Pi_{j=1}^Np_j + 1 \leq \Pi_{j=1}^Nb(j) + 1 .

The inequality we seek will therefore follow if b satisfies

\Pi_{j=1}^Nb(j) + 1 \leq b(N+1).

Unfortunately, even this more careful analysis has not yeilded a very good bound. Functions which satisfy such an inequality must grow pretty fast. Suppose, for example that b grows at least exponentially fast, i.e. that b(n) = c^{B(n)} for some constant c and some other function B, which is increasing with at least linear growth. Then the inequality above reads

c^{B(1) + B(2) + ... + B(N)} + 1 \leq c^{B(N+1)}.

So, approximately speaking (by ignoring the one and taking logarithms), we must have B(1) + B(2) + ... + B(N) \leq B(N+1), which, if you’ll take my word for it, suggests exponential growth of B (because the sum of the first N k^{th} powers is a polynomial of degree k+1). So the original function b must grow very fast. For example, the function b(n) = 2^{2^n} works.

The advantage of this result is that the bound can be calculated easily. The disadvantage is, well, that it is still a terrible estimate. It says that the second prime number, which is three, is at most sixteen and that the third prime number, which is five, is at most two hundred and fifty-six. Actually, if we write \pi(x) for the numbe of primes less than or equal to x, one can rephrase our upper bound on p_n in terms of a lower bound on \pi(x) (estimating \pi(x) is historically how this problem is usually phrased). We can deduce from our work that \pi(x) \geq \log\log x. The poor quality of our bound is evidenced by the fact that the function x \mapsto \log \log x grows very slowly. For example it says that there are around 0.8 primes less than ten, when there are actually four.

Battling on.

I the next post I hope to derive, using similarly elementary observations, a lower bound for \pi(x) which is a little bit better, though which is still a long way from the truth. I will also discuss the interesting result that the sum of the reiprocals of the prime numbers diverges.