If we are prepared to accept that every number greater than one has a prime divisor, then Euclid’s famous proof of the infinitude of prime numbers is one sentence long: Given any prime number p, if I multiply together all the prime numbers less than or equal to p and then add one, I get a number which leaves a remainder of one when divided by any of the prime numbers less than or equal to p and whence has a prime divisor greater than p. I try to explain why every number greater than one has a prime divisor in this post.

Today I would like to think a little bit more about prime numbers. Specifically, we will spend some time thinking about the number of prime numbers less than a given a number. We will start by seeing if we can get any quantitative information out of Euclid’s proof itself, before moving on to cleverer ways of achieving this.

### Crudity

We will write $p_n$ for the $n^{th}$ prime number. We will write $q_n$ for the number which is constructed in Euclid’s proof, i.e. $q_n$ is one more than the product of the first n prime numbers. Given $p_n$, Euclid’s proof not only shows the existence of a larger prime $p_{n+1}$, but it gives an upper bound on $p_{n+1}$; it gives an estimate as to how large the next prime can possibly be: It says that $p_{n+1} < q_n$, since $p_{n+1}$ is some non-trivial divisor of $q_n$.

Since $q_n$ is one more than the product of the first n primes, each of which is at most $p_n$, it is true that $q_n \leq p_n^n + 1$, and so $p_{n+1} < p_n^n + 1$. This is an estimate on the rate of growth of the function $n \mapsto p_n$. It is actually a very poor estimate. For example, it says that the third prime number, which is five, is less than twenty-eight. It says that the fourth prime number, which is seven, is less than two hundred and fifty-seven. Obviously this estimate is a long way off. We ought to have expected this estimate to be bad, though. Firstly, Euclid’s proof is rather crude: The proof exhibits a prime larger than $p_n$ by finding a number which is not divisible by any of $p_1, p_2,... p_n$, it makes no attempt to find the first number after $p_n$ which is not divisible by any of $p_1, p_2,... p_n$ . Secondly, we threw away a lot of information when we made the bound at the begining of this paragraph: We said, “all of the first n primes are less than or equal to $p_n$“, but if n is very large, this estimate is terrible. It replaces small numbers like two, three and five by a very large number. So, in conclusion, this approach is pretty bad.

### Any Better?

Let’s try again with a slighly different approach. The key observation here is that `Euclid’s bound’ $p_{n+1} < q_n$ lends itself to an inductive argument because $q_n$ depends only on the first n primes. To be more explicit, suppose there is some function $b(n)$, for which the following holds: Firstly, $latex b(1)$ is greater than or equal to two. This would allow us to start the induction. Secondly, if $p_n \leq b(n)$ for all $n \leq N$, then $p_{N + 1} \leq b(N+1)$. This is the inductive step. Euclid’s bound would come into play in the inductive step: Since $p_{N+1} \leq q_N$, in order to show that $p_{N + 1} \leq b(N+1)$, it suffices to show that $q_N \leq b(N+1)$. The result of completing the induction would be that $p_n \leq b(n)$ for all n. Let’s look at the inductive step in more detail to see what kind of functions we can get away with for b: If all we know is that $p_n \leq b(n)$ for all $n \leq N$, then we have that $q_N = \Pi_{j=1}^Np_j + 1 \leq \Pi_{j=1}^Nb(j) + 1$.

The inequality we seek will therefore follow if b satisfies $\Pi_{j=1}^Nb(j) + 1 \leq b(N+1)$.

Unfortunately, even this more careful analysis has not yeilded a very good bound. Functions which satisfy such an inequality must grow pretty fast. Suppose, for example that b grows at least exponentially fast, i.e. that $b(n) = c^{B(n)}$ for some constant c and some other function B, which is increasing with at least linear growth. Then the inequality above reads $c^{B(1) + B(2) + ... + B(N)} + 1 \leq c^{B(N+1)}$.

So, approximately speaking (by ignoring the one and taking logarithms), we must have $B(1) + B(2) + ... + B(N) \leq B(N+1)$, which, if you’ll take my word for it, suggests exponential growth of B (because the sum of the first N $k^{th}$ powers is a polynomial of degree k+1). So the original function b must grow very fast. For example, the function $b(n) = 2^{2^n}$ works.

The advantage of this result is that the bound can be calculated easily. The disadvantage is, well, that it is still a terrible estimate. It says that the second prime number, which is three, is at most sixteen and that the third prime number, which is five, is at most two hundred and fifty-six. Actually, if we write $\pi(x)$ for the numbe of primes less than or equal to x, one can rephrase our upper bound on $p_n$ in terms of a lower bound on $\pi(x)$ (estimating $\pi(x)$ is historically how this problem is usually phrased). We can deduce from our work that $\pi(x) \geq \log\log x$. The poor quality of our bound is evidenced by the fact that the function $x \mapsto \log \log x$ grows very slowly. For example it says that there are around 0.8 primes less than ten, when there are actually four.

### Battling on.

I the next post I hope to derive, using similarly elementary observations, a lower bound for $\pi(x)$ which is a little bit better, though which is still a long way from the truth. I will also discuss the interesting result that the sum of the reiprocals of the prime numbers diverges.