In this post we will be looking at probabilistic coupling, which is putting two random variables (or processes) on the same probability space. This turns out to be a surprisingly powerful tool in probability, and this post will hopefully brainwash you into agreeing with me.

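Consider first this baby problem. You and I flip coins, and $latex X_i, Y_i$ denote our $latex i$-th coin flips. Suppose that I have probability $latex p$ of landing heads and you have probability $latex q \geq p$ of landing heads. Let

$latex T_Z := \inf\{n \geq 3 : Z_{n-2} = Z_{n-1} = Z_n = \text{heads}\}$

be the first time $latex Z \in \{X, Y\}$ sees three heads in a row. Now obviously you know that $latex \mathbb{E}[T_X] \geq \mathbb{E}[T_Y]$, but can you prove it?

Of course here it is possible to compute both of the expectations and show this directly, but this is rather messy and long. Instead this will serve as a baby example of coupling.

So we want to put $latex X, Y$ in the same probability space such that $latex X_i \leq Y_i$ almost surely for each $latex i$ (counting heads as 1 and tails as 0). Let $latex U_i$ be i.i.d. uniforms on $latex [0,1]$ and for each $latex i$ say that $latex X_i$ is heads if $latex U_i \leq p$ and otherwise tails, and $latex Y_i$ is heads if $latex U_i \leq q$ and otherwise tails.

What is apparent here is that $latex T_X \geq T_Y$, as we use the same source of “randomness”: whenever $latex X_i$ is heads, so is $latex Y_i$. Also the marginals are given by $latex \mathbb{P}(X_i = \text{heads}) = p$ and $latex \mathbb{P}(Y_i = \text{heads}) = q$, and hence

$latex \mathbb{E}[T_X] \geq \mathbb{E}[T_Y]$.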
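If you would rather watch the coupling happen than take my word for it, here is a minimal sketch in C. The function name `coupled_three_heads` and its parameters are my own, chosen for illustration, and it assumes my heads probability `p` and yours `q` satisfy 0 < p <= q <= 1. Both coins are read off the same uniform draw, and the function records the first time each of us sees three heads in a row.

```c
#include <stdlib.h>

/* Flip both coins from the same uniform draw until each of them has seen
   three heads in a row. Assumes 0 < p <= q <= 1 (an assumption of this
   sketch). On return *t_x and *t_y hold the two waiting times. */
void coupled_three_heads(double p, double q, unsigned seed, long *t_x, long *t_y)
{
    int run_x = 0, run_y = 0;   /* current streaks of heads */
    long n = 0;                 /* number of flips so far */
    *t_x = 0;
    *t_y = 0;
    srand(seed);
    while (*t_x == 0) {         /* the p-coin is always the last to finish */
        double u = rand() / (RAND_MAX + 1.0);   /* uniform on [0,1) */
        n++;
        run_x = (u < p) ? run_x + 1 : 0;        /* heads for me iff u < p */
        run_y = (u < q) ? run_y + 1 : 0;        /* heads for you iff u < q */
        if (run_y >= 3 && *t_y == 0) *t_y = n;
        if (run_x >= 3 && *t_x == 0) *t_x = n;
    }
}
```

Whatever seed you pick, the waiting time stored in `*t_y` is never larger than the one in `*t_x`, because every head of the `p`-coin is also a head of the `q`-coin. That pathwise inequality is exactly what gives the inequality between the expectations.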

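I really like this example of coupling because everyone learns this without learning coupling. We prove that using the transience of the simple symmetric random walk (SSRW) on $latex \mathbb{Z}^3$, the SSRW on $latex \mathbb{Z}^d$ is transient for any $latex d \geq 3$. This is done by induction on $latex d$ so we will only look at $latex d = 4$.

Suppose $latex X$ is a random walk on $latex \mathbb{Z}^4$ and let $latex Y$ be defined by projecting $latex X$ on to the first three co-ordinates, i.e. $latex Y_n = (X_n^1, X_n^2, X_n^3)$. Now this makes $latex Y$ a lazy random walk on $latex \mathbb{Z}^3$; to get rid of the laziness we define $latex \tau_0 = 0$ and

$latex \tau_{k+1} = \inf\{n > \tau_k : Y_n \neq Y_{\tau_k}\}$.

It is easy to see that $latex \tau_k < \infty$ almost surely and so $latex Z_k = Y_{\tau_k}$ is a random walk on $latex \mathbb{Z}^3$.

Now one could make a contradiction argument here and say if $latex X$ was recurrent then $latex Z$ would also be recurrent, but the construction here is hidden in that statement. Explicitly we constructed a random walk $latex X$ on $latex \mathbb{Z}^4$ and a random walk $latex Z$ on $latex \mathbb{Z}^3$ such that

$latex |X_n| \geq |Y_n| = |Z_k|$ for $latex \tau_k \leq n < \tau_{k+1}$,

where $latex |\cdot|$ denotes the Euclidean norm. Hence we have that if $latex |Z_k| \to \infty$ as $latex k \to \infty$, then so does $latex |X_n|$ as $latex n \to \infty$.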
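The construction is short enough to simulate. Below is a rough sketch in C; the function name `projection_dominates` is hypothetical. It runs a simple random walk on the four dimensional lattice together with the three dimensional walk obtained by projecting onto the first three coordinates and skipping the steps on which the projection does not move, and it checks at every step that the four dimensional walk is at least as far from the origin.

```c
#include <stdlib.h>

/* Runs a simple random walk x on Z^4 for `steps` steps and, alongside it,
   the walk z on Z^3 obtained by projecting onto the first three
   coordinates and ignoring the steps on which the projection stands still.
   Returns 1 if |x| >= |z| held after every step, 0 otherwise. */
int projection_dominates(long steps, unsigned seed)
{
    long x[4] = {0, 0, 0, 0};   /* the walk on Z^4 */
    long z[3] = {0, 0, 0};      /* the de-lazified projection on Z^3 */
    long n;
    srand(seed);
    for (n = 0; n < steps; n++) {
        int r = rand() % 8;             /* pick one of the 8 neighbours */
        int axis = r / 2;
        long dir = (r % 2) ? 1 : -1;
        long nx, nz;
        x[axis] += dir;
        if (axis < 3)                   /* the projection actually moved */
            z[axis] += dir;
        nx = x[0]*x[0] + x[1]*x[1] + x[2]*x[2] + x[3]*x[3];
        nz = z[0]*z[0] + z[1]*z[1] + z[2]*z[2];
        if (nx < nz) return 0;          /* compare squared Euclidean norms */
    }
    return 1;
}
```

The check can never fail, since dropping a coordinate cannot increase the norm, and that triviality is the whole point: the two walks live on the same probability space, so the comparison holds path by path.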

A while ago I wrote a post on this (found here) that covers a probabilistic proof of Liouville’s theorem. Here we look at $latex \mathbb{R}^d$, but the argument can be extended to manifolds as well (see here for an example).

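So we suppose that $latex f : \mathbb{R}^d \to \mathbb{R}$ is a bounded harmonic function. Liouville’s theorem states that $latex f$ is constant.

Let $latex B$ be a Brownian motion on $latex \mathbb{R}^d$ started at $latex x \in \mathbb{R}^d$. It is easy to show that

$latex f(x) = \mathbb{E}[f(B_t)]$ for all $latex t \geq 0$.

Now we let $latex y \neq x$ be arbitrary and $latex H$ be the hyperplane such that $latex x$ reflected about it is $latex y$. Define

$latex \tau = \inf\{t \geq 0 : B_t \in H\}$.

Before we use coupling it is worth convincing yourself that $latex \tau < \infty$ almost surely. Loosely speaking, this is due to the fact that along the vector normal to $latex H$, the Brownian motion is one dimensional.

Now we construct a new Brownian motion $latex B'$ by reflecting $latex B$ about $latex H$ up to time $latex \tau$ and letting it coincide with $latex B$ afterwards. This has the same law as a Brownian motion started at $latex y$. We see that

$latex f(y) = \mathbb{E}[f(B'_t)]$.

But now

$latex |f(x) - f(y)| = |\mathbb{E}[f(B_t) - f(B'_t)]| \leq 2 \|f\|_\infty \, \mathbb{P}(\tau > t)$,

which holds for all $latex t$, where $latex \|f\|_\infty = \sup_z |f(z)|$. Taking $latex t \to \infty$ and using the fact $latex \tau < \infty$ almost surely, we see that the right hand side of the equation tends to zero, so $latex f(x) = f(y)$.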


Previously, we used Euclid’s proof of the infinitude of the primes as inspiration for a way of arriving at an upper bound for the number of primes less than a given number. Here, we will come up with a slightly cleverer, more quantitative proof of Euclid’s Theorem and, as before, our estimate will grow out of the proof of Euclid’s Theorem, the improvement in the bound being testament to the fact that the proof is more illuminating.

In Euclid’s proof, one constructs a number which is not divisible by any of the first *n* primes, but in fact it is clear we are done if we prove the existence of a number that *has a prime factor* not among the first *n* primes. To do this we need to show that it cannot be the case that *every* number has *all* of its prime factors among the first *n* primes. And, luckily, it does indeed seem very unlikely that *every* number has *all* of its prime factors among the first *n* primes. So the quantitative question we ask is: “How many numbers less than or equal to *x* have all of their prime factors among the first *n* primes?”. Call this number $latex N_n(x)$.

Let’s estimate $latex N_n(x)$: Let *k* be some number less than or equal to *x*, all of whose prime factors are among the first *n* primes. We can write

$latex k = a^2 m$,

where *m* is *square-free*, that is it is not divisible by the square of any prime, and where $latex a^2 \leq k \leq x$. The question is: “How many such numbers are there?”. We can get away with a crude estimate obtained by counting the two factors entirely independently: Firstly, there are at most $latex \sqrt{x}$ choices for $latex a$ simply because it is at most $latex \sqrt{x}$. What about *m*? Well, we know that *m* must be a product of primes (this follows from the fact that every number is divisible by a prime, see the previous post) and that all its prime factors are among the first *n* primes $latex p_1, \dots, p_n$ so

$latex m = p_1^{e_1} p_2^{e_2} \cdots p_n^{e_n}$,

where, since *m* is not divisible by the square of any prime, each exponent $latex e_i$ is either equal to one or equal to zero. Counting all possible choices of these exponents tells us there are at most $latex 2^n$ possible values of *m*. This counting work tells us that $latex N_n(x) \leq 2^n \sqrt{x}$.

If we *assume* for the sake of contradiction that there *are* only *n* primes, then $latex N_n(x) = x$ for every whole number *x*, because every number (except one) has a prime divisor and therefore every number (except one) has a prime divisor among the first *n* primes. This would mean that $latex x \leq 2^n \sqrt{x}$, i.e. $latex x \leq 2^{2n}$, for every $latex x$, which is clearly nonsense. Therefore, there are infinitely many primes.

As promised, we use the simple idea of this proof to get an estimate on the number of primes less than a given number. Fix $latex x$ and write $latex n = \pi(x)$ for the number of primes less than or equal to $latex x$. This means we are calling the biggest prime less than or equal to $latex x$ the $latex n$th prime. So the first $latex n$ primes are actually *all* the primes less than or equal to *x*. Thus $latex N_n(x)$ is counting the number of numbers less than or equal to $latex x$ all of whose prime factors are less than or equal to *x*. On the one hand this is obviously equal to $latex x$, but on the other hand we have the bound obtained in the previous section, *i.e.*: $latex N_n(x) \leq 2^n \sqrt{x}$, and so $latex x \leq 2^{\pi(x)} \sqrt{x}$. We can deduce from this that

$latex \pi(x) \geq \frac{\log x}{2 \log 2}$.
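For completeness, here is the deduction spelled out, writing $latex \pi(x)$ for the number of primes less than or equal to $latex x$ and starting from the inequality $latex x \leq 2^{\pi(x)}\sqrt{x}$. It is nothing more than dividing by $latex \sqrt{x}$ and taking logarithms:

```latex
x \le 2^{\pi(x)}\sqrt{x}
\;\Longrightarrow\; \sqrt{x} \le 2^{\pi(x)}
\;\Longrightarrow\; \tfrac{1}{2}\log x \le \pi(x)\log 2
\;\Longrightarrow\; \pi(x) \ge \frac{\log x}{2\log 2}.
```

As a sanity check, at $latex x = 10^9$ the right hand side is $latex 9 \log 10 / (2 \log 2) \approx 14.95$.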

While this is a bit of an improvement on $latex \pi(x) \geq \log\log x$, it is still very bad. For example, it says that the number of primes less than one billion is greater than 14.9 when in actual fact it is greater than fifty million.

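Another thing that comes out of the work in this post is a very quick and neat proof that the sum of the reciprocals of the prime numbers diverges, *i.e.* this sum:

$latex \frac{1}{p_1} + \frac{1}{p_2} + \frac{1}{p_3} + \cdots$,

where $latex p_k$ is the $latex k$th prime.

Our observation is the following one: For any prime number $latex p$, only every $latex p$th number is divisible by $latex p$. Therefore the number of numbers less than or equal to $latex x$ which are divisible by a given prime $latex p$ is at most $latex x/p$. We use this observation in the context of the quantity $latex x - N_n(x)$, where $latex N_n(x)$ is the number of numbers less than or equal to $latex x$ all of whose prime factors are among the first $latex n$ primes. This is the number of numbers less than or equal to $latex x$ which have a prime divisor greater than $latex p_n$, *i.e.* which are divisible by one or more of $latex p_{n+1}, p_{n+2}, \dots$. This is not more than

$latex \frac{x}{p_{n+1}} + \frac{x}{p_{n+2}} + \cdots$

If the sum converged, then the tail of the sum could be made arbitrarily small and so by choice of *n*, we would have that

$latex \frac{x}{p_{n+1}} + \frac{x}{p_{n+2}} + \cdots < \frac{x}{2}$

Thus $latex \frac{x}{2} < N_n(x) \leq 2^n \sqrt{x}$, or $latex x < 2^{2n+2}$ for every $latex x$, which is clearly nonsense again. Therefore the sum diverges.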
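The two counts in this argument are easy to check by brute force for small inputs. The following is only a sketch in C, with helper names (`is_prime`, `count_smooth`, `tail_count`) that I made up for the purpose: `count_smooth(x, n)` counts the numbers up to `x` all of whose prime factors are among the first `n` primes, and `tail_count(x, n)` adds up the whole-number part of `x / p` over the remaining primes `p` up to `x`.

```c
/* Trial-division primality test; fine for the small numbers used here. */
static int is_prime(long k)
{
    long d;
    if (k < 2) return 0;
    for (d = 2; d * d <= k; d++)
        if (k % d == 0) return 0;
    return 1;
}

/* How many k in 1..x have all of their prime factors among the first n primes. */
long count_smooth(long x, int n)
{
    long count = 0, k;
    for (k = 1; k <= x; k++) {
        long rest = k, p;
        int used = 0;                       /* primes tried so far */
        for (p = 2; used < n; p++) {
            if (!is_prime(p)) continue;
            used++;
            while (rest % p == 0) rest /= p;   /* strip this prime out of k */
        }
        if (rest == 1) count++;             /* nothing else was left over */
    }
    return count;
}

/* Sum of floor(x / p) over the primes p beyond the first n, with p <= x. */
long tail_count(long x, int n)
{
    long sum = 0, p;
    int seen = 0;
    for (p = 2; p <= x; p++) {
        if (!is_prime(p)) continue;
        seen++;
        if (seen > n) sum += x / p;
    }
    return sum;
}
```

For instance, with `x = 100` and `n = 2` the smooth numbers are the twenty numbers of the form 2^a 3^b up to 100, comfortably below the bound 2^2 times the square root of 100, which is 40, and the remaining eighty numbers are indeed no more than `tail_count(100, 2)`.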

Today I would like to think a little bit more about prime numbers. Specifically, we will spend some time thinking about the number of prime numbers less than a given number. We will start by seeing if we can get any quantitative information out of Euclid’s proof itself, before moving on to cleverer ways of achieving this.

We will write $latex p_n$ for the $latex n$th prime number. We will write $latex E_n$ for the number which is constructed in Euclid’s proof, *i.e.* $latex E_n$ is one more than the product of the first *n* prime numbers. Given $latex p_n$, Euclid’s proof not only shows the existence of a larger prime $latex p_{n+1}$, but it gives an *upper bound* on $latex p_{n+1}$; it gives an estimate as to how large the next prime can possibly be: It says that $latex p_{n+1} \leq E_n$, since some prime divisor of $latex E_n$ is larger than $latex p_n$ and so is at least $latex p_{n+1}$.

Since $latex E_n$ is one more than the product of the first *n* primes, each of which is at most $latex p_n$, it is true that $latex E_n \leq p_n^n + 1$, and so $latex p_{n+1} \leq p_n^n + 1$. This is an estimate on the rate of growth of the function $latex n \mapsto p_n$. It is actually a very poor estimate. For example, it says that the third prime number, which is five, is at most ten. It says that the fourth prime number, which is seven, is at most one hundred and twenty-six. Obviously this estimate is a long way off. We ought to have expected this estimate to be bad, though. Firstly, Euclid’s proof is rather crude: The proof exhibits a prime larger than $latex p_n$ by finding *a* number which is not divisible by any of $latex p_1, \dots, p_n$; it *makes no attempt* to find the *first number after* $latex p_n$ which is not divisible by any of $latex p_1, \dots, p_n$. Secondly, we threw away a lot of information when we made the bound at the beginning of this paragraph: We said, “all of the first *n* primes are less than or equal to $latex p_n$”, but if *n* is very large, this estimate is terrible. It replaces small numbers like two, three and five by a very large number. So, in conclusion, this approach is pretty bad.
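To get a feel for just how generous Euclid's bound is, one can tabulate the number from Euclid's proof (one more than the product of the first *n* primes) next to the actual next prime. Here is a small sketch in C; the function names `nth_prime` and `euclid_number` are hypothetical, and the product overflows quickly, so it is only meant for small *n*.

```c
/* The n-th prime by trial division (n >= 1), so nth_prime(1) is 2. */
unsigned long nth_prime(int n)
{
    unsigned long k = 1;
    while (n > 0) {
        unsigned long d;
        int prime = 1;
        k++;
        for (d = 2; d * d <= k; d++)
            if (k % d == 0) { prime = 0; break; }
        if (prime) n--;          /* found one more prime */
    }
    return k;
}

/* One more than the product of the first n primes.
   The product overflows for larger n; keep n small (n <= 9 is safe
   even when unsigned long is only 32 bits wide). */
unsigned long euclid_number(int n)
{
    unsigned long product = 1;
    int i;
    for (i = 1; i <= n; i++)
        product *= nth_prime(i);
    return product + 1;
}
```

For example `euclid_number(4)` is 211, whereas the fifth prime, `nth_prime(5)`, is only 11.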

Let’s try again with a slightly different approach. The key observation here is that ‘Euclid’s bound’ lends itself to an inductive argument because $latex E_n$, the number constructed in Euclid’s proof (one more than the product of the first *n* primes), depends only on the first *n* primes. To be more explicit, suppose there is some function $latex b$, for which the following holds: Firstly, $latex b(1)$ is greater than or equal to two. This would allow us to start the induction. Secondly, *if* $latex p_k \leq b(k)$ for all $latex k \leq n$, *then* $latex p_{n+1} \leq b(n+1)$. This is the inductive step. Euclid’s bound would come into play in the inductive step: Since $latex p_{n+1} \leq E_n$, in order to show that $latex p_{n+1} \leq b(n+1)$, it suffices to show that $latex E_n \leq b(n+1)$. The result of completing the induction would be that $latex p_n \leq b(n)$ for all *n*. Let’s look at the inductive step in more detail to see what kind of functions we can get away with for *b*: If all we know is that $latex p_k \leq b(k)$ for all $latex k \leq n$, then we have that

$latex E_n = p_1 p_2 \cdots p_n + 1 \leq b(1) b(2) \cdots b(n) + 1$.

The inequality we seek will therefore follow if *b* satisfies

$latex b(1) b(2) \cdots b(n) + 1 \leq b(n+1)$.

Unfortunately, even this more careful analysis has not yielded a very good bound. Functions which satisfy such an inequality must grow pretty fast. Suppose, for example that *b* grows at least exponentially fast, *i.e.* that $latex b(n) = c^{B(n)}$ for some constant *c* and some other function *B*, which is increasing with at least linear growth. Then the inequality above reads

$latex c^{B(1) + B(2) + \cdots + B(n)} + 1 \leq c^{B(n+1)}$.

So, approximately speaking (by ignoring the one and taking logarithms), we must have $latex B(1) + B(2) + \cdots + B(n) \leq B(n+1)$, which, if you’ll take my word for it, suggests exponential growth *of B* (because the sum of the first *N* *k*th powers is a polynomial of degree *k+1* in *N*, so no polynomial *B* can keep up with its own partial sums). So the original function *b* must grow very fast. For example, the function $latex b(n) = 2^{2^n}$ works.
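It is worth checking that a doubly exponential function really does satisfy both requirements of the induction. Taking $latex b(n) = 2^{2^n}$ (which matches the values sixteen and two hundred and fifty-six quoted below), the first requirement is clear since $latex b(1) = 4$, and the second follows by summing a geometric series in the exponent:

```latex
b(1)\,b(2)\cdots b(n) + 1
  = 2^{\,2 + 2^2 + \cdots + 2^n} + 1
  = 2^{\,2^{n+1} - 2} + 1
  \le 4 \cdot 2^{\,2^{n+1} - 2}
  = 2^{\,2^{n+1}}
  = b(n+1).
```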

The advantage of this result is that the bound can be calculated easily. The disadvantage is, well, that it is still a terrible estimate. It says that the second prime number, which is three, is at most sixteen and that the third prime number, which is five, is at most two hundred and fifty-six. Actually, if we write $latex \pi(x)$ for the number of primes less than or equal to *x*, one can rephrase our upper bound on the $latex n$th prime $latex p_n$ in terms of a lower bound on $latex \pi(x)$ (estimating $latex \pi(x)$ is historically how this problem is usually phrased). We can deduce from our work that $latex \pi(x) \geq \log\log x$. The poor quality of our bound is evidenced by the fact that the function $latex \log\log x$ grows *very* slowly. For example it says that there are around 0.8 primes less than ten, when there are actually four.

In the next post I hope to derive, using similarly elementary observations, a lower bound for $latex \pi(x)$ which is a little bit better, though which is still a long way from the truth. I will also discuss the interesting result that the sum of the reciprocals of the prime numbers diverges.

A divisor of a number *n* is another number, no larger than *n*, for which there is no remainder when *n* is divided by it. For example, three is a divisor of six, seven is a divisor of seven and thirty-nine is a divisor of one hundred and seventeen. A *prime number* is a number with exactly two divisors. Let’s think about what it means to have exactly two divisors. Firstly, we observe that every whole number has at least one divisor, namely the number itself. Secondly, every whole number except for ‘1’ has at least two divisors: The number itself and the number ‘1’. So, with the exception of the number ‘1’, being prime expresses the quality of having the fewest possible divisors. Why must there be infinitely many such numbers?

Now, although the definition of a prime number is a restriction on the number of divisors the number has, the most fundamental observation that can be made about primes is the way in which they arise *as divisors of other numbers.* I’m obviously not talking about primes dividing other primes, because this is impossible, but consider now a number which is not prime, but which is greater than ‘1’. So, we are thinking about some number with more than two divisors. Is any of these divisors prime? The answer is always yes. This is the fundamental observation. How do I know that? Well, since this number has more than two divisors, it has a divisor which is different from ‘1’ and the number itself. It has what is called a *non-trivial* divisor: A *smaller* number, but not ‘1’, which divides the original number. For example, three is a non-trivial divisor of fifteen. There are two alternatives: The first is that this smaller number is itself prime (like in our example), in which case we have done what we set out to do. The second is that it is not prime, *e.g. *the divisor four of the number twelve. In this case, it too has more than two divisors. Importantly, anything which divides this smaller number also divides the original number (a divisor of a divisor is a divisor), so in order to demonstrate that one of the divisors of the original number is a prime, it is sufficient to demonstrate that one of the divisors of this smaller number is prime. We have `reduced’ the problem to a problem about a smaller number.

We can repeat our argument exactly: Pick a non-trivial divisor *of the smaller number*: This is a number which is smaller still. Again we have the two alternatives: Either it is prime or it too has a non-trivial divisor. If the first alternative holds, then we are done, but if not then we can repeat the argument again. And again, each time getting a non-prime number which is smaller than the previous one, but which is not equal to one. Aha! But this cannot go on forever! It cannot go on forever because numbers with non-trivial divisors cannot keep getting smaller and smaller. The smallest is 4. So, eventually we are forced to conclude that, at some point, the first alternative holds: We come across a number which is prime. And, by our observation that the divisor of a divisor is a divisor, we have therefore found a prime divisor of the original number.

We have argued that every number greater than ‘1’ and which is not prime, has a prime divisor. Since prime numbers are divisible by themselves, we have in fact proved that every number except for ‘1’ has a prime divisor. This is fantastic news.
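The descent in this argument translates almost word for word into a loop. Here is a minimal sketch in C; the function name `prime_divisor_by_descent` is my own invention, and always descending to the *largest* non-trivial divisor is just one possible choice (the argument works for any non-trivial divisor).

```c
/* Follows the descent in the argument above: as long as n has a
   non-trivial divisor, replace n by one (here: the largest).
   Assumes n >= 2. The value left at the end has no non-trivial
   divisor, so it is a prime divisor of the original n. */
unsigned prime_divisor_by_descent(unsigned n)
{
    unsigned d = n - 1;
    while (d >= 2) {
        if (n % d == 0) {   /* d is a non-trivial divisor: descend to it */
            n = d;
            d = n - 1;
        } else {
            d--;
        }
    }
    return n;
}
```

Starting from one hundred and seventeen the loop visits 117, then 39, then 13 and stops there, because 13 has no non-trivial divisor. Starting from a prime it stops immediately and returns the prime itself.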

The hard part is done. We are ready for Euclid’s proof. The way Euclid demonstrated that there were infinitely many prime numbers was to give a reason as to why it is the case that for every prime number, there must be a larger prime number. Allow me to demonstrate exactly how he did so with an example. Let’s choose a prime number to ‘start at’, say seven. Is there a prime number larger than 7? We all know that there is (11, for example) but the point is to see the way in which Euclid did it, because his method can be *generalised* so that it works for any prime number: No matter which prime number you start at, there is always a larger prime number. Euclid thought about the number you get if you multiply together all of the prime numbers up to and including seven and then add one. This number is 211 and is certainly larger than seven. The reason why Euclid thought about this number is that it is not divisible by any of the primes up to and including seven. By writing it out in the way in which it was constructed, *i.e.* as 211 = 2 × 3 × 5 × 7 + 1, one can see immediately that it leaves a remainder of one when divided by two, three, five and seven. However, we know from the previous section that it *must* have a prime divisor. So there must be a prime larger than seven.

And that’s all there is to it. Once this idea is grasped, Euclid’s famous proof is one sentence long: Given any prime number *p*, if I multiply together all the prime numbers less than or equal to *p* and then add one, I get a number which leaves a remainder of one when divided by any of the prime numbers less than or equal to *p* and hence has a prime divisor greater than *p*.

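Before we describe the RSA algorithm, there is one important mathematical concept, which is prime numbers and their factorization. Recall that a number $latex p$ is prime if no other number divides $latex p$ other than itself and $latex 1$. For technical reasons we exclude the number $latex 1$ from being prime. So let's see some examples. Is $latex 7$ prime? Well, yes, no number other than $latex 1$ and $latex 7$ divides it. Is $latex 6$ prime? No, because $latex 6 = 2 \times 3$.

There is a fundamental theorem in number theory which says that every number greater than $latex 1$ can be uniquely written as a product of prime numbers, i.e. $latex n = p_1 p_2 \cdots p_k$ where $latex p_1, \dots, p_k$ are prime. So again, a few examples cannot hurt. Take the number $latex 12$. We know that $latex 12 = 2 \times 6$, but now $latex 6$ is not prime so $latex 6 = 2 \times 3$. Hence $latex 12 = 2 \times 2 \times 3$. That’s what a prime factorisation is, and what the theorem says is pretty basic: if a number $latex n = p_1 \cdots p_k$ and also $latex n = q_1 \cdots q_l$ with all the $latex p_i$ and $latex q_j$ prime, then $latex k = l$ and the $latex q_j$ are just the $latex p_i$ in a different order.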

At this point now I can state what is the fundamental idea behind RSA:

**Factoring a number into prime factors is much harder than checking if a number is prime!**

But why is this true? There are technical reasons for this but I prefer to think along the following lines. Computers are much like humans, so imagine if a human is given the task of factorising numbers and checking if numbers are prime.

Suppose we are given a number $latex n$ and we wish to check if this is prime. A naive approach would be to check if every number smaller than $latex n$ divides $latex n$. So if I gave you 7, then you say 2 doesn’t divide 7, 3 doesn’t divide 7 and so on. Try this out on a few numbers yourself on a piece of paper and you will quickly notice this: you don’t need to check every number up to $latex n$. Why? Because if $latex n = ab$ for two numbers, then one of $latex a$ or $latex b$ is at most $latex \sqrt{n}$. Look at the pairs of factors of 24:

$latex 1 \times 24, \quad 2 \times 12, \quad 3 \times 8, \quad 4 \times 6$.

Now we know that $latex 16 < 24 < 25$ and so $latex \sqrt{24}$ is in between 4 and 5. Notice that in each pair above, we always have one number below 5.

So to check if a number $latex n$ is prime we can look at all the numbers from 2 up to $latex \sqrt{n}$ and see if they divide $latex n$. Going further we don’t need to check the even numbers other than 2, because if an even number divides $latex n$, then 2 divides $latex n$. This is a huge reduction! Later we will look at making this even better.

Here is a sample code in C.

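    #include <math.h>

    /* Returns 1 if n is prime and 0 otherwise. */
    int isPrime(int n){
        int sq, i;
        if(n < 2) return 0;
        if(n == 2) return 1;        /* 2 is the only even prime */
        if(n % 2 == 0) return 0;
        sq = sqrt(n) + 1;           /* divisors above sqrt(n) need no check */
        for(i = 3; i < sq; i += 2){
            if(n % i == 0) return 0;
        }
        return 1;
    }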

Now we turn to factorising, where the naive approach is really all there is. Not only do we have to find a number that divides $latex n$, but we also have to check if it is prime! A simple algorithm goes like this. Go from 2 to $latex \sqrt{n}$ and check if any of those numbers divide $latex n$. If they don’t, then $latex n$ is prime so we stop. Otherwise if $latex n = ab$ then repeat the same for $latex a$ and $latex b$.

Here is a sample code in C.

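    /* Writes the prime factors of n (with multiplicity) into factors and
       returns a pointer just past the last factor written. */
    int* factorise(int n, int* factors){
        int sq, i;
        if( isPrime(n) ){
            *factors = n;
            factors++;
            return factors;
        }
        if(n % 2 == 0){
            *factors = 2;
            factors++;
            /* return here, otherwise the odd loop below would factorise n again */
            return factorise(n/2, factors);
        }
        sq = sqrt(n) + 1;
        for(i = 3; i < sq; i += 2){
            if(n % i == 0){
                factors = factorise(i, factors);
                factors = factorise(n/i, factors);
                break;
            }
        }
        return factors;
    }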

To see how much of a difference there is practically I ran some tests. Obviously, neither of the two methods is optimal, nor are the implementations, but this does give you a rough idea of the scale of things. Here is a result when using 70,000 random numbers and running the test 100 times:

Factorising: 2.446770

Primality : 0.225442

Diff : 2.221328

This is quite a difference in terms of computing time. This is only for numbers up to 12,000 and usually people use much bigger primes in cryptography, for which the difference is even larger. Here is the full source code.

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <time.h>
    #include <sys/time.h>   /* gettimeofday */
    #include <math.h>

    #define MAX_NUM 12000
    #define MAX_LOOP 100

    /* Returns 1 if n is prime and 0 otherwise. */
    int isPrime(int n){
        int sq, i;
        if(n < 2) return 0;
        if(n == 2) return 1;        /* 2 is the only even prime */
        if(n % 2 == 0) return 0;
        sq = sqrt(n) + 1;
        for(i = 3; i < sq; i += 2){
            if(n % i == 0) return 0;
        }
        return 1;
    }

    /* Writes the prime factors of n into factors and returns a pointer
       just past the last factor written. */
    int* factorise(int n, int* factors){
        int sq, i;
        if( isPrime(n) ){
            *factors = n;
            factors++;
            return factors;
        }
        if(n % 2 == 0){
            *factors = 2;
            factors++;
            /* return here, otherwise the odd loop below would factorise n again */
            return factorise(n/2, factors);
        }
        sq = sqrt(n) + 1;
        for(i = 3; i < sq; i += 2){
            if(n % i == 0){
                factors = factorise(i, factors);
                factors = factorise(n/i, factors);
                break;
            }
        }
        return factors;
    }

    int main(int argc, char* argv[]){
        if(argc < 3){
            printf("Usage:\n%s isprime [number]\n%s factorise [number]\n%s time [number of trials]", argv[0], argv[0], argv[0]);
            return 0;
        }
        int n = atoi(argv[2]);
        if(n < 2){
            printf("%i is not a number I deal with\n",n);
            return 0;
        }
        if(!strcmp(argv[1], "isprime")){
            printf("%i is %sprime.\n", n, ( isPrime(n) == 1? "":"not ") );
            return 0;
        }
        else if(!strcmp(argv[1], "factorise")){
            int factors[n]; //We can't have more than n factors!
            memset(factors, 0, sizeof(factors));
            factorise(n, factors);
            int i;
            printf("%i has factors:\n", n);
            for(i = 0; i < (int)sqrt(n)+1; i++){
                if(factors[i] != 0) printf("%i ", factors[i]);
            }
            printf("\n");
            return 0;
        }
        else if(!strcmp(argv[1], "time")){
            srand(time(0));
            int tests[n];
            int i, j;
            for(i = 0; i < n; i++) tests[i] = rand() % (MAX_NUM-2) + 2;
            struct timeval cbefore, cafter, fbefore, fafter;
            float check, fact;

            /* time the primality checks */
            gettimeofday(&cbefore, NULL);
            for(j = 0; j < MAX_LOOP; j++){
                for(i = 0; i < n; i++) isPrime(tests[i]);
            }
            gettimeofday(&cafter, NULL);
            check = (float) (cafter.tv_sec - cbefore.tv_sec);
            check += ( (float) (cafter.tv_usec - cbefore.tv_usec) )/1000000;

            /* time the factorisations */
            int factors[ MAX_NUM ];
            gettimeofday(&fbefore, NULL);
            for(j = 0; j < MAX_LOOP; j++){
                for(i = 0; i < n; i++) factorise(tests[i], factors);
            }
            gettimeofday(&fafter, NULL);
            fact = (float) (fafter.tv_sec - fbefore.tv_sec);
            fact += ( (float) (fafter.tv_usec - fbefore.tv_usec) )/1000000;

            printf("Factorising: %f\n", fact);
            printf("Primality : %f\n", check);
            printf("Diff : %f\n", fact - check);
            return 0;
        }
        printf("Usage:\n%s isprime [number]\n%s factorise [number]\n%s time [number of trials]", argv[0], argv[0], argv[0]);
        return 0;
    }

Now that I have brainwashed you into accepting that prime factorisation is harder than checking for primality, let's look at how we can use this. Instead of letters, we can think of a piece of information as numbers (computers work like this anyway). If you really insist, then I can say suppose a=0, b=1, and so on.

Before I can describe how RSA works, we need to look into modular arithmetic. The concept is actually very simple. Suppose we have 6 rocks in a lake arranged in a circle and we number the rocks in a clockwise direction 0, 1, 2, 3, 4, 5. Now stand at the stone 0. Suppose that you are only allowed to move by one rock, so that going from 2 to 3 or 3 to 2 is allowed, but not from 1 to 4. For any number I give you, move in clockwise direction by that amount and tell me the number you land on. Suppose I give you 8. You will end up doing this:

0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 0 -> 1 -> 2

We then write $latex 8 \equiv 2 \pmod{6}$ and say 8 is 2 modulo 6. Notice that if I have any multiple of 6, say 12, then I will always end up at 0, because going 6 steps leaves me where I am standing (try it!). An alternative way of thinking about this, which is much easier in computation, is to look at the remainder. 8 divided by 6 is 1 with a remainder of 2, so $latex 8 \equiv 2 \pmod{6}$. Similarly 25 divided by 3 is 8 with remainder 1, so $latex 25 \equiv 1 \pmod{3}$.
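If you want to convince yourself that stepping around the rocks and taking the remainder are the same thing, here is a tiny sketch in C. The function name `walk_rocks` is made up for this illustration.

```c
/* Walk `steps` rocks clockwise around a circle of `rocks` rocks,
   one rock at a time, starting from rock 0. Returns the rock we end on. */
unsigned walk_rocks(unsigned steps, unsigned rocks)
{
    unsigned position = 0;
    while (steps-- > 0) {
        position++;
        if (position == rocks) position = 0;   /* past the last rock: back to rock 0 */
    }
    return position;
}
```

Calling `walk_rocks(8, 6)` lands on rock 2, which is the same as `8 % 6` in C, and `walk_rocks(12, 6)` lands back on rock 0.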

So how do we use this to encrypt information? Well first, let's assume that everything to be encrypted is given by a number (much like a=0, b=1 and so on). So I have a number $latex m$ that I want to send to you encrypted. Well what I can do is the following. I pick two numbers $latex e$ and $latex n$ and then compute $latex m^e \bmod n$. This scrambles the information pretty well, but we need to be smart in the choices of $latex e$ and $latex n$ to make sure the receiver can decrypt this.

In RSA there are two keys, called the public key and the private key. The idea is that I send you my public key which consists of two numbers $latex e$ and $latex n$, and you take your number $latex m$ you want to send to me and send me $latex c = m^e \bmod n$. Then I have what is called a private key which is $latex d$ along with $latex n$, and I do $latex c^d \bmod n$ to obtain the original message. But now we need to look at how to make this work effectively. The idea here is that even if an attacker obtains the public key, he can only send me an encrypted message and not read your encrypted message.

First let me describe the RSA algorithm:

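- Find two prime numbers $latex p$ and $latex q$ and let $latex n = pq$. (In practice there are better choices for $latex p$ and $latex q$ that make $latex n$ harder to factorise).
- Let $latex \varphi = (p-1)(q-1)$ (called Euler’s totient) and pick $latex e$ smaller than $latex \varphi$ such that the only number that divides both $latex e$ and $latex \varphi$ is 1.
- Compute the number $latex d$ such that $latex ed \equiv 1 \pmod{\varphi}$. This is done using Euclid’s algorithm, which I won’t delve into.
- Give $latex e$ and $latex n$ as your public key and keep $latex d$ a secret.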

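Before we show that this works, let's do a simple example. Let's pick $latex p = 3$ and $latex q = 5$ so that $latex n = 15$ and $latex \varphi = 8$. Now we can pick $latex e$ to be 3, 5 or 7 so let's say $latex e = 3$. Now we try to compute $latex d$ such that $latex 3d \equiv 1 \pmod{8}$. It is easy to see that $latex d = 3$ because $latex 3 \times 3 = 9 = 8 + 1$, so we have a pretty unsafe encryption.

Now you want to send me the number $latex 2$ so what you do instead is send me $latex 2^3 \bmod 15 = 8$. I take the number 8 and apply the decryption which is $latex 8^3 \bmod 15 = 2$ and obtain your original number (hint, typing 8^3 mod 15 into google helps).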
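The toy example is small enough to check by machine. The sketch below, in C, is only an illustration under loud assumptions: the function names `mod_pow` and `find_private_exponent` are mine, the private exponent is found by brute force rather than by Euclid's algorithm (which is what one would really use), and the numbers must be small enough that the products do not overflow.

```c
/* (base^exp) mod n by repeated squaring. Assumes n >= 1 and that
   (n-1)*(n-1) fits in an unsigned long. */
unsigned long mod_pow(unsigned long base, unsigned long exp, unsigned long n)
{
    unsigned long result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % n;   /* use this bit of exp */
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Smallest d >= 1 with e*d = 1 (mod phi), found by trying every d.
   Returns 0 if there is none, i.e. if e and phi are not coprime.
   This brute force stands in for Euclid's algorithm. */
unsigned long find_private_exponent(unsigned long e, unsigned long phi)
{
    unsigned long d;
    for (d = 1; d < phi; d++)
        if ((e * d) % phi == 1) return d;
    return 0;
}
```

With the primes 3 and 5 and public exponent 3, `find_private_exponent(3, 8)` gives 3, `mod_pow(2, 3, 15)` gives the encrypted message 8, and `mod_pow(8, 3, 15)` gives back 2.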

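So let's show that this algorithm works in general. To do so we need a few theorems from number theory. First one is **Fermat’s little theorem** (not Fermat’s last theorem!) which states that if $latex p$ is a prime number, then for any number $latex a$ not divisible by $latex p$,

$latex a^{p-1} \equiv 1 \pmod{p}$.

Secondly we need the **Chinese remainder theorem**, a simple version of it states that if $latex p$ and $latex q$ are distinct primes, and if $latex x$ and $latex y$ are such that

$latex x \equiv y \pmod{p}$ and $latex x \equiv y \pmod{q}$,

then

$latex x \equiv y \pmod{pq}$.

Firstly, notice that $latex (a \bmod n)(b \bmod n) \equiv ab \pmod{n}$ (try to imagine what this says about the stone stepping!) so that

$latex c^d \equiv (m^e)^d = m^{ed} \pmod{n}$.

Now we know that $latex ed \equiv 1 \pmod{\varphi}$ and recalling that this says $latex ed$ divided by $latex \varphi = (p-1)(q-1)$ has remainder 1, we can obtain that $latex ed = 1 + k(p-1)(q-1)$ for some number $latex k$. Let's plug this in to the above and use Fermat’s little theorem:

$latex m^{ed} = m^{1 + k(p-1)(q-1)} = m \cdot m^{k(p-1)(q-1)}$

but now

$latex m^{k(p-1)(q-1)} = \left(m^{p-1}\right)^{k(q-1)}$.

Applying Fermat’s little theorem we see that $latex m^{p-1} \equiv 1 \pmod{p}$ (if $latex p$ happens to divide $latex m$ then both sides of the next line are simply 0 modulo $latex p$), thus to sum it up:

$latex m^{ed} \equiv m \pmod{p}$.

Replacing $latex p$ above by $latex q$ leads to the same result (now you try this one) so that

$latex m^{ed} \equiv m \pmod{q}$.

Now what did the Chinese remainder theorem tell us? It says that

$latex m^{ed} \equiv m \pmod{pq}$

so we decrypt the message using this method!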

So now the question you have (or should have) is why did I talk about prime factorisation at the start? Well remember that we can check if numbers are prime much quicker than we can factor numbers into primes, so finding large prime numbers is easy (more on this later). So let's assume that you are sending me your credit card numbers and a malicious attacker called Levy is trying to obtain this information. We can assume he has the public key which is $latex e$ and $latex n$. We used Euclid’s algorithm to generate the private key $latex d$ but what is to stop Levy from computing $latex d$ such that $latex m^{ed} \equiv m \pmod{n}$ (after all this is what we were after)?

The problem with this is that to use Euclid’s algorithm, $latex e$ and $latex \varphi$ need to be **coprime**, that is, the only number that divides both is 1, and, more to the point, one needs to know $latex \varphi$. When we generate the keys, we picked $latex e$ coprime to $latex \varphi$, so we could use Euclid’s algorithm, but the attacker does not know $latex \varphi$ a priori.

There are two options for him, which are equally bad. First, he could try by trial and error to find $latex d$ by computing $latex c^2 \bmod n$, $latex c^3 \bmod n$, etc. which is very slow. Alternatively he could try to find $latex p$ and $latex q$ by factorising $latex n$, which we have seen is again very slow. In real life applications the prime numbers used are very long and it is not that Levy cannot break the code, he can, but that the time that he needs to break the code is so long that by that time your credit card would have expired.


Most people would answer this question in the affirmative. Let us call this the switch scenario. The second scenario is that a trolley is again running down a hill at high speed, aimed at five people at the bottom whom it will surely kill. However this time you are standing on a bridge with a fat man next to you. If you push the fat man off the bridge the trolley will stop but kill the fat man. Would you do it?

The common answer here is no. This is somewhat strange as the consequences of the actions are the same. Moreover there is no easy way to justify why the lonesome man in the switch scenario is any less innocent or involved than the fat man. Nor is there any increase or decrease in your involvement. In both situations the person was at the wrong place at the wrong time and in both situations you are actively deciding whom to kill.

There is however, I believe, a problem with the scenarios that distinguishes them. First, let me begin by saying that morality is in our nature, in the sense that it is somehow an evolutionary trait that we inherit. Whether this is in our DNA or a social trait is somewhat irrelevant to the discussion, but if we did not possess some common moral code then our species would be extinct. As such, I find it hard to believe that rationality has a large part to play.

There is also the problem of the moral code. Most people mistakenly think that there are absolute axioms of morals (absolute in the personal sense, so that a person will say any act contradicting these is immoral for them). For example, one common axiom that a person may hold is “Thou shalt not kill”. The problem with this is twofold. First, clearly there are cases when most people would consider it a moral act to kill, for example, a soldier killing the guards at a concentration camp to free the prisoners. Secondly, when there is more than one of these axioms, they tend to contradict each other. Take for a second axiom “One should reduce suffering”, which will contradict the first in cases of terminally ill, suffering patients. Thus morality by its very nature considers the situation at hand.

The problem with the scenarios is then the following. The morality we receive is granted by, more or less, intuition. The first scenario is one that is imaginable, for example one may think of a pilot trying to decide where to crash the plane so as to save as many people on the ground as possible. The second however is not. There is an array of other possible alternatives and an array of uncertainties surrounding it. First, unlike the first scenario where one can imagine that the switch would change the track of the trolley, there is no guarantee that the fat man will stop the trolley in real life. Secondly, in the second scenario the question to be asked is why the fat man and not us? Is there any guarantee that if I jumped in front of the trolley it wouldn’t have stopped?

My point here isn’t that the scenarios are not posed properly but that the second scenario is unrealistic and that as our morality is governed by the scenarios we can experience (and those we have) our answer to the second question seems to contradict the values described by the answer to the first. We are unconsciously trying to relate the situation with a more realistic one we may have encountered, and as such the questions that are raised, though they are ruled out by the scenario, still affect our judgment on the morality of the action.

]]>In that case we say that the distribution is representable by the function . Given any distribution we can define its *distributional derivative * to be the distribution defined as

.

In the special case where the distribution can be represented by a function in the way we show above the distributional derivative will be

For notational convenience when the distribution is representable by the function we write . Suppose now that not only the distribution is representable by a function but its distributional derivative is also representable by a function, say . Then the relationship between and will be

In that case we say that and we can denote . However there is another more general way to interpret the fact that a function represents a distribution. Suppose that is (signed) Radon measure on i.e. a set real function defined on the Borel sets which is additive. Note that we defined that measure to take only real values and not the value . In particular a (signed) Radon measure is always a finite measure or us. Let us define now the distribution as follows:

One can check that this is a distribution indeed. We say that represents the distribution . Furthermore suppose now that the Radon measure is absolutely continuous with respect to the Lebesgue measure on . From the Radon-Nikodym theorem there exists a function such that

and

This means that instead of saying that a distribution is representable by a function one could say that can be represented by a Radon measure which is absolutely continuous with respect to the Lebesgue measure with corresponding density . Thus:

**The space consists exactly of all the functions in whose distributional derivative can be represented by a Radon measure which is absolutely continuous with respect to Lebesgue measure.**

However, you can have functions whose distributional derivative cannot be represented by such a measure but can be represented by a general Radon measure. This leads to the following definition:

**Definition:(BV(0,1))**

Let . We say that is of bounded variation if its distributional derivative can be represented by a Radon measure . We denote this measure by . This means that

We denote this space with . We also define the total variation of to be the total variation |Df|(0,1) of the measure i.e.

where the supremum is taken over all the Borel disjoint partitions of

Let us note here that a signed Radon measure has always finite total variation.

REMARK: By a smoothing argument one can show that the above relation is true for any .

But wait a minute! What have we learned at school? Wasn’t a bounded variation function a function that does not oscillate too much? Didn’t we use some partitions of , define the jumps of on these partitions and then take the supremum of the total size of the jumps? Isn’t the “correct” definition something like this:

**Definition: (Pointwise variation)**

Let . We define the *pointwise variation* of in , to be the following supremum
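To get a feel for this supremum, here is a small numerical sketch. It assumes the usual textbook definition, in which one sums the jumps |f(t_{i+1}) - f(t_i)| along a finite increasing sequence of points in (0,1) and then takes the supremum over all such sequences. The helper names `variation_on_partition` and `uniform_partition` are made up for illustration, and a single partition only ever gives a lower bound for the supremum.

```python
import math


def variation_on_partition(f, points):
    """Sum of the jumps |f(t_{i+1}) - f(t_i)| along an increasing list of points."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))


def uniform_partition(n):
    """n equally spaced points strictly inside (0, 1)."""
    return [i / (n + 1) for i in range(1, n + 1)]


points = uniform_partition(999)

# A monotone function: the jumps telescope, so the sum is f(last) - f(first).
monotone_variation = variation_on_partition(lambda x: x * x, points)

# An oscillating function: sin(2*pi*x) goes up by 1, down by 2 and up by 1 again.
sine_variation = variation_on_partition(lambda x: math.sin(2 * math.pi * x), points)

print(monotone_variation)  # close to 1, never above it
print(sine_variation)      # close to 4, never above it
```

Refining the partition pushes both numbers up towards their limits, which is exactly what taking the supremum does.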

Let us write a few remarks about the above definition. It is very easy to check that every function with finite pointwise variation is bounded. Also any bounded monotone function defined on has pointwise variation equal to (left limit to 1 and right limit to 0 respectively.) We can also check that any function with can be represented as a difference of two bounded increasing functions and such that . Indeed:

Set

and

.

.

In order to see that is increasing as well note that if we have

which means that and hence . Similarly is increasing and we have of course . It remains to show that . The key point here is to observe that

and .

Suppose that this is not true. Then there exists a constant and a sequence and tends to such that for every . This means that for every there exist such that

Without loss of generality, passing to a subsequence of if necessary, we can assume that for every . Now we have the following sequence decreasing to

But and imply that which is a contradiction.

This is immediate as is increasing for every . Thus . For any partition there exists a (any ) such that

.

Thus .

We are ready to show that . Since both are increasing we have , . Thus

This is enough for today! In the next post we are going to examine the exact relation between functions of bounded variation and functions that have finite pointwise variation. We are also going to discuss the Cantor-Vitali function!

**PS: In the last proof, we used a fact that is true but we did not prove it. Can you see what this fact is?**

With this in mind, define the operator by

.

With some conditions, we can define the same operator on , the partitions of . So for example if and , then . The partition tells us in this case to merge the first and third block and leave the second block alone.
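Here is a minimal sketch of the merging rule just described, written for finite partitions. It assumes the usual convention (as in Bertoin's book) that the blocks of the first partition are numbered in increasing order of their least element, and that the second partition tells us which block numbers to glue together. The function names and the sample partitions are made up for illustration.

```python
def blocks_in_order(partition):
    """Return the blocks as sorted lists, ordered by their least element."""
    return sorted((sorted(block) for block in partition), key=lambda b: b[0])


def coag(pi, pi_prime):
    """Merge the blocks of pi whose indices share a block of pi_prime."""
    ordered = blocks_in_order(pi)
    merged = []
    used = set()
    for group in blocks_in_order(pi_prime):
        new_block = []
        for i in group:
            # Indices beyond the number of blocks of pi are simply ignored.
            if 1 <= i <= len(ordered):
                new_block.extend(ordered[i - 1])
                used.add(i)
        if new_block:
            merged.append(sorted(new_block))
    # Blocks whose index is not mentioned in pi_prime are left alone.
    for i, block in enumerate(ordered, start=1):
        if i not in used:
            merged.append(block)
    return blocks_in_order(merged)


pi = [{1, 3, 5}, {2, 4}, {6}]
pi_prime = [{1, 3}, {2}]   # merge the first and third block, leave the second alone
print(coag(pi, pi_prime))  # [[1, 3, 5, 6], [2, 4]]
```

Coagulating with the partition into singletons, for instance `coag(pi, [{1}, {2}, {3}])`, gives back `pi` itself, which is the identity element appearing in the monoid structure.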

This turns out to be the right tool for describing ancestral lineages, as families should only merge together. Let be the restriction of to in the obvious way, i.e. if , then and let . Then we have the following nice proposition which follows from the observation that .

**Proposition:**

*The space with the binary operation defined on it is a monoid with the identity .*

With this in mind, a natural question to ask is if the operator preserves exchangeability. Recall that a random partition is exchangeable if for each permutation of that fixes all but finitely many points we have that has the same distribution as , defined by if and only if .

**Proposition:**

*Let be two independent exchangeable random partitions, then is exchangeable.*

**Proof:**

Let be a permutation and . Notice that if and only if which happens whenever or , where . Consider the cases one by one; firstly we have that the event has the same probability as the event by exchangeability of . Second, if , then where . Let be the random permutation such that , which will be independent of , then we have that if and only if , which has the same probability as by independence and exchangeability.

Thus putting this all together we have that the event has the same probability as , which proves exchangeability.

Let us now endow this space with a metric in preparation to introduce some Feller processes:

.

The choice of this metric is due to the paintbox correspondence of exchangeable partitions (as described here). It is then an easy task to verify that is Lipschitz.

We arrive now at a natural definition of homogeneous exchangeable coalescent. This will be nothing but a Levy process on the monoid we have.

**Definition:**

A process is called an **exchangeable coalescent** if for each the distribution of , conditioned on , is given by where only depends on and is exchangeable.

The fact that these processes carry the Feller property is immediate from the properties of , and thus we obtain the strong Markov property.

A famous example of such a process is the so-called **Kingman’s coalescent**, in which each pair of blocks merges at rate 1. Without going into too much detail about the construction of Kingman’s coalescent, one interesting aspect of this process is that it **comes down from infinity**, that is to say the number of blocks is finite almost surely for all . To see why this is true we just need to look at ,

where are i.i.d. standard exponential random variables. But now as the sum almost surely, this implies that the tail is vanishing and hence we must have that as .
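As a sanity check on this argument, here is a small simulation sketch. It assumes that while there are k blocks the next merger arrives after an exponential time with rate k(k-1)/2 (each of the pairs merging at rate 1), so the time to get from n blocks down to one block has mean 2(1 - 1/n). The function names and the choice of n are only illustrative.

```python
import random


def expected_time_to_one_block(n):
    """Mean time to go from n blocks to 1 block: sum of 2 / (k (k - 1))."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_time_to_one_block(n, rng):
    """One sample of the total time: independent exponentials with rate k(k-1)/2."""
    return sum(rng.expovariate(k * (k - 1) / 2.0) for k in range(2, n + 1))


rng = random.Random(0)
n = 200
samples = [simulate_time_to_one_block(n, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)

print(expected_time_to_one_block(n))  # equals 2 * (1 - 1/n), which stays below 2
print(sample_mean)                    # should be close to the value above
```

The point is that the mean stays bounded by 2 however large n is, which is the finite-sum statement behind coming down from infinity.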

Before we look more into this phenomenon, let us first categorize the coalescents via their generators. Without loss of generality we will henceforth assume that and for , where , define

.

Denote by where . There is a natural way to associate these jump rates with the use of a certain measure which satisfies the following

- for all and each
- consequently and .
- is invariant under the action of permutations

Indeed the converse is also true, that is, if we have a measure defined on such that 2. and 3. above hold, then there exists an exchangeable coalescent such that the jump rates are given by . The construction of such a process is very similar to the case of constructing Levy processes from the jump measure by using a Poisson point process. Though interesting, we leave this aside and take the statement on face value.

As a concrete example, consider Kingman’s coalescent described above. Recall that in Kingman’s coalescent any two blocks merge at rate 1, so that if we let be the partition which is all singletons except that it contains the block , then the rate of Kingman’s coalescent is given by .

For the reader who is not comfortable with jump measures of Markov processes and/or is confused about the discourse above, not to worry. You do not need to understand all of that abstract nonsense; what describes is the rate of transfer in an infinitesimal amount of time.

There are nice decomposition results for jump measures of coalescent processes given in the Bertoin book (see below). We will be looking at the phenomenon of coming down from infinity, that is, the process having finitely many blocks a.s. for any non-zero time. Of course there is the trivial case that we should disregard, which is the case when . This is the case when we have positive probability of going from the state to . In this case it is easy to deduce that eventually we will have finitely many blocks (in fact, one block).

With that aside and a little adaptation of the Schweinsberg paper, I present to you a pretty cute result.

**Theorem:**

*Suppose that , then either a.s. for all or a.s. for all .*

Before we prove the theorem, let us see a lemma that will aid us with the main ideas of the proof. A warning I should give here is that the process need not be Markovian, let alone strong Markovian. As far as I am aware, this question still remains open.

We would like to first show that regardless of where you start your process, the time you come down from infinity is the same if your starting point has infinitely many blocks. For this notice that is the same as started from .

**Lemma:**

*For let , then for all with we have .*

**Proof:**

Notice that from the definition if and only if (just check what happens when one of them is finite). With this observation we see that in the case when , if and only if .

**Proof of Theorem:**

Let and . Suppose that , then by the above and the Markov property we have that for all and . So now by the recurrence relation we have that , but is arbitrary, so it must be that .

Next suppose that , we will prove that . There are some cases to consider which we list and do in order.

*Case 1:*

This follows from the strong Markov property and the lemma we have proved above. If and , then which is an obvious contradiction.

*Case 2: *

This follows from the fact that we have declared . In particular this implies that we cannot merge all but finitely many blocks together in one merger time, that is, if , then .

*Case 3: *

Suppose that and let . Then define recursively and where . In plain English, is the first time that the smallest integer not in any of the blocks of containing , joins a block containing some .

From this description we can see that and so it is enough now to show that for a contradiction. Notice that if denotes the total number of mergers involving some blocks, then the condition directly implies that . But now we have an upper bound on where are i.i.d. exponential random variables with parameter . Now the claim that directly follows from

.

There are interesting results on the so-called -coalescents, for which I have given a list of references below. The -coalescents form a subclass of the coalescents that I was talking about, with the restriction that blocks can only merge into a single block.

Further Reading:

- Jean Bertoin, Random Fragmentation and Coagulation Processes
- Nathanael Berestycki, Recent Progress in Coalescent Theory
- Jason Schweinsberg, Coalescents with Simultaneous Multiple Collisions
- Jim Pitman, Coalescents with Multiple Collisions
- Jason Schweinsberg, A Necessary and Sufficient Condition for the -coalescent to Come Down from Infinity

This is a counting problem. You are being asked to count how many of something there are. The answer will be a formula involving the unknown quantities *n* and *k*. Counting is often thought of as being easy. While it is true that the things which must be counted are easy to describe (I have just done so in a couple of sentences), this does not mean that the problem of counting them is easy. So, in this post I continue my quest to explain some basic things about counting and how to do it.

Mathematics is hard, that’s why we start with modest goals: Imagine that you are the proud owner of four different ornamental candles. You wish to put two of these on the mantelpiece above the fireplace, as one does. One candle will go on the left-hand side and one on the right. In how many ways can you decorate the mantelpiece thus, using two of your four candles?

The answer is twelve. That seems like quite a lot of different ways, if you ask me. How can one calculate this number? Well, in the other two posts of the series, I have gone on and on about something which I have dubbed as our ‘fundamental observation’ when it comes to these kinds of counting problems. What is this fundamental observation? And what does it mean in the context of this new problem?

This latest example is somewhat similar to the cinema-seats example discussed in the previous post: I have four things (people, candles,…) and I want to arrange some of them in some order. Originally, we were concerned with arranging all of them – *i.e.* with seating all four friends. This time however, not all of the objects will be used. To illustrate my point, note that the analogous question for friends and seats would be: In how many different ways can two seats be filled by a group of four people?

Once again we can turn to our fundamental rule. Talking in the language of candles now, I ask myself: In how many ways can I place a candle on the left-hand side? Well, I have four candles and one of them needs to be put there, so this is obviously four. I am now left with three candles, from which I must choose one for the right-hand side. One has to now see that *for each and every* choice of candle for the left-hand side there are three choices of candle for the right-hand side. The *‘each and every’* part of the previous sentence means that the *number* of ways (three) in which I can ornament the right-hand side *does not depend* on the candle which I chose to go on the left-hand side. This is our fundamental observation; the answer is indeed 4 x 3 = 12.
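If you would rather see the twelve arrangements than take my word for it, here is a quick sketch that lists them by brute force. The candle names are of course made up; each pair is read as (left-hand side, right-hand side).

```python
from collections import Counter
from itertools import permutations

candles = ["A", "B", "C", "D"]  # four different candles, labels are arbitrary

# Every ordered pair of two different candles: (left-hand side, right-hand side).
arrangements = list(permutations(candles, 2))

# For each and every choice on the left, count the choices remaining for the right.
choices_for_right = Counter(left for left, right in arrangements)

print(len(arrangements))        # 12
print(dict(choices_for_right))  # each left-hand candle appears 3 times
```

The second print is the fundamental observation in action: the count for the right-hand side is three whichever candle went on the left.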

Suppose instead now that I have five different candles from which to choose. What happens then? Well, now I have five choices for the left-hand side, after which I have four for the right-hand side. There are 5 x 4 = 20 arrangements. I could easily swap ‘left’ here with ‘right’ and it would not make any difference to the validity of the argument. If I had six candles there would be 30 arrangements and so forth.

All we did in the previous paragraph was spot the pattern that was ‘emerging’ and carry it on a bit. We have enough information to make a general conjecture: There are *n* x *(n-1)* ways of placing two objects in two empty spaces when you have *n* objects from which to choose. After all, our method of argument or ‘proof’ would be exactly the same as with four candles. There are *n* choices for the first space and for each and every such choice, there are *n-1* choices remaining for the second space.

We are doing well, but much more follows naturally: Why stop at two spaces to be filled? Doesn’t our argument apply even more generally? Let’s see: Suppose I had three spaces to fill with 5 objects. I could choose from *5* for the first space, from 4 for the second and from 3 for the third… For each and every way in which the first two spaces could be filled, there are three ways in which the last place can be filled… This seems to be working.

We now have enough information to make an even more general observation, completely dealing with counting problems of this form. Allow me to quickly introduce a bit of terminology: Suppose I am filling *k* spaces using *n* objects (like in the introduction), or, equivalently, *arranging k* objects, where each object is taken from a set of *n* objects. Then, each possible way in which I can do this is called a *permutation* and the quantity I am interested in counting is called the *number of permutations of k objects from n*. To arrive at this number I proceed along the lines we have discussed: There are *n* choices for the first space to be filled… *(n-1)* choices for the second space… *(n-2)* choices for the third… … … and *(n-k+1)* for the *k*-th. The only potentially tricky bit is to work out that the *k*-th factor is *(n-k+1)*, but one can only spell out so much, so I leave that to you.

Let’s record what we know: The number of permutations of *k* objects from *n* is *n* x *(n-1)* x *(n-2)* x … x *(n-k+1)*. If you are comfortable with the standard notation introduced in the previous post, i.e. that *n*! = *n* x *(n-1)* x *(n-2)* x … x 3 x 2, then you will be able to deduce that the number of permutations of *k* objects from *n* is *n!/(n-k)!*
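For the sceptical reader, here is a short sketch that checks the two ways of writing this number against each other. The function name `falling_product` is my own; `math.factorial` and `math.perm` are standard library functions (the latter needs Python 3.8 or later).

```python
import math


def falling_product(n, k):
    """n x (n-1) x ... x (n-k+1): one factor for each of the k spaces."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


for n, k in [(4, 2), (5, 2), (6, 2), (5, 3)]:
    by_product = falling_product(n, k)
    by_factorials = math.factorial(n) // math.factorial(n - k)
    print(n, k, by_product, by_factorials, math.perm(n, k))
```

The first three rows reproduce the 12, 20 and 30 arrangements of two candles found earlier, and the last row gives 5 x 4 x 3 = 60 for three spaces and five objects.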