Ever wonder what happens when you send your card details over the internet? How exactly does public-key encryption work? I will write a brief, non-technical introduction to these concepts and the mathematical background in them, containing some sample code and plenty of examples.

Before we describe the RSA algorithm, there is one important mathematical concept, which is prime numbers and their factorization. Recall that a number is prime if no other number divides other than itself and . For technical reasons we exclude the number from being prime. So lets see some examples. Is prime? Well, yes, no other number other than and divide it. Is prime? No, because .

There is a fundamental theorem in number theory which says that every number can be uniquely written as a product of prime numbers, i.e. where are prime. So again, a few examples cannot hurt. Take the number . We know that , but now is not prime so . Hence . That’s what a prime factorisation is, and what the theorem says is pretty basic, if a number , then .

At this point now I can state what is the fundamental idea behind RSA:

**Factoring a number into prime factors is much harder than checking if a number is prime!**

But why is this true? There are technical reasons for this but I prefer to think along the following lines. Computers are much like humans, so imagine if a human is given the task of factorising numbers and checking if numbers are prime.

### How to check if a number is prime

Suppose we are given a number and we wish to check if this is prime. A naive approach would be to check if every number smaller than divides . So if I gave you 7, then you say 2 doesn’t divide 7, 3 doesn’t divide 7 and so on. Try this out on a few numbers yourself on a piece of paper and you will quickly notice this, you don’t need to check every number up to . Why? Because if for two numbers, then one of or is less than . Look at the pair of factors of 24:

Now we know that and so is in between 4 and 5. Notice that in each pair above, we always have one number below 5.

So to check if a number is prime we can look at all the numbers below and see if they divide . Going further we don’t need to check the even numbers other than 2, because if an even number divides , then 2 divides . This is a huge reduction! Later we will look at making this even better.

Here is a sample code in C.

int isPrime(int n){ int sq = sqrt(n) + 1; int i; if(n % 2 == 0) return 0; for(i = 3; i < sq; i += 2){ if(n % i == 0) return 0; } return 1; }

### How to factor a number into its prime factors

Now we see that the naive approach here is really all there is. Not only do we have to find a number that divides , but we also have to check if it is prime! A simple algorithm goes like this. Go from 1 to and check if any of those numbers divide . If they don’t, then is prime so we stop. Otherwise if then repeat the same for and .

Here is a sample code in C.

int* factorise(int n, int* factors){ if( isPrime(n) ){ *factors = n; factors++; return factors; } if(n % 2 == 0){ *factors = 2; factors++; if(n != 2) factors = factorise(n/2, factors); } int sq = sqrt(n) +1; int i; for(i = 3; i < sq; i += 2){ if(n % i == 0){ factors = factorise(i, factors); factors = factorise(n/i, factors); break; } } return factors; }

### Practical results

To see how much of a difference there is practically I ran some tests. Obviously, neither of the two methods are optimal, nor are the implementations but this does give you a rough idea of the scale of things. Here is a result when using 70,000 random numbers and running the test 100 times:

Factorising: 2.446770

Primality : 0.225442

Diff : 2.221328

This is quite a difference in terms of computing time. This is only for numbers up to 12,000 and usually people use much bigger primes in cryptography, for which the difference is even larger. Here is the full source code.

#include <stdio.h> #include <string.h> #include <stdlib.h> #include <time.h> #include <math.h> #define MAX_NUM 12000 #define MAX_LOOP 100 int isPrime(int n){ int sq = sqrt(n) + 1; int i; if(n % 2 == 0) return 0; for(i = 3; i < sq; i += 2){ if(n % i == 0) return 0; } return 1; } int* factorise(int n, int* factors){ if( isPrime(n) ){ *factors = n; factors++; return factors; } if(n % 2 == 0){ *factors = 2; factors++; if(n != 2) factors = factorise(n/2, factors); } int sq = sqrt(n) +1; int i; for(i = 3; i < sq; i += 2){ if(n % i == 0){ factors = factorise(i, factors); factors = factorise(n/i, factors); break; } } return factors; } int main(int argc, char* argv[]){ if(argc < 3){ printf("Usage:\n%s isprime [number]\n%s factorise [number]\n%s time [number of trials]", argv[0], argv[0], argv[0]); return 0; } int n = atoi(argv[2]); if(n < 2){ printf("%i is not a number I deal with\n",n); return 0; } if(!strcmp(argv[1], "isprime")){ printf("%i is %sprime.\n", n, ( isPrime(n) == 1? "":"not ") ); return 0; } else if(!strcmp(argv[1], "factorise")){ int factors[n]; //We can't have more than n factors! memset(factors, 0, sizeof(factors)); factorise(n, factors); int i; printf("%i has factors:\n", n); for(i = 0; i < (int)sqrt(n)+1; i++){ if(factors[i] != 0) printf("%i ", factors[i]); } printf("\n"); return 0; } else if(!strcmp(argv[1], "time")){ srand(time(0)); int tests[n]; int i, j; for(i = 0; i < n; i++) tests[i] = rand() % (MAX_NUM-2) + 2; struct timeval cbefore, cafter, fbefore, fafter; float check, fact; gettimeofday(&cbefore, NULL); for(j = 0; j < MAX_LOOP; j++){ for(i = 0; i < n; i++) isPrime(tests[i]); } gettimeofday(&cafter, NULL); check = (float) (cafter.tv_sec - cbefore.tv_sec); check += ( (float) (cafter.tv_usec - cbefore.tv_usec) )/1000000; int factors[ MAX_NUM ]; gettimeofday(&fbefore, NULL); for(j = 0; j < MAX_LOOP; j++){ for(i = 0; i < n; i++) factorise(tests[i], factors); } gettimeofday(&fafter, NULL); fact = (float) (fafter.tv_sec - fbefore.tv_sec); fact += ( (float) (fafter.tv_usec - fbefore.tv_usec) )/1000000; printf("Factorising: %f\n", fact); printf("Primality : %f\n", check); printf("Diff : %f\n", fact - check); return 0; } printf("Usage:\n%s isprime [number]\n%s factorise [number]\n%s time [number of trials]", argv[0], argv[0], argv[0]); return 0; }

### RSA Encryption

Now that I have brainwashed you into accepting that prime factorisation is harder than checking for primality, lets look at how we can use this. Instead of letters, we can think of a piece of information as numbers (computers work like this anyway). If you really insist, then I can say suppose a=0, b=1, and so on.

Before I can describe how RSA works, we need to look into modular arithmetic. The concept is actually very simple. Suppose we have 6 rocks in a lake arranged in a circle and we number the rocks in a clockwise direction 0, 1, 2, 3, 4, 5. Now stand at the stone 0. Suppose that you are only allowed to move by one rock, so that going from 2 to 3 or 3 to 2 is allowed, but not from 1 to 4. For any number I give you, move in clockwise direction by that amount and tell me the number you land on. Suppose I give you 8. You will end up doing this:

0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 0 -> 1 -> 2

We then write and say 8 is 2 modulo 6. Notice that if I have any multiple of 6, say 12, then I will always end up at 0, because going 6 steps leaves me where I am standing (try it!). An alternative way of thinking about this, which is much easier in computation is to look at the remainder. 8 divded by 6 is 1 with a remainder of 2, so . Similarly 25 divided by 3 is 8 with remainder 1, so .

So how do we use this to encrypt information? Well first, lets assume that everything to be encrypted is given by a number (much like a=0, b=1 and so on). So I have a number that I want to send to you encrypted. Well what I can do is the following. I pick two numbers and and then compute . This scrambles the information pretty well, but we need to be smart in the choices of and to make sure the receiver can decrypt this.

In RSA there two keys called public key and private key. The idea is that I send you my public key which consists of two numbers and , and you take your number you want to send to me and send me . Then I have what is called a private key which is along with , and I do to obtain the original message. But now we need to look at how to make this work effectively. The idea here is that even if an attacker obtains the public key, he can only send me an encrypted message and not read your encrypted message.

First let me describe the RSA algorithm:

- Find two prime numbers and and let . (In practicality there are better choices for and that make harder to factorise).
- Let (called the Euler’s totient) and pick smaller than such that the only number that divides and is 1.
- Compute the number such that . This is done using Euclid’s algorithm, which I won’t dwell in to.
- Give and as your public key and keep a secret.

Before we show that this works, lets do a simple example. Lets pick and so that and . Now we can pick to be 3, 5 or 7 so lets say . Now we try to compute such that . It is easy to see that because , so we have a pretty unsafe encryption.

Now you want to send me the number so what you do instead is send me . I take the number 8 and apply the decryption which is and obtain your original number (hint, typing 8^3 mod 15 into google helps).

So lets show that this algorithm works in general. To do so we need a few theorems from number theory. First one is **Fermat’s little theorem** (not Fermat’s last theorem!) which states that if is a prime number, then for any number ,

.

Secondly we need the **Chinese remainder theorem**, a simple version of it states that if and are primes, and if and are such that

then

.

Firstly, notice that (try to imagine what this says about the stone stepping!) so that

.

Now we know that and recalling that this says divided by has remainder 1, we can obtain that for some number . Lets plug this in to the above and use Fermat’s little theorem:

but now

.

Applying Fermat’s little theorem we see that , thus to sum it up:

.

Replacing above by above leads to the same result (now you try this one) so that

.

Now what did the Chinese remainder theorem tell us? It says that

so we decrypt the message using this method!

### Why is the RSA safe?

So now the question you have (or should have) is why did I talk about prime factorisation at the start? Well remember that we can check if numbers are prime much quicker than we can factor numbers in to primes, so finding large prime numbers is easy (more on this later). So lets assume that you are sending me your credit card numbers and a malicious attacker called Levy is trying to obtain this information. We can assume he has the public key which is and . We used Euclid’s algorithm to generate the private key but what is to stop Levy from computing such that (after all this is what we were after)?

The problem with this is that to use Euclid’s algorithm, and need to be **coprime**, that is, the only number that divides both is 1. When we generate the keys, we picked coprime to , so we could use Euclid’s algorithm, but the attacker does not know a priori.

There are two options for him, which are equally bad. First, he could try by trial and error to find by computing , , etc. which is very slow. Alternatively he could try to find and by factorising , which we have seen is again very slow. In real life application the prime numbers used are very long and it is not that Levy cannot break the code, he can, but that the time that he needs the break the code is so long that by that time your credit card would have expired.

## 3 comments

Comments feed for this article

05/09/2011 at 00:44

StephenVery good explanation of RSA! I like modular arithmetic too, and I made a post myself about the RSA encryption algorithm on my blog. I didn’t use all the names of the theorems (other than the totient theorem) in my explanation (which I posted at http://stephen265.wordpress.com/2011/05/10/rsa/), and I see that your explanation is vastly superior to mine. However, I did notice a couple of issues. You used k as the encryption exponent, which is fine, except that you also said that “d × e = 1 mod φ(n)” in the numbered list after “First let me describe the RSA algorithm” where it should be d × k to be consistent. Also, about your concluding arguments: d × k = 1 mod φ(n), not mod n, and the definition for coprime is wrong. Two numbers are coprime if the only number that divides both is 1, not if the only number that divides both is a prime. 12 and 15 are not coprime.

I don’t want to be a hypocrite in correcting the few mistakes I noticed, so feel free to notify me of any suggestions or corrections for my modular arithmetic posts. I typed most of them late at night between homework assignments, so there may be many places that have typos or are unclear. Self-proofreading is nearly impossible because we all have our own set of assumptions and expectations about what we have written based on what we meant to say, so we often overlook our own errors while finding it easier to spot those of others.

05/09/2011 at 00:49

StephenNOTE: The dates on my blog are in month/day/year format, if that makes things any clearer for you, and I’m on Eastern Daylight Time in the United States (currently UTC-4 during Daylight Saving Time).

05/09/2011 at 13:17

BatiThanks for your comment. I fixed the errors you mentioned. I doubt it is superior, I just extracted the bits I find interesting and tried to dumb them down.