These posts do not explain all of the basic mathematical concepts needed to follow the main discussion, so they are not aimed at the novice. I would guess that a second-year university student in mathematics will be able to fully appreciate them.

Recently, I have been thinking about differentiation, and in this first post I would like to discuss some ways of approaching the concept, starting right from the basics. I want to settle on a certain geometric perspective and then generalise it to higher dimensions. The aim is to shed light on why the definition of the derivative of a function generalises as it does.


Analytic thinking has always been informed by modelling and applied mathematics. One of the differences between the two, as fields of mathematics, is rigour. Analysis prides itself on the highest levels of rigour, whereas applied mathematics doesn’t let rigour alone become a barrier to utility. What are the similarities? In this section we’ll consider a (by now rather cliché) scenario which enables us to motivate the idea of differentiation from a dynamical, physical point of view. While the rigorous definition of the derivative is, today, thought to ‘belong’ to pure maths, the kind of thinking we’ll see here goes back to Newton, the great physicist who pioneered calculus.

Consider a racing car. If we measure how long it takes the car to travel a certain distance, e.g. one lap of a track, we can work out its average speed by dividing the distance by the time. Now suppose I ask “How fast was the car going when it crossed the line?” It is possible that you have never realised how difficult this question is to interpret. As I have just alluded to, it is simple to measure the average speed of the car over some small distance, say a small stretch of track which includes the finishing line. For example, I could measure its average speed between two nearby points: one ten metres before the line and one ten metres beyond it. However, even to define what we mean by the speed of the car at the instant it crosses the line is non-trivial.

The crucial idea here is as follows: the average speed of the car over the twenty-metre interval just mentioned can be thought of as an approximation to the instantaneous speed of the car when it crosses the line. Roughly speaking, the smaller the interval over which we measure the average speed, the more accurate the approximation. Note that I can never actually measure the instantaneous speed of the car as it crosses the line. There is no experiment which will unambiguously give me a number that corresponds to this instantaneous speed. To anyone familiar with calculus, this will all be obvious.
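To see the shrinking-interval idea in numbers, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration: the distance function `distance(t) = t**3`, the crossing time `t_line = 2.0`, and the choice to shrink an interval of time (of half-width `h`) centred on the crossing, rather than a stretch of track.

```python
def distance(t):
    """Hypothetical distance (metres) travelled by the car after t seconds."""
    return t ** 3


def average_speed(f, t0, h):
    """Average speed over the time interval [t0 - h, t0 + h]."""
    return (f(t0 + h) - f(t0 - h)) / (2 * h)


t_line = 2.0  # assumed instant at which the car crosses the line

# Shrink the interval around the crossing and watch the averages settle down.
for h in (1.0, 0.1, 0.01, 0.001):
    speed = average_speed(distance, t_line, h)
    print(f"h = {h:<6} average speed = {speed:.6f}")
```

For this particular choice of distance function the average speed works out by hand to be 12 + h^2, so the printed values approach 12 as h shrinks. No single one of them is the instantaneous speed; it is the number they approach that we want to call the speed at the line.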


Here’s another way of approaching the same idea: Let’s think about the gradient of the curve of f(x) at the point x. What does one mean by ‘gradient’? Well, the gradient of the curve at x can be defined as the ‘slope’ of the tangent-line to the curve at x, as this picture indicates:

As shown in the picture, we write the gradient of the curve of f at the point (x, f(x)) as f'(x). The point is that without prior thought, it is not clear how one may define the gradient of a curved line. However, it is simple to work out the slope of a straight line and we can actually use this idea to define the gradient of a curve. So, what we’ve done is to build a function. The function f'(x) takes x as input and outputs the gradient of the tangent-line to the graph of f at the point (x, f(x)). This function is called the derivative of f.

The link between the two ideas that have been described is as follows: if the function f(t) is the distance that the car has travelled by time t, then the derivative of f is the function which gives the speed of the car at time t.
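For reference, both descriptions are usually made precise by the same limit. This is the standard textbook definition rather than something derived in this post, and the symbol h (the length of the small interval) is introduced here only for the purpose of writing it down:

```latex
f'(t) = \lim_{h \to 0} \frac{f(t+h) - f(t)}{h}
```

The quotient on the right is the average speed over the interval from t to t+h, or equally the slope of the straight line through (t, f(t)) and (t+h, f(t+h)). Taking the limit is what gives meaning to the word ‘instantaneous’.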


What I want to arrive at is an even more geometric point of view for the derivative, so consider again the picture in the last section. Our observation in this section is that the straight line shown (the tangent) is itself the graph of some function. This leads to another way in which to define the derivative of f: let each x be mapped to the unique number m(x) such that the graph of the linear function L : y \mapsto f(x) + (y-x)m(x) actually is the tangent-line to the curve at (x, f(x)). Then m is f'.

Note our change of focus: we are now thinking of the tangent as the graph of some function. With this in mind, in order to differentiate the function f at x, i.e. in order to find f'(x), the focus is now on determining the function L whose graph is the tangent-line. And, since the tangent is just a straight line, L can be determined from just two pieces of information: firstly, the gradient of the line, and secondly, any single point through which it passes. Since we know that it must certainly pass through the point (x, f(x)), determining the function L is the same as determining the slope of the tangent, i.e. f'(x). The important formula here is

L_x(y) = f(x) + f'(x)(y-x),

where now I have given L the subscript x to indicate that it depends on x. Viewing differentiation as the determination of the tangent is the point of view which generalises most appropriately to higher dimensions, and that’s what we’ll try to do next.
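Before moving on, the formula for L_x can be tried out numerically. The following is a minimal sketch, assuming the example function f(x) = x^2 with its derivative 2x worked out by hand; the helper name `tangent` is made up for this illustration.

```python
def tangent(f, df, x):
    """Return the function L_x whose graph is the tangent-line to f at (x, f(x))."""
    def L(y):
        return f(x) + df(x) * (y - x)
    return L


def f(x):
    return x ** 2  # hypothetical example function


def df(x):
    return 2 * x   # its derivative, worked out by hand


L_1 = tangent(f, df, 1.0)  # the tangent function at x = 1

# Compare f with its tangent function at points y approaching x = 1.
for step in (0.1, 0.01, 0.001):
    y = 1.0 + step
    error = f(y) - L_1(y)
    print(f"y - x = {step:<6} error = {error:.2e}  error / (y - x) = {error / step:.2e}")
```

For this f the gap f(y) - L_x(y) is exactly (y-x)^2, so it shrinks faster than the distance y - x itself. That is one way of saying that the tangent is the best straight-line approximation to the curve near x.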