# Chain rule

The chain rule in calculus states that if a variable y depends on a second variable u, which in turn depends on a third variable x, then the rate of change of y with respect to x is the product of the rate of change of y with respect to u and the rate of change of u with respect to x.

In Leibniz's notation, this can be written as

dy/dx = (dy/du) * (du/dx)

A real-world example shows that this rule makes sense: suppose you are climbing a mountain and gaining elevation at a rate of 0.5 kilometers per hour. The temperature is lower at higher elevations; suppose it decreases at a rate of 6 degrees per kilometer. How fast does the temperature around you drop? We have to multiply: 6 degrees per kilometer times 0.5 kilometers per hour makes 3 degrees per hour. Every hour, it gets three degrees colder. That is the heart of the chain rule.
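The multiplication in the mountain example can be checked numerically. The following is a minimal sketch in Python, not a definitive treatment: the function names `elevation_km`, `temperature_change_deg` and `rate` are made up for this illustration, and the rates are estimated with a simple central difference.

```python
def elevation_km(hours):
    """Elevation gained after `hours` of climbing at 0.5 km per hour."""
    return 0.5 * hours


def temperature_change_deg(km):
    """Temperature change after gaining `km` of elevation (6 degrees lost per km)."""
    return -6.0 * km


def rate(func, at, step=1e-6):
    """Central-difference estimate of the rate of change of func at the point `at`."""
    return (func(at + step) - func(at - step)) / (2 * step)


t = 2.0  # hours into the climb; any moment works, since both rates are constant

climb_rate = rate(elevation_km, t)                          # kilometers per hour
lapse_rate = rate(temperature_change_deg, elevation_km(t))  # degrees per kilometer
combined = rate(lambda h: temperature_change_deg(elevation_km(h)), t)  # degrees per hour

print("product of the two rates:", climb_rate * lapse_rate)
print("rate of the composed function:", combined)
```

Both printed values should agree up to rounding, at roughly minus three degrees per hour; the negative sign records that the temperature is falling.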

In the modern treatment, the chain rule is seen as a formula for the derivative of the composition of two functions. Suppose the real-valued function f is defined on some open subset of the real numbers containing the number x, and g is defined on some open subset of the reals containing f(x). If f is differentiable at x and g is differentiable at f(x), then the composition g o f is differentiable at x and the derivative can be computed as

(g o f)'(x) = g'(f(x)) * f '(x)

For example, in order to differentiate

h(x) = sin(x^2),

we write h(x) = g(f(x)) with g(u) = sin(u) and f(x) = x^2, and the chain rule then yields

h'(x) = cos(x^2) * 2x

since g'(u) = cos(u) and f '(x) = 2x.
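As a sanity check on this example, the derivative obtained from the chain rule can be compared with a finite-difference estimate of the derivative of h. This is a sketch under simple assumptions: the helper `central_difference`, the step size and the sample points are arbitrary choices made for illustration.

```python
import math


def f(x):
    """Inner function f(x) = x^2."""
    return x ** 2


def g(u):
    """Outer function g(u) = sin(u)."""
    return math.sin(u)


def h(x):
    """Composition h(x) = g(f(x)) = sin(x^2)."""
    return g(f(x))


def h_prime(x):
    """Derivative from the chain rule: g'(f(x)) * f'(x) = cos(x^2) * 2x."""
    return math.cos(x ** 2) * 2 * x


def central_difference(func, x, step=1e-6):
    """Numerical estimate of func'(x) using a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)


for x in (0.0, 0.5, 1.3, -2.0):
    exact = h_prime(x)
    numeric = central_difference(h, x)
    print(f"x={x:5.2f}  chain rule={exact: .6f}  finite difference={numeric: .6f}")
```

The two columns should agree to several decimal places at every sample point; the small remaining gap comes from the finite step size and floating-point rounding.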

The chain rule holds for all the usual notions of derivative and is therefore valid in much more general contexts. For instance, if E, F and G are Banach spaces (a class that includes the Euclidean spaces) and f : E -> F and g : F -> G are functions, and if x is an element of E such that f is differentiable at x and g is differentiable at f(x), then the derivative of the composition g o f at the point x is given by

D_x(g o f) = D_{f(x)}(g) o D_x(f)

Note that the derivatives here are linear maps and not numbers. If the linear maps are represented as matrices, the composition on the right-hand side turns into a matrix multiplication.
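In the finite-dimensional case, the matrix form of the rule can be illustrated with two maps from the plane to itself. The sketch below is only an illustration: the maps `f` and `g`, their hand-computed Jacobian matrices, the sample point and the helper functions are all assumptions made for this example. It multiplies the matrix of g at f(p) by the matrix of f at p and compares the product with a finite-difference Jacobian of the composition.

```python
import math


def f(p):
    """Example map f(x, y) = (x*y, x + y)."""
    x, y = p
    return [x * y, x + y]


def g(q):
    """Example map g(u, v) = (sin(u), u*v)."""
    u, v = q
    return [math.sin(u), u * v]


def jacobian_f(p):
    """Matrix of partial derivatives of f at p, computed by hand."""
    x, y = p
    return [[y, x], [1.0, 1.0]]


def jacobian_g(q):
    """Matrix of partial derivatives of g at q, computed by hand."""
    u, v = q
    return [[math.cos(u), 0.0], [v, u]]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def numeric_jacobian(func, p, step=1e-6):
    """Finite-difference Jacobian of func at p, built one column at a time."""
    columns = []
    for j in range(len(p)):
        forward, backward = list(p), list(p)
        forward[j] += step
        backward[j] -= step
        columns.append([(a - b) / (2 * step)
                        for a, b in zip(func(forward), func(backward))])
    # Each entry of `columns` holds the derivatives with respect to one input,
    # so transpose to get one row per output component.
    return [list(row) for row in zip(*columns)]


point = [0.7, -1.2]
chain = matmul(jacobian_g(f(point)), jacobian_f(point))
numeric = numeric_jacobian(lambda p: g(f(p)), point)

print("chain rule product:", chain)
print("finite difference: ", numeric)
```

Note that the matrix of g is evaluated at f(p), not at p, mirroring the subscript f(x) in the formula above; the order of the factors also matters, since matrix multiplication is not commutative.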

A particularly nice formulation of the chain rule can be achieved in the most general setting: let M, N and P be C^k manifolds (or even Banach manifolds) and let f : M -> N and g : N -> P be differentiable maps. The derivative of f, denoted by df, is then a map from the tangent bundle of M to the tangent bundle of N, and we may write

d(g o f) = dg o df

In this way, the formation of derivatives and tangent bundles is seen as a functor on the category of C^k manifolds.