Part 2 is going to cover the necessary mathematics for Lagrangian Mechanics. At this point I’m assuming that you’ve already covered everything in calculus 1-3, and linear algebra. The goal of this post is to introduce an area of math called variational calculus.
The Essentials
A lot of what we do in Lagrangian Mechanics is going to be based around variational calculus, a branch of mathematical analysis that deals with optimizing functionals, or functions that take other functions as inputs and return a scalar value. For example,
\[ S[f] = \int_0^1 f'(x)^2 dx \]
is a functional, which for any well-behaved function \(f\) gives us a real number \(S[f]\). Here \(f'(x) = \frac{df}{dx}\) evaluated at \(x\), and I use the square bracket notation to emphasize that the functional \(S\) depends on the choice of function.
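To make "function in, number out" concrete, here is a minimal Python sketch that approximates \(S[f]\) numerically. The helper names `derivative` and `S`, the step sizes, and the two test functions are my own choices for illustration, not part of the definition.

```python
def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def S(f, n=2000):
    """Approximate S[f] = integral from 0 to 1 of f'(x)^2 dx (midpoint rule)."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx  # midpoint of the i-th subinterval
        total += derivative(f, x) ** 2 * dx
    return total


# Two different input functions give two different real numbers.
print(S(lambda x: x))       # exact value is 1
print(S(lambda x: x ** 2))  # exact value is 4/3
```

The point is only that `S` takes a whole function as its argument and returns a single number.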
I think learning about functionals comes easiest when we first work through a concrete example and then generalize from there. In Newtonian mechanics, one of the first things we learn is how to model an object moving in a line. In that spirit, we’ll look at a somewhat trivial derivation that proves what the shortest path between two points in a plane is. We’ll begin by showing that the length of a path between two points can be expressed as a functional, and then use it to show that the shortest path is a straight line.
Suppose that we have two points \(P_a\) and \(P_b\) with coordinates \((x_a, y_a)\) and \((x_b, y_b)\) respectively. Furthermore, allow \(y = y(x)\) to be a smooth curve that joins \(P_a\) to \(P_b\), where \(x_a \leq x \leq x_b\).
And using the formula for arc length, we can define our distance, \(S[y]\), as:
\[ S[y] = \int_{x_a}^{x_b} \sqrt{1 + (y'(x))^2} \, dx \]
To be clear, our first example of a functional maps functions \(y(x)\) that satisfy \(y(x_a) = y_a\) and \(y(x_b) = y_b\) to the length of the curve \(y = y(x)\) between \(x = x_a\) and \(x = x_b\). This means that to find the path with the shortest distance between \(P_a\) and \(P_b\), we must find the \(y = y(x)\) that makes \(S[y]\) as small as possible. To do this, we’re going to borrow the idea of a stationary point from ordinary calculus.
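Before doing any calculus, it can help to evaluate the length functional on a couple of candidate paths. The sketch below assumes the endpoints \((0, 0)\) and \((1, 1)\) and compares the straight line \(y = x\) with the parabola \(y = x^2\). The function name `arc_length` and the choice of curves are illustrative assumptions.

```python
import math


def arc_length(dy, x_a, x_b, n=10000):
    """Approximate S[y] = integral of sqrt(1 + y'(x)^2) dx with the midpoint rule.

    dy is the derivative y'(x), supplied analytically.
    """
    dx = (x_b - x_a) / n
    total = 0.0
    for i in range(n):
        x = x_a + (i + 0.5) * dx
        total += math.sqrt(1.0 + dy(x) ** 2) * dx
    return total


# Both curves join (0, 0) to (1, 1).
line_length = arc_length(lambda x: 1.0, 0.0, 1.0)          # y = x
parabola_length = arc_length(lambda x: 2.0 * x, 0.0, 1.0)  # y = x^2

print(line_length, parabola_length)
```

The straight line comes out shorter than the parabola, which is what we expect, but a comparison like this only tests the paths we happen to try. The variational argument is what handles all paths at once.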
As a reminder, a stationary point \(x\) of a function \(f(x)\) is a point at which \(f'(x) = 0\), that is, a point where the function is neither increasing nor decreasing. Once you have that point, you can then determine whether the function has a maximum, minimum, or inflection point there. To be a little loose, the way we would find a minimum is by finding the \(x\) that satisfy:
\[ \frac{d}{dx} f(x) = 0 \text{ and } f''(x) \gt 0 \]
But how can we use that? Let \(g(x)\) be another smooth function that satisfies \(g(x_a) = g(x_b) = 0\). Then, we can define a new path by \(\tilde{y}(x, \epsilon) = y(x) + \epsilon g(x)\) where \(\epsilon\) is any real number. This means that for each value of \(\epsilon\), there is a new path \(\tilde{y}\) that passes through \(P_a\) and \(P_b\).
The length of our new path \(\tilde{y}\) is then given by:
\[ S[y + \epsilon g] = \int_{x_a}^{x_b} \sqrt{1 + ((y + \epsilon g)')^2} \, dx \]
\[ = \int_{x_a}^{x_b} \sqrt{1 + (y' + \epsilon g')^2} \, dx \]
That is, for fixed \(y\) and \(g\), \(S[y + \epsilon g]\) is a real-valued function of \(\epsilon\). If \(y\) is our desired minimal path, then this function must take its minimum value at \(\epsilon = 0\), since any deviation from that path would result in a longer distance. Mathematically, this translates to:
\[ \frac{d}{d \epsilon} S[y + \epsilon g] \Big|_{\epsilon = 0} = 0 \]
for all functions \(g(x)\) that satisfy \(g(x_a) = g(x_b) = 0\). A function \(y(x)\) that satisfies this equation is said to be a stationary path of \(S\) (sometimes called a stationary curve or stationary function, or described by saying \(S[y]\) is stationary). Hopefully going back to look at the definition of a stationary point can provide intuition. Although we have derived this from a particular use case, the definition of stationary paths generalizes to all the functionals that we’ll see.
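The stationarity condition can be checked numerically. The sketch below assumes the endpoints \((0, 0)\) and \((1, 1)\) and the perturbation \(g(x) = x^2(1 - x)\), which vanishes at both ends. It estimates \(\frac{d}{d\epsilon} S[y + \epsilon g]\) at \(\epsilon = 0\) by a central difference, once for the straight line \(y = x\) and once for the parabola \(y = x^2\). All names and numerical settings here are my own illustrative choices.

```python
import math


def length(dy, dg, eps, n=4000):
    """Approximate S[y + eps*g] on [0, 1] given the derivatives y' and g'."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        slope = dy(x) + eps * dg(x)  # (y + eps*g)' = y' + eps*g'
        total += math.sqrt(1.0 + slope ** 2) * dx
    return total


def dS_deps(dy, dg, h=1e-4):
    """Central-difference estimate of d/d(eps) S[y + eps*g] at eps = 0."""
    return (length(dy, dg, h) - length(dy, dg, -h)) / (2 * h)


dg = lambda x: 2 * x - 3 * x ** 2  # g(x) = x^2 (1 - x), so g(0) = g(1) = 0

line = dS_deps(lambda x: 1.0, dg)          # y = x
parabola = dS_deps(lambda x: 2.0 * x, dg)  # y = x^2

print(line)      # approximately zero: the straight line is stationary
print(parabola)  # clearly nonzero: the parabola is not
```

A single \(g\) cannot prove stationarity, since the definition demands the derivative vanish for every admissible \(g\), but one nonzero result is enough to rule a path out.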
So now let’s use this to derive the formula for the shortest path between two points on a plane:
\[ \frac{d}{d \epsilon} S[y + \epsilon g] \Big|_{\epsilon = 0} = \left( \frac{d}{d \epsilon} \int_{x_a}^{x_b} \sqrt{1 + (y' + \epsilon g')^2} \, dx \right) \Big|_{\epsilon = 0} \]
\[ = \int_{x_a}^{x_b} \left( \frac{d}{d \epsilon} \sqrt{1 + (y' + \epsilon g')^2} \right) \Big|_{\epsilon = 0} dx \]
\[ = \int_{x_a}^{x_b} \left( \frac{(y' + \epsilon g')g'}{\sqrt{1 + (y' + \epsilon g')^2}} \right) \Big|_{\epsilon = 0} dx \]
\[ = \int_{x_a}^{x_b} \frac{y'g'}{\sqrt{1 + (y')^2}} \, dx \]
If \(y(x)\) is a stationary path of \(S\), then it follows that
\[ \int_{x_a}^{x_b} \frac{y'g'}{\sqrt{1 + (y')^2}} \, dx = 0 \]
for all functions \(g(x)\) for which \(g(x_a) = g(x_b) = 0\). To solve, we can integrate by parts. The boundary term vanishes because \(g(x_a) = g(x_b) = 0\), and after dropping the overall minus sign we get
\[ \int_{x_a}^{x_b} \frac{d}{dx}\left( \frac{y'(x)}{\sqrt{1 + (y'(x))^2}} \right) g(x) \, dx = 0 \]
Now if \(S[y]\) is stationary, by definition this integral vanishes for all functions \(g(x)\). Because \(g\) is arbitrary, this can only happen if
\[ \frac{d}{dx}\left( \frac{y'(x)}{\sqrt{1 + (y'(x))^2}} \right) = 0 \]
Integrating both sides with respect to \(x\) gives
\[ \frac{y'(x)}{\sqrt{1 + (y'(x))^2}} = \alpha \]
for some constant \(\alpha\). Now the left hand side can only be constant if
\[ y'(x) = m \]
where \(m = \alpha / \sqrt{1 - \alpha^2}\) is another constant, fixed by \(\alpha\). Integration now gives the general solution
\[ y(x) = mx + c \]
for yet another constant \(c\). The constants \(m\) and \(c\) are determined by the condition that the line passes through \(P_a\) and \(P_b\), so
\[ y(x) = \frac{y_b - y_a}{x_b - x_a}x + \frac{y_a x_b - y_b x_a}{x_b - x_a} \]
This shows that the functional \(S\) is stationary along the straight line joining \(P_a\) and \(P_b\). Similarly to the stationary point, to show that this solution is a minimum, we would have to show that
\[ \frac{d^2}{d \epsilon^2} S[y + \epsilon g] \Big|_{\epsilon = 0} \gt 0 \]
We’ll neglect that for now. But what if there was a way to make this whole process simpler, a way to skip most of the legwork?
Euler-Lagrange Equation
To this end we will derive the Euler-Lagrange equation, which allows us to find the stationary path of a wide class of functionals without going through the same steps as before. Broadly speaking, all of these functionals can be written in the form:
\[ S[y] = \int_{x_a}^{x_b} F(x, y, y') \, dx, \quad y(x_a) = y_a, \quad y(x_b) = y_b \]
where \(F(x, y, y')\) is an expression containing at least one of the functions \(y\) and \(y'\), and possibly also the variable \(x\). To find the \(y(x)\) that makes the functional stationary, we proceed in exactly the same manner as before. So let’s assume that \(y(x)\) is the required stationary path, and then define a neighboring path \(\tilde{y}(x, \epsilon) = y(x) + \epsilon g(x)\) where \(g(x_a) = g(x_b) = 0\). Now, the value of the functional along the neighboring path is given by
\[ S[\tilde{y}] = \int_{x_a}^{x_b} F(x, \tilde{y}, \tilde{y}') \, dx \;\;\;\text{where}\;\;\; \tilde{y}' = \frac{d\tilde{y}}{dx} \]
Similarly to before, we arrive at:
\[ \frac{d}{d\epsilon}S[\tilde{y}] \Big|_{\epsilon = 0} = \int_{x_a}^{x_b} \left( \frac{d}{d\epsilon}F(x, \tilde{y}, \tilde{y}') \right) \Big|_{\epsilon = 0} dx \]
Since \(\tilde{y}\) and \(\tilde{y}'\) both depend on \(\epsilon\), but \(x\) does not, we can apply the chain rule to give
\[ \frac{d}{d\epsilon}F(x,\tilde{y},\tilde{y}') \Big|_{\epsilon = 0} = \left( \frac{\partial F}{\partial \tilde{y}} \frac{d \tilde{y}}{d \epsilon} + \frac{\partial F}{\partial \tilde{y}'} \frac{d \tilde{y}'}{d \epsilon} \right) \Big|_{\epsilon = 0} \]
However, \(\tilde{y} = y + \epsilon g\), so \(\tilde{y}' = y' + \epsilon g'\), which means \(d\tilde{y}/d\epsilon = g\) and \(d\tilde{y}'/d\epsilon = g'\), so:
\[ \frac{d}{d\epsilon}F(x,\tilde{y},\tilde{y}') \Big|_{\epsilon = 0} = \left( \frac{\partial F}{\partial \tilde{y}} g + \frac{\partial F}{\partial \tilde{y}'} g' \right) \Big|_{\epsilon = 0} \]
And since \(F = F(x, \tilde{y}, \tilde{y}')\) depends on \(\epsilon\) while \(g = g(x)\) is independent of it, as \(\epsilon \rightarrow 0\) we have \(\tilde{y} \rightarrow y\) and \(\tilde{y}' \rightarrow y'\). Therefore:
\[ \frac{d}{d\epsilon}F(x,\tilde{y},\tilde{y}') \Big|_{\epsilon = 0} = \frac{\partial F}{\partial y} g + \frac{\partial F}{\partial y'} g' \]
where now on the right hand side \(F = F(x,y,y')\). This gives us a right hand side that is independent of \(\epsilon\). Substituting this into what we got above gives us:
\[ \frac{d}{d\epsilon}S[\tilde{y}] \Big|_{\epsilon = 0} = \int_{x_a}^{x_b} \left( g(x)\frac{\partial F}{\partial y} + g'(x) \frac{\partial F}{\partial y'} \right) dx \]
The second term can be rewritten using integration by parts to give
\[ \int_{x_a}^{x_b} g'(x) \frac{\partial F}{\partial y'} \, dx = \left[g(x) \frac{\partial F}{\partial y'} \right]_{x_a}^{x_b} - \int_{x_a}^{x_b} g(x) \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)dx \]
Since \(g(x_a) = g(x_b) = 0\), the first term on the right hand side vanishes, giving us
\[ \frac{d}{d\epsilon}S[\tilde{y}] \Big|_{\epsilon = 0} = \int_{x_a}^{x_b} \left[ g(x)\frac{\partial F}{\partial y} - g(x) \frac{d}{dx} \frac{\partial F}{\partial y'} \right] dx \]
\[ \frac{d}{d\epsilon}S[\tilde{y}] \Big|_{\epsilon = 0} = - \int_{x_a}^{x_b} g(x) \left[ \frac{d}{dx} \frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y} \right] dx \]
With similar logic to our initial analysis, if \(S[y]\) is stationary, this integral must vanish for every admissible \(g(x)\), which can only occur if:
\[ \frac{d}{dx} \frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y} = 0\;\;\; \text{where}\;\;\; y(x_a) = y_a, \; y(x_b) = y_b \]
This is known as the Euler-Lagrange equation and will become a very useful tool in our future work. Solutions of the Euler-Lagrange equation are stationary paths of the functional \(S[y]\) in our initial equation. When applied to a variational problem, the Euler-Lagrange equation usually produces a second-order differential equation, which must be solved subject to the boundary conditions in order to obtain the stationary path.
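As a quick sanity check, we can apply the Euler-Lagrange equation to the arc-length functional from the first section, where the integrand is
\[ F(x, y, y') = \sqrt{1 + (y')^2} \]
The two partial derivatives we need are
\[ \frac{\partial F}{\partial y} = 0, \qquad \frac{\partial F}{\partial y'} = \frac{y'}{\sqrt{1 + (y')^2}} \]
so the Euler-Lagrange equation becomes
\[ \frac{d}{dx}\left( \frac{y'}{\sqrt{1 + (y')^2}} \right) = 0 \]
Integrating once gives \(y' / \sqrt{1 + (y')^2} = \alpha\), the same constant-slope condition as before, so the stationary path is again the straight line \(y(x) = mx + c\). This time it took three lines instead of the full perturbation argument.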
Before moving on, it is worthwhile to draw attention to two special cases of the Euler-Lagrange equation. The first is useful if a functional has the form:
\[ S[y] = \int_{x_a}^{x_b} F(x, y') \, dx \]
in which the integrand does not depend explicitly on \(y\). Then \(\partial F / \partial y = 0\), and the Euler-Lagrange equation can be integrated to give
\[ \frac{\partial F}{\partial y'} = C \]
for some constant of integration, \(C\). The second case is known as the Beltrami identity, and applies where the functional has the form
\[ S[y] = \int_{x_a}^{x_b} F(y, y') \, dx \]
in which the integrand does not explicitly depend on \(x\), or in other words, \(\partial F / \partial x = 0\). Here, the Euler-Lagrange equation reduces to its first integral,
\[ y' \frac{\partial F}{\partial y'} - F = C \]
and the stationary path of the functional \(S\) is determined by solving the resulting first-order equation.
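To see the Beltrami identity in action, here is a small Python sketch using an integrand of my own choosing, \(F(y, y') = \frac{1}{2}(y')^2 - \frac{1}{2}y^2\), which has no explicit \(x\) dependence. For this \(F\) the Euler-Lagrange equation reads \(y'' + y = 0\), and \(y(x) = \sin x\) is one solution. The function name `beltrami` and the sample points are illustrative assumptions.

```python
import math


def beltrami(y, dy):
    """y' * dF/dy' - F for the illustrative integrand F = y'^2/2 - y^2/2."""
    F = 0.5 * dy ** 2 - 0.5 * y ** 2
    dF_ddy = dy  # partial derivative of F with respect to y'
    return dy * dF_ddy - F


# y(x) = sin(x) solves the Euler-Lagrange equation y'' + y = 0 for this F.
values = [beltrami(math.sin(x), math.cos(x)) for x in (0.0, 0.5, 1.0, 2.0, 3.0)]
print(values)  # every entry is 1/2 up to rounding

# A path that is not stationary, y(x) = x^2, does not keep the quantity constant.
others = [beltrami(x ** 2, 2 * x) for x in (0.0, 0.5, 1.0)]
print(others)
```

Along the stationary path the combination \(y' \frac{\partial F}{\partial y'} - F\) stays at the same value for every \(x\), while along the non-stationary path it drifts.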
Now that we’ve gotten an introduction to functionals, if you’d like to see some problems using them go read “Lagrangian Mechanics pt 2.5”.