Calculus of variations (idea) by redbaker

Brought to you by the number e and the kind folks at node your homework¹.

The calculus of variations (sometimes called variational calculus) is a powerful mathematical method of finding the proper function to solve a certain problem.

Gee, that's kind of vague, isn't it? Not terribly clear either. Let me try to be a bit more specific.

Say you've got 20 meters of fence, and you want to use it to box in the maximum possible area. (Anyone who ever took elementary calculus surely remembers this sort of problem.)

           10-S
------------------------------
|                            |
|                            |
|                            |
|    Area = (10-S)x S        |  S
|                            |
|                            |
|                            |
------------------------------

Mathematically speaking, what we're looking for here is a number, S, which when plugged into our function for area up there, will give us the maximum possible number in return. Using elementary calculus, you can find the proper numerical value of S that will maximize the area, simply by graphing the area function and finding the spot on the resulting curve where the derivative with respect to S is zero:

       ____________________________
                      @@
                   @@@@@@@@
                  @@@    @@@
                @@@        @@@
               @@            @@
              @@              @@                     
              @@              @@                 
             @@                @@             
             @@                @@      
             @                  @
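Here is a minimal Python sketch of that recipe, using only the area function from the diagram. The helper names (area, d_area, find_zero) and the use of bisection to locate the zero of the derivative are illustrative assumptions, not anything the problem dictates.

```python
def area(s):
    """Area of a rectangle with sides (10 - s) and s, i.e. 20 m of fence."""
    return (10 - s) * s

def d_area(s, h=1e-4):
    """Central-difference estimate of dA/dS."""
    return (area(s + h) - area(s - h)) / (2 * h)

def find_zero(f, lo, hi, tol=1e-9):
    """Bisection; assumes f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid   # sign change is in the upper half
        else:
            hi = mid   # sign change is in the lower half
    return (lo + hi) / 2

best_s = find_zero(d_area, 0.0, 10.0)
print(best_s, area(best_s))
```

The derivative is positive at S = 0 and negative at S = 10, so the search closes in on the single zero in between, which is the square with S = 5.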

Anyhow, this is one of the reasons that calculus is so useful -- it turns problems of maximization and minimization like this one into cake. Just take the derivative, find the zero, and you've got the number you want. But what if you have a problem where the solution isn't a number at all? For example, take two points on a plane:

  A                    B
  .                    .

What if you want to find the shortest path between them? I know, I know, it's a straight line, but let's pretend we don't know that. The important thing to note here is that the answer to this question is a curve, not a number. There's an infinite number of curves between A and B, and we want to pick the one that is the shortest. This is the kind of problem the calculus of variations is for -- rather than finding numbers which will give us the extremal values of a function, we can now find curves that give us extremal values of a different kind of function, line length being an example.

"Big deal," I hear you say. "I could have told you that the shortest distance between A and B was a straight line! Any fool could have! We don't need any sort of fancy variational whatchamatisity-thing to figure that out!" True enough. But that's just one application of the calculus of variations. For example, have a look at another pair of points:

Now, say I were to ask you what path from A to B you could slide down the fastest. Not so easy, is it? Is it a straight line? Could be. Or maybe it's something else. But have fun proving it either way without the calculus of variations. That's the power of this method -- it can pick the RIGHT curve from the infinity of curves out there that go from A to B.

Ok, folks, this is the advanced section of the class. All bets are off here unless you've got the equivalent of at least a semester's worth of college calculus.

*steps out of the way of the mad dash for the door*

Still here? Good. The basic entities we'll be dealing with here are called functionals. For our purposes, we can think of functionals as functions that map a function space into the real numbers. For example, the length of a curve is a functional -- given a curve (i.e. a function), it will return a number. Have a look:²

L[y(x)] = ∫_{x₁}^{x₂} (1 + y′²)^½ dx

Here x₁ and x₂ are the x-coordinates of the curve's two endpoints.
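To make "curve in, number out" concrete, here is a small Python sketch that feeds a few curves to the length functional. It approximates the length by adding up short straight chords rather than evaluating the integral exactly, and the three sample curves (all running from (0, 0) to (1, 1)) and the name curve_length are illustrative assumptions.

```python
import math

def curve_length(y, x1, x2, n=10_000):
    """Approximate L[y] by summing n straight chords along the curve."""
    dx = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        xa = x1 + i * dx
        xb = xa + dx
        total += math.hypot(dx, y(xb) - y(xa))
    return total

# Three curves with the same endpoints, (0, 0) and (1, 1).
curves = {
    "straight": lambda x: x,
    "parabola": lambda x: x * x,
    "wiggly": lambda x: x + 0.2 * math.sin(2 * math.pi * x),
}

for name, y in curves.items():
    print(name, curve_length(y, 0.0, 1.0))
```

Each curve gets its own number, and the straight one comes out smallest of the three. Of course, trying three curves proves nothing about the infinitely many others.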

Simple enough. Now, as we said in the first part of this writeup, we're going to be looking for a way of finding a function y for any given functional J that will give an extremal value of J. How do we do that? By finding a condition on J analogous to the role of f′(x) = 0 in differential calculus, of course! Check it out:³

We start by choosing a general form for our functionals and setting some boundary conditions. In this derivation, we'll consider functionals that act on functions which go from point 1 to point 2:

1
.___
    \ y(x)
     \____ . 2

and which take the form: J[y(x)] = ∫_{x₁}^{x₂} F[y(x), y′(x), x] dx

Here J is our functional, y is a function of x, and F is the integrand, which can depend upon any or all of y, y′, and x. (It's worth noting that F is NOT a functional -- F is just our integrand.) Our limits of integration, x₁ and x₂, are the x-coordinates of the endpoints 1 and 2.

Why this form? This is the simplest form of functional that is still physically useful and mathematically interesting when it comes to the calculus of variations. Many physically significant functionals take this form: the line length functional above, the time functional used in the brachistochrone problem, and many elementary applications of Hamilton's principle, just to name a few. Furthermore, this form is the easiest to represent in HTML. As I learned in the course of writing this node, HTML is not friendly to large fractions. So just trust me and enjoy the ride, because this stuff is pretty cool. I'll say more about other kinds of functionals at the end of the node, I promise.

Back to the mathematics. Presumably the solution to our problem exists. After all, there must be some function with the proper boundary conditions which, when plugged into our functional J, gives us an extremal value.⁴ Let's call that function y. Now, let's consider infinitesimal variations around y; that is, paths that are very similar to y but are not quite y nonetheless. We can represent these infinitesimal variations away from y as the product of a very small real number ε and an arbitrary well-behaved function η(x) with boundary conditions η(x₁) = η(x₂) = 0. In other words:

δy(x) = εη(x)

Or, as a friend of mine once put it:

really tiny difference from the thing we want = a really really small number multiplied by an arbitrary thing that goes to zero at the endpoints.

He's a special sort of fellow. Anyhow, keeping that equation in mind, consider the following expression:

J[y + δy] = J[y + εη]

We know by virtue of our definition of y given earlier that J is extremal for δy = 0. Thus, in the expression on the right-hand-side above, J is extremal when ε=0 for any η(x) meeting the conditions we set out previously. Using our knowledge of basic calculus, we know that this implies the following:

Equation (0):

dJ[y + εη]/dε = 0   for ε = 0

This is our condition on J, the thing we're looking for. But as it stands, it's not terribly useful. It's too abstract. It doesn't help us find y. We have to massage this equation into a more useful form if we want to get anywhere. So now we ask: what IS that derivative up there, on the left-hand-side? Well, let's find out. We know what J[y + εη] is:

J[y + εη] = ∫_{x₁}^{x₂} F[y + εη, y′ + εη′, x] dx

We now differentiate under the integral sign to get:

Equation (1):

dJ[y + εη]/dε = ∫_{x₁}^{x₂} [(∂F/∂y)η + (∂F/∂y′)η′] dx
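Equation (1) can be sanity-checked numerically at ε = 0. The sketch below compares a finite-difference estimate of the left-hand side with a direct evaluation of the right-hand side. Everything specific in it is an assumption made for illustration: the integrand F = y′² + y², the trial curve y = x², the variation η = sin(πx), the interval [0, 1], and the midpoint-rule helper integrate.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Illustrative choices, not taken from the writeup.
F = lambda y, yp, x: yp**2 + y**2       # integrand F[y, y', x]
dF_dy = lambda y, yp, x: 2 * y          # partial derivative of F with respect to y
dF_dyp = lambda y, yp, x: 2 * yp        # partial derivative of F with respect to y'

y = lambda x: x**2                      # trial curve (not an extremal)
yp = lambda x: 2 * x                    # its derivative
eta = lambda x: math.sin(math.pi * x)   # vanishes at x = 0 and x = 1
etap = lambda x: math.pi * math.cos(math.pi * x)

def J(eps):
    """J[y + eps*eta] on the interval [0, 1]."""
    return integrate(
        lambda x: F(y(x) + eps * eta(x), yp(x) + eps * etap(x), x), 0.0, 1.0
    )

eps = 1e-4
lhs = (J(eps) - J(-eps)) / (2 * eps)    # finite-difference dJ/d(eps) at eps = 0
rhs = integrate(
    lambda x: dF_dy(y(x), yp(x), x) * eta(x) + dF_dyp(y(x), yp(x), x) * etap(x),
    0.0, 1.0,
)
print(lhs, rhs)
```

The two numbers should agree closely. They are not zero, because the trial curve here was picked arbitrarily and is not the one that makes J extremal.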

Ok. Have a look at the right-hand-side up there. What do we do with that? Well, first we integrate the second term, the one containing η′, by parts:

Equation (2):

∫_{x₁}^{x₂} (∂F/∂y′)η′ dx = [η(∂F/∂y′)]_{x₁}^{x₂} - ∫_{x₁}^{x₂} [d/dx(∂F/∂y′)]η dx

Substituting the RHS of (2) into the proper place of the RHS of (1), we get:

Equation (3):

[η(∂F/∂y′)]_{x₁}^{x₂} + ∫_{x₁}^{x₂} [∂F/∂y - d/dx(∂F/∂y′)]η dx

Hm. That's interesting. Have a look at that first term above in (3). Since we stipulated at the outset that η(x₁) = η(x₂) = 0, that term must be equal to...well, 0. Isn't that mighty convenient. So now our right-hand-side from (1) can be rewritten as:

Equation (4):

∫_{x₁}^{x₂} [∂F/∂y - d/dx(∂F/∂y′)]η dx

Ok. Now remember our left-hand-side from (1)? What say we consider that where ε=0? By equation (0), we know that the LHS from (1) goes to 0 there, and so the RHS must as well. But what we've got above is equivalent to the RHS of (1), so we now have:

Equation (5):

∫_{x₁}^{x₂} [∂F/∂y - d/dx(∂F/∂y′)]η dx = 0
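Since the integral in (5) is just dJ/dε at ε = 0 in disguise, the claim can be probed numerically. The sketch below does this for the line length functional, using a finite difference in ε. The two test curves (y = x and y = x² on [0, 1]), the family η = sin(kπx), and the function names are all assumptions chosen for illustration. Because this particular integrand depends only on y′, the code works with derivatives directly.

```python
import math

def length(y_prime, x1=0.0, x2=1.0, n=10_000):
    """Midpoint-rule estimate of the integral of sqrt(1 + y'^2) dx."""
    dx = (x2 - x1) / n
    return sum(
        math.sqrt(1 + y_prime(x1 + (i + 0.5) * dx) ** 2) for i in range(n)
    ) * dx

def first_variation(y_prime, eta_prime, eps=1e-5):
    """Central-difference estimate of dL[y + eps*eta]/d(eps) at eps = 0."""
    plus = length(lambda x: y_prime(x) + eps * eta_prime(x))
    minus = length(lambda x: y_prime(x) - eps * eta_prime(x))
    return (plus - minus) / (2 * eps)

line_prime = lambda x: 1.0            # derivative of y = x
parabola_prime = lambda x: 2.0 * x    # derivative of y = x^2

for k in (1, 2, 3):
    # eta = sin(k*pi*x) vanishes at both endpoints; this is its derivative.
    eta_prime = lambda x, k=k: k * math.pi * math.cos(k * math.pi * x)
    print(
        k,
        first_variation(line_prime, eta_prime),
        first_variation(parabola_prime, eta_prime),
    )
```

For the straight line the estimate should be zero up to numerical noise for every η tried, while for the parabola it generally is not.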

Now, here's the kicker. This step that we're about to take is what makes this proof cool, in my oh-so-humble opinion. Have another look at (5) there. See that η(x) squeezed in there? Well, up until this point, we have not set any conditions for η other than the very broad ones we gave at the start. Ergo, (5) must hold for ANY η that meets those conditions. In other words, no matter what η is, that integral above still equals zero. How can this be? The only possible explanation is that the rest of the integrand, the thing that η is multiplied by, must be equal to zero.⁵ And so we have:

∂F/∂y - d/dx(∂F/∂y′)=0

Rearranged, we have:

∂F/∂y = d/dx(∂F/∂y′)

This is known as the Euler-Lagrange equation, and it's basically the coolest thing ever. Why? It's succinct, elegant, useful, powerful, and, most importantly, it's fun. It's possible to generalize it for multiple dimensions, multiple variables, multiple derivatives, and one or no endpoints at all. It's at the heart of the Lagrangian and Hamiltonian formulations of classical mechanics, and has been carried over into quantum mechanics as well. Noether's theorem on conservation laws falls directly out of considerations of consequences of this equation and its brethren. It has been called by some the single most useful equation in all of physics. But there's one final question we must answer that is far more important than any of this:

How can we play with it?

To answer this, we'll close this writeup with a simple example of the Euler-Lagrange equation at work: finding the shortest path between two points on a Euclidean plane. We start with the line length functional mentioned before:⁶

L[y(x)] = ∫_{x₁}^{x₂} (1 + y′²)^½ dx

Here, F = (1+y′²)^½. Since there is no explicit mention of y in there, ∂F/∂y = 0, and the Euler-Lagrange equation reduces to:

d/dx(∂F/∂y′) = 0

With our particular F, this becomes:

d/dx(y′(1+y′²)^-½) = 0

Going back to our old and good friend differential calculus, we know this implies:

y′(1+y′²)^-½ = C

where C is a constant. Squaring both sides and solving for y′ gives y′² = C²/(1 - C²), which is itself a constant, so:

y′(x) = dy/dx = A

where A is also a constant. From the first day of calculus class, we know this means:

y(x) = Ax + B

where B is a constant. This is, of course, the equation for a straight line, and we're done. Nifty, no? If you want to see more of the same, the problem mentioned at the end of the first section of this writeup is left as an exercise for the proverbial motivated student.⁷ Anyhow, I hope this has left you with a good idea of what the calculus of variations is and why it's so interesting. Come back next week, when we learn about the dangers of llamas and find out what ELSE floats in water aside from very small rocks.
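The straight-line answer can also be recovered by brute force, without using the Euler-Lagrange equation at all. The sketch below chops a path into a handful of points, holds the endpoints fixed, and nudges the interior points downhill on total length. Gradient descent is a stand-in technique chosen for illustration, and the starting curve (a parabola from (0, 0) to (1, 1)), the number of points, the step size, and the iteration count are all arbitrary assumptions.

```python
import math

def path_length(ys, dx):
    """Total length of the polyline through equally spaced points."""
    return sum(math.hypot(dx, b - a) for a, b in zip(ys, ys[1:]))

def relax(ys, dx, step=0.02, iters=5000):
    """Gradient descent on the interior points; the endpoints stay fixed."""
    ys = list(ys)
    for _ in range(iters):
        new = ys[:]
        for i in range(1, len(ys) - 1):
            left = ys[i] - ys[i - 1]
            right = ys[i + 1] - ys[i]
            # Derivative of the two neighbouring segment lengths w.r.t. ys[i].
            grad = left / math.hypot(dx, left) - right / math.hypot(dx, right)
            new[i] = ys[i] - step * grad
        ys = new
    return ys

n = 10
dx = 1.0 / n
xs = [i * dx for i in range(n + 1)]
start = [x * x for x in xs]          # parabola from (0, 0) to (1, 1)
final = relax(start, dx)

print(path_length(start, dx), path_length(final, dx))
print(max(abs(y - x) for x, y in zip(xs, final)))  # distance from y = x
```

The relaxed points should settle onto the line y = x, matching y(x) = Ax + B with A = 1 and B = 0 for these endpoints.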

Footnotes:

1: My sources for this writeup are my notes from my analytical mechanics class this semester (Spring 2004) and the textbook for that class, the third edition of Goldstein's Classical Mechanics.

2: Throughout the rest of this node, we'll be using y′ as shorthand for dy/dx.

3: As ariels has pointed out, a disclaimer is in order here. The following is not exactly the height of mathematical rigor, to put it mildly. It's more like a sketch of a proof than a real proof or derivation. It's a "physics proof", if you will -- good enough for the physicists, by and large, but it will make the mathematicians scream.

4: Actually, this isn't guaranteed. What is true is that the condition we're about to formulate is a necessary condition: any well-behaved function that gives J an extremal value must satisfy it. In other words, if there's a solution to the problem, it is also a solution to the equation we're about to derive. The converse takes more care: a solution to the equation makes J stationary, but stationary doesn't always mean maximum or minimum.

5: There's something called "the fundamental lemma of variational calculus" hidden in this chain of reasoning. It's worth discussing, and in fact it's worth proving, but this is neither the time nor the place -- it deserves its own node, and someday I will see to it that it gets one if nobody else does first.

6: If you want to know why this is the line length functional, see this node.

7: ~~I plan to place a writeup here in the near future which will work this problem out in its entirety.~~ Wntrmute has beaten me to it! Look here to see him work out the full solution to this problem.

