On a personal note, I consider the university class where I learned about Lagrangians for the first time to be a pivotal moment in my life. It was the first time I experienced a fundamental sense of awe and beauty about a mathematical construct, in physics or otherwise. I'd thought physics was useful and interesting before that, but seeing the derivation of the Lagrangian formulation and its application to mechanical systems was, and is, simply gorgeous to me. And as someone who recently began dipping their toe into neural networks, this is really very cool.
Chris, glad you like it. I can relate to that feeling, which I also had as an undergrad. It was a big motivation for this work -- and I think the analogy can go even further. For example, there's a really cool LeCun paper about NN training dynamics actually being a Lagrangian system: http://yann.lecun.com/exdb/publis/pdf/lecun-88.pdf. Another thing I want to try is to literally write down a learning problem where S (the action) is the cost function and then optimize it...
That does seem interesting! After skimming that paper, I think I'm going to need to sit down with it in order to really parse through things, though. Some of the operator combinations seem to be things I haven't worked with jointly before. I'd definitely be interested to see the results of using the action as the cost function, though!
I kind of went the other way. I skipped out on my undergrad classical mechanics class and did the graduate course, taught by a guy from the Landau school, out of volume 1 of Landau & Lifshitz. So in the summer after my 3rd year I spent some quality time with Newton's Principia and then most of the rest of the summer staring off into space as I hooked up everything between the two systems again.
An economics PhD once asked me for help with Lagrangians, but at the time I was quite occupied with other branches of mathematics. I wonder if it would make sense to revisit it. If I recall, the problem he was looking at was quite standard, and involved perturbations (indeed, like the coronavirus) to economic models.
Elegantly deriving the optimal path, which also happens to be what reality usually does? (Because many problems are "easily" transformed into action-minimization problems [principle of least action].)
Conservation laws (and/or the associated symmetries) might be more fascinating, but that all builds on Lagrangian mechanics.
Why would you say that reality "minimizes action" when you could just as easily say it "solves differential equations?" They are two equivalent ways of looking at the same thing, and you can't really tell which one nature does.
"solves differential equations" doesn't tell you anything about which differential equations. "minimizes action" gives you a way to get the differential equations corresponding to pretty much every scenario.
Specifying the action is equivalent to specifying the differential equations. Knowing that you start with a Lagrangian doesn't tell you more than knowing you start with some differential equations.
But are the statements "reality minimizes action" and "reality solves differential equations" actually, literally equivalent to each other? It strikes me that this is not in fact the case. There are solutions to systems describing skyscraper sway that have the building stand at a terrifying lean angle; what is supposed to be selecting the particular solutions that we observe? A purely random walk through the solution space seems unsatisfying as an answer.
They are actually, literally equivalent to each other. Differential equations aren't a "purely random walk," they are a statement of what is about to happen based on what is happening. You have three formalisms: Newtonian (differential equations), Lagrangian (action minimization), and Hamiltonian (differential equations, but derived from a scalar field kind of like in Lagrangian mechanics). They are all different ways of writing down the same thing, and their advantages and disadvantages are related to situational convenience.
Thank you for the response, I find this very interesting.
So is it true to say that a particular solution to a differential equation is a statement about how a system will behave given its initial conditions, and that this statement already captures the principle of action minimization, since both are derived from the same natural laws?
"Action is minimized" and "F = ma" are the same in the sense that x+5=0 and x+6=1 are the same. In both cases you can go from one to the other with a sequence of deductions. There's no way to tell which one is the "natural law" because you can take either one as a fact and derive the other as a consequence.
I just assume physicists are weirdos like that, they want reality to do easy and nice things, not math :)
As you can see, it clicks for many people and doesn't mean much for others. Back then it was a very welcome click.
After all, it should in principle be possible to find antiderivatives and solve ODEs without the dance of substitutions and changes of variables, but still, we find the dance many orders of magnitude easier than just looking at the problem and divining the correct solution.
This is a neat idea, and it's interesting to see how well it handles a chaotic system like the double pendulum.
I'd love to see a plot of the analytic Lagrangian vs the numerical one over the two parameter space of the double pendulum. How uniform is the approximation? Since the double pendulum isn't ergodic, I'd be curious to see if there's a correlation between probability of occupying a state and precision there. If there were it could be used as a hint of where to look for non-ergodic behavior in experimental systems.
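Something like this is what I have in mind, sketched in Python; `learned_lagrangian` is a stand-in for however the trained model ends up being exposed (not the paper's actual API), and I've fixed the velocities at zero to keep it to the two angles:

```python
import numpy as np
import matplotlib.pyplot as plt

from my_model import learned_lagrangian  # hypothetical handle to the trained net

def analytic_lagrangian(q, q_dot, m1=1.0, m2=1.0, l1=1.0, l2=1.0, g=9.8):
    """Textbook double-pendulum Lagrangian, angles measured from the vertical."""
    t1, t2 = q
    w1, w2 = q_dot
    T = (0.5 * m1 * (l1 * w1) ** 2
         + 0.5 * m2 * ((l1 * w1) ** 2 + (l2 * w2) ** 2
                       + 2 * l1 * l2 * w1 * w2 * np.cos(t1 - t2)))
    V = -(m1 + m2) * g * l1 * np.cos(t1) - m2 * g * l2 * np.cos(t2)
    return T - V

# Sweep the two angles at zero velocity and plot the pointwise error.
n = 200
thetas = np.linspace(-np.pi, np.pi, n)
err = np.zeros((n, n))
for i, t1 in enumerate(thetas):
    for j, t2 in enumerate(thetas):
        q, q_dot = np.array([t1, t2]), np.zeros(2)
        err[i, j] = abs(analytic_lagrangian(q, q_dot) - learned_lagrangian(q, q_dot))

plt.imshow(err, extent=[-np.pi, np.pi, -np.pi, np.pi], origin="lower")
plt.xlabel("theta2"); plt.ylabel("theta1")
plt.colorbar(label="|L_analytic - L_learned|")
plt.show()
```

One caveat: a Lagrangian is only identifiable up to an additive constant (and total time derivatives), so you'd want to subtract off the mean difference before reading anything into the map.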
Another possible fun thing to do with it: imagine you have a chain hanging from two points in the presence of an uneven mass distribution producing gravity. Since you've got the machinery in place for the calculus of variations more generally than just Lagrangian mechanics, you might be able to get the neural net to produce an estimate of the mass distribution from the shape of the chain, which is a toy example of something that might be usable in mineral exploration.
Hi, I work in a highly related field and I'd like to ask a few questions if you wouldn't mind. I found the work very interesting. My understanding is that essentially, because of the way you have formulated the problem, you're able to add a loss to encourage the NN to learn the invariance rather than enforce it strictly. Is this correct?
I've tried implementing something somewhat related, where I had a rotationally invariant learning target but was trying to use a feature vector that wasn't. I would randomly rotate my samples and add a loss term encouraging the gradient of the output w.r.t. the rotation angle to be 0 (roughly the sketch below). Maybe in 2D this would have worked, but in 3D it seemed to be too difficult for the NN to learn well enough for conservation of energy. It seems your examples use relatively simple model systems. Do you have any insight into how this might work with more complex invariances?
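For reference, the penalty I mean looks roughly like this in the 2D case (a JAX sketch with made-up names, for a scalar-valued model on a single point):

```python
import jax
import jax.numpy as jnp

def rotate2d(x, angle):
    """Rotate a single 2D point by `angle`."""
    c, s = jnp.cos(angle), jnp.sin(angle)
    return jnp.array([[c, -s], [s, c]]) @ x

def invariance_penalty(model, params, x):
    """Penalize sensitivity of the model's output to rotation, at the identity."""
    f_of_angle = lambda a: model(params, rotate2d(x, a))
    g = jax.grad(f_of_angle)(0.0)  # d(output)/d(angle) at angle = 0
    return g ** 2

# total_loss = task_loss + lam * invariance_penalty(model, params, x)
```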
No -- in this paper the Lagrangian structure is enforced as a hard constraint, via the model architecture. It's not just a loss term.
Soft constraints via loss functions can also help, but in my experience they are much less effective than hard constraints. My impression is that this is pretty broadly consistent with the experience of others working in this field.
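For concreteness, the hard constraint amounts to solving the Euler-Lagrange equations for the accelerations inside the forward pass, so the dynamics are Lagrangian for any network weights. A condensed sketch in JAX (my own variable names, not the exact code from the paper):

```python
import jax
import jax.numpy as jnp

def accelerations(lagrangian, q, q_dot):
    """Solve the Euler-Lagrange equations for q_ddot, for any scalar L(q, q_dot)."""
    # Hessian of L in the velocities: d^2 L / (dq_dot_i dq_dot_j)
    H = jax.hessian(lagrangian, argnums=1)(q, q_dot)
    # Gradient of L in the coordinates
    dL_dq = jax.grad(lagrangian, argnums=0)(q, q_dot)
    # Mixed second derivatives: d^2 L / (dq_j dq_dot_i)
    M = jax.jacobian(jax.grad(lagrangian, argnums=1), argnums=0)(q, q_dot)
    # Euler-Lagrange: H @ q_ddot + M @ q_dot = dL_dq
    return jnp.linalg.solve(H, dL_dq - M @ q_dot)
```

The loss then goes on the predicted accelerations (or on integrated trajectories), and the conservation behaviour comes from this structure rather than from a penalty term.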
For neural nets with 3D invariance, I would strongly recommend looking into the literature on "group equivariant convolutions". This has been a very active area of research over the past few years; e.g., see the work of Taco Cohen: https://arxiv.org/abs/1902.04615
Since you found a way to model a Lagrangian without an analytical solution, wouldn't it be interesting to throw in data from systems we don't usually assume to be Lagrangian, and find out whether they could be modelled as one by looking at the error?
I'm a lattice field theorist, exploring how to leverage NNs in algorithms for quantum field theory that remain exact even with NNs in them, so that the NNs just provide acceleration.
One annoying thing I've encountered is that I have some symmetries that I cannot figure out how to enforce. For example, suppose I have two degrees of freedom a and b, know that my physical system has a symmetry under exchange of a and b, and want to train a network to compute something in my system. For each configuration of my system I can train on (a,b) and (b,a). But the order in which I feed those in during training matters, so the network only has _approximate_ exchange symmetry, rather than exact.
You can enforce exact symmetry in neural networks with the right sort of model structure. For permutation invariance in particular, take a look at Deep Sets: https://arxiv.org/abs/1703.06114
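The core trick: encode each element with a shared network and pool with a symmetric operation such as a sum, so f(a, b) == f(b, a) holds exactly, for any weights, at every step of training. A tiny sketch (made-up layer shapes):

```python
import jax.numpy as jnp

def phi(x, params):
    """Per-element encoder, with weights shared between a and b."""
    W, b = params
    return jnp.tanh(W @ jnp.atleast_1d(x) + b)

def rho(z, params):
    """Readout applied to the pooled representation."""
    W, b = params
    return W @ z + b

def f(a, b, phi_params, rho_params):
    # Summation is order-independent, so the exchange symmetry is exact
    # by construction, not merely learned.
    return rho(phi(a, phi_params) + phi(b, phi_params), rho_params)
```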
Not sure I followed that, but if (a, b) is a data tuple you can enforce symmetry by ensuring that if (a, b) is in a batch, (b, a) is as well. That is, calculate the gradient with both of them simultaneously.
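That is, something like:

```python
# Both orderings contribute to every gradient step, so neither is privileged:
loss = 0.5 * (loss_fn(f(a, b), target) + loss_fn(f(b, a), target))
```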
For generically enforcing symmetries, variational autoencoders are the best technique I'm aware of. You can impose any symmetry you like in your generative model. Of course it's still approximate though.
I'd be interested to hear more about your problem, send me an email.
Interesting. Coming from physics rather than an ML background, what does it mean "practically" to learn a symmetry of a system? Is it that the conserved quantity associated with the symmetry via Noether's theorem stays constant?
Having skimmed the research paper and done some work with both dynamics and ML, my interpretation of their statement is the following:
You want to learn a function that represents the dynamics of your system, either as a function of the system state or some output like a picture of the system. If you just apply some NN technique directly, this is possible but will require a lot of data since the NN doesn't have any knowledge of physics. If you use their system, you are trying to learn the Lagrangian of the system, which contains information on e.g. its symmetries, and bakes in physical knowledge into the learning problem at hand. As a result, less data is needed to learn the system dynamics.
Yup, but the computer isn't _learning_ it, it's already enforced by the fact that what the computer is learning is a time-independent Lagrangian.
(I suppose it's learning a symmetry in the following sense: just _what_ is conserved depends on what the Lagrangian is, and so as it's learning the dynamics it's also learning what the energy is that it should be conserving. But at every point in the training process, there's _some_ thing, which we might as well call "energy", which its model conserves.)
Yeah, I was thinking about symmetries <=> conservation laws because of Noether's theorem. Think of regular NN training as not having any symmetries, since they aren't baked into the model. But we can let the model learn symmetries <=> conservation laws by adding the Euler-Lagrange constraint to the forward pass of a NN.
As a high school student with an admiration of mathematics (and therefore of physics, ML, and whatnot :D) I must thank the author for this.
Glancing over the paper I understand little; there's too much math I don't know (yet -- starting uni this year -- I promise I'll get there), but the application is absolutely beautiful. Having taken physics for almost two years now, Appendix B made my day.
At first glance, S seems like an arbitrary combination of energies. But it has one remarkable property.
It turns out that for all possible paths between x0 and x1, there is only one path that gives a stationary value of S. Moreover, that path is the one that nature always takes.
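In symbols, with the usual definitions:

```latex
S[x] = \int_{t_0}^{t_1} L\big(x(t), \dot{x}(t)\big)\, dt,
\qquad
\delta S = 0
\;\Longleftrightarrow\;
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} = \frac{\partial L}{\partial x}.
```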
I think it may be an issue with font rendering on different operating systems, screen sizes, resolutions, etc. The webpage looks quite different on my phone vs. tablet vs. laptop. It's least readable on my tablet (Microsoft Surface 2), which has an insanely high DPI.
I know about the reader mode, and will use it if necessary. But that will lose me all the pictures, gif/js animations, and whatnot. I prefer to just have readable websites. I know and accept that some people don't care, even if a more readable site wouldn't hurt their enjoyment in the least. Still, I prefer sites that work properly when I zoom, and that abstain from such visual hostilities.