Hacker News
Lagrangian Neural Networks (greydanus.github.io)
246 points by hardmaru on March 12, 2020 | 49 comments


On a personal note, I consider the university class where I learned about Lagrangians for the first time to be a pivotal moment in my life. It was the first time I experienced a fundamental sense of awe and beauty about a mathematical construct, in physics or otherwise. I'd thought physics was useful and interesting before that, but seeing the derivation of the Lagrangian formulation and its application to mechanical systems was, and is, simply gorgeous to me. And as someone who recently began dipping their toe into neural networks, I find this really very cool.


Chris, glad you like it. I can relate to that feeling, which I also had as an undergrad. It was a big motivation for this work -- and I think the analogy can go even further. For example, there's a really cool LeCun paper about NN training dynamics actually being a Lagrangian system: http://yann.lecun.com/exdb/publis/pdf/lecun-88.pdf. Another thing I want to try is to literally write down a learning problem where S (the action) is the cost function and then optimize it...
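A minimal sketch of what that might look like, assuming a toy 1D particle in a harmonic potential with fixed endpoints (illustrative only, nothing from the paper): discretize the path, treat the discretized action S as the loss, and run gradient descent on the interior points.

```python
# Toy sketch: optimize the discretized action S directly (hypothetical example,
# not from the paper). For a short enough time span the classical path of a
# particle in a potential is an actual minimum of S, so plain gradient descent
# on the interior path points converges to it.
import jax
import jax.numpy as jnp

m, dt = 1.0, 0.05
V = lambda q: 0.5 * q**2                      # harmonic potential as a stand-in

def action(q_interior, q0, q1):
    q = jnp.concatenate([jnp.array([q0]), q_interior, jnp.array([q1])])
    qdot = (q[1:] - q[:-1]) / dt              # finite-difference velocities
    T = 0.5 * m * qdot**2                     # kinetic energy per segment
    Vmid = V(0.5 * (q[1:] + q[:-1]))          # potential at segment midpoints
    return jnp.sum((T - Vmid) * dt)           # discrete action S = sum L dt

q0, q1, n = 0.0, 1.0, 20
path = jnp.linspace(q0, q1, n + 2)[1:-1]      # initial guess: straight line
grad_S = jax.grad(action)
for _ in range(2000):
    path = path - 1e-2 * grad_S(path, q0, q1) # gradient descent on S itself
```

In general the physical path is only a stationary point of S rather than a minimum, so this is the easy case; the interesting question is what the learning problem looks like when S is the cost for a model rather than for a single path.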


That does seem interesting! After skimming that paper, I think I'm going to need to sit down with it in order to really parse through things, though. Some of the operator combinations seem to be things I haven't worked with jointly before. I'd definitely be interested to see the results of using the action as the cost function, though!


I kind of went the other way. I skipped out on my undergrad classical mechanics class and did the graduate course, taught by a guy from the Landau school, out of volume 1 of Landau & Lifshitz. So in the summer after my 3rd year I spent some quality time with Newton's Principia and then most of the rest of the summer staring off into space as I hooked up everything between the two systems again.

Man, that was a great summer.


Oh god, Landau & Lifshitz, that brings back memories. That was hardcore.


Wow, that's really inspiring. I also love beauty. Can anyone recommend a way for someone with undergrad level math to learn about Lagrangians?



An economics PhD once asked me for help with Lagrangians, but at the time I was quite occupied with other branches of mathematics. I wonder if it would make sense to revisit it. If I recall, the problem he was looking at was quite standard, and involved perturbations (indeed, like the coronavirus) to economic models.


A "perturbation" is usually short for a "small perturbation," which SARS-CoV-2 is definitely not. ;)


Maybe usually, but IIRC in his case he specifically said "major" events and a return to normal after that.


What's so great about it? I had that class too and I didn't think it was so hot. Maybe I missed something.


Elegantly deriving the optimal path, which also happens to be what reality usually does? (Because many problems are "easily" transformed into action minimization problems [the principle of least action].)

Conservation laws (and/or the associated symmetries) might be more fascinating, but that all builds on Lagrangian mechanics.


Why would you say that reality "minimizes action" when you could just as easily say it "solves differential equations?" They are two equivalent ways of looking at the same thing, and you can't really tell which one nature does.


"solves differential equations" doesn't tell you anything about which differential equations. "minimizes action" gives you a way to get the differential equations corresponding to pretty much every scenario.


Specifying the action is equivalent to specifying the differential equations. Knowing that you start with a Lagrangian doesn't tell you more than knowing you start with some differential equations.


But are the statements "reality minimizes action" and "reality solves differential equations" actually, literally equivalent to each other? It strikes me that this is not in fact the case. There are solutions to systems describing skyscraper sway that have the building stand at a terrifying lean angle; what is supposed to be selecting the particular solutions that we observe? A purely random walk through the solution space seems unsatisfying as an answer.


They are actually, literally equivalent to each other. Differential equations aren't a "purely random walk," they are a statement of what is about to happen based on what is happening. You have three formalisms: Newtonian (differential equations), Lagrangian (action minimization), and Hamiltonian (differential equations, but derived from a scalar field kind of like in Lagrangian mechanics). They are all different ways of writing down the same thing, and their advantages and disadvantages are related to situational convenience.


Thank you for the response, I find this very interesting.

So is it true to say that the production of a particular solution to a differential equation is a statement about how a system will behave based on its initial conditions, and that the statement captures within it the principle of action minimization by virtue of the fact that it is a derivation of information from natural laws?


"Action is minimized" and "F = ma" are the same in the sense that x+5=0 and x+6=1 are the same. In both cases you can go from one to the other with a sequence of deductions. There's no way to tell which one is the "natural law" because you can take either one as a fact and derive the other as a consequence.


Interesting - your response makes sense. Thanks again for the information.


I just assume physicists are weirdos like that, they want reality to do easy and nice things, not math :)

As you see it clicks for many people, and doesn't mean much for others. Back then it was a very welcome click.

After all, it should be possible to find antiderivatives and solve ODEs without doing the dance of substitutions and changes of variables, but we still find that dance many orders of magnitude easier than just looking at the problem and divining the correct solution.


I found the previous Hamiltonian Neural Networks work [0]. If the authors are here, I'd be interested in using [1] for a version with dissipation.

[0] https://greydanus.github.io/2019/05/15/hamiltonian-nns/

[1] https://arxiv.org/abs/1902.04598


This is a neat idea, and it's interesting to see how well it handles a chaotic system like the double pendulum.

I'd love to see a plot of the analytic Lagrangian vs the numerical one over the two parameter space of the double pendulum. How uniform is the approximation? Since the double pendulum isn't ergodic, I'd be curious to see if there's a correlation between probability of occupying a state and precision there. If there were it could be used as a hint of where to look for non-ergodic behavior in experimental systems.

Another possible fun thing to do with it: imagine you have a chain hanging from two points in the presence of an uneven mass distribution producing gravity. Since you've got the machinery in place for the calculus of variations more generally than just Lagrangian mechanics, you might be able to get the neural net to produce an estimate of the mass distribution from the shape of the chain, which is a toy example of something that might be usable in mineral exploration.


Hi, I work in a highly related field and I'd like to ask a few questions if you wouldn't mind. I found the work very interesting. My understanding is that essentially, because of the way you have formulated the problem, you're able to add a loss to encourage the NN to learn the invariance rather than enforce it strictly. Is this correct?

I've tried implementing something somewhat related where I had a rotationally invariant learning target but was trying to use a feature vector which wasn't. I would randomly rotate my samples and add a loss function on the gradient of the rotation parameters to encourage the gradient w.r.t. rotation to be 0 (a rough sketch of what I mean is below). Maybe in 2D this would have worked, but in 3D it seemed to be too difficult for the NN to learn well enough for conservation of energy. It seems your examples use relatively simple model systems. Do you have any insight into how this might work with more complex invariances?
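For concreteness, the penalty I tried was along these lines (a simplified 2D sketch; `model` here is a stand-in for the real network and returns a scalar prediction):

```python
# Rough sketch of a soft rotation-invariance penalty (simplified to 2D).
# The idea: push d f(R(theta) x) / d theta toward 0 at theta = 0, i.e.
# encourage local invariance to rotation rather than enforce it exactly.
import jax
import jax.numpy as jnp

def rotate2d(x, theta):
    R = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
                   [jnp.sin(theta),  jnp.cos(theta)]])
    return x @ R.T

def invariance_penalty(params, model, x):
    f_of_theta = lambda theta: model(params, rotate2d(x, theta))
    df_dtheta = jax.grad(f_of_theta)(0.0)     # sensitivity of output to rotation
    return df_dtheta ** 2                     # penalize any non-invariance

def total_loss(params, model, x, y, lam=1.0):
    pred = model(params, x)                   # ordinary prediction loss...
    return (pred - y) ** 2 + lam * invariance_penalty(params, model, x)
```

In 3D the same trick needs a penalty for each of the three generators of the rotation group, which is presumably part of why it got so hard to train well.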


No, in this paper the Lagrangian structure is enforced as a hard constraint, via the model architecture. It's not just a loss term.
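To unpack "hard constraint": the network parameterizes a scalar L(q, q̇), and the predicted acceleration is computed from that L by solving the Euler-Lagrange equations, so every prediction respects the Lagrangian structure by construction. Roughly, in JAX (a paraphrase of the construction, not the exact code from the paper):

```python
# Paraphrase of the hard-constraint idea: `lagrangian` is any differentiable
# scalar function of (q, qdot), e.g. a neural network. The acceleration is not
# predicted directly; it is derived from L via the Euler-Lagrange equations.
import jax
import jax.numpy as jnp

def lagrangian_acceleration(lagrangian, q, qdot):
    grad_q    = jax.grad(lagrangian, argnums=0)(q, qdot)     # dL/dq
    hess_qdot = jax.hessian(lagrangian, argnums=1)(q, qdot)  # d2L/(dqdot dqdot)
    mixed     = jax.jacobian(jax.grad(lagrangian, argnums=1),
                             argnums=0)(q, qdot)              # d2L/(dq dqdot)
    # Euler-Lagrange equations, solved for the acceleration qddot:
    return jnp.linalg.solve(hess_qdot, grad_q - mixed @ qdot)
```

The training loss then just compares these derived accelerations to the observed ones, so the Euler-Lagrange structure doesn't depend on how well the loss is optimized.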

Soft constraints via loss functions can also help, but in my experience they are much less effective than hard constraints. My impression is that this is pretty broadly consistent with the experience of others working in this field.

For neural nets with 3D invariance, I would strongly recommend looking into the literature on "group equivariant convolutions". This has been a very active area of research over the past few years, e.g., see the work of Taco Cohen: https://arxiv.org/abs/1902.04615


Hmm, I must be misunderstanding the implementation somehow. Thanks for the tip. I’ll have to dig a bit deeper to understand this work here.


Since you found a way to model a Lagrangian without an analytical solution, wouldn't it be interesting to throw in data from systems we don't usually assume to be Lagrangian, and find out whether they could be modelled as one by looking at the error?


I'm a lattice field theorist, exploring how to leverage NNs in algorithms for quantum field theory that remain exact even with NNs in them, so that the NNs just provide acceleration.

One annoying thing I've encountered is that I have some symmetries that I cannot figure out how to enforce. For example, suppose I have two degrees of freedom a and b, know that my physical system is symmetric under exchange of a and b, and want to train a network to compute something in that system. For each configuration of my system I can train on (a,b) and (b,a). But the order in which I feed those in as training examples matters, so the network only has _approximate_ exchange symmetry, rather than exact.

Is there a way around this inexactness?


You can enforce exact symmetry in neural networks with the right sort of model structure. For permutation invariance in particular, take a look at Deep Sets: https://arxiv.org/abs/1703.06114
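The core trick, sketched (my shorthand, not the paper's code): embed each element with the same network, pool with a sum, then read out. Swapping a and b cannot change the output, so the exchange symmetry is exact rather than approximate.

```python
# Deep Sets-style sketch: f(a, b) == f(b, a) exactly, by construction,
# because the only thing the readout ever sees is the order-free sum.
import jax.numpy as jnp

def phi(x, params_phi):
    W1, b1 = params_phi                        # stand-in for any per-element MLP
    return jnp.tanh(W1 @ jnp.atleast_1d(x) + b1)

def rho(z, params_rho):
    W2, b2 = params_rho                        # readout on the pooled features
    return W2 @ z + b2

def f(a, b, params_phi, params_rho):
    pooled = phi(a, params_phi) + phi(b, params_phi)   # sum pooling: order-free
    return rho(pooled, params_rho)
```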


Not sure I followed that, but if (a, b) is a data tuple you can enforce symmetry by ensuring that if (a, b) is in a batch, (b, a) is as well. That is, calculate the gradient with both of them simultaneously.
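As a sketch of what I mean (toy code; `loss_fn` stands in for whatever per-example loss you already use):

```python
# Symmetrize the objective rather than the network: average the loss over both
# orderings inside one update, so the gradient step itself is exchange-symmetric.
# (The trained network is still only approximately symmetric, unlike an
# architecture-level constraint.)
import jax

def symmetrized_loss(params, a, b, target, loss_fn):
    return 0.5 * (loss_fn(params, (a, b), target) +
                  loss_fn(params, (b, a), target))

grad_fn = jax.grad(symmetrized_loss)   # gradient w.r.t. params, both orderings at once
```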

For generically enforcing symmetries, variational autoencoders are the best technique I'm aware of. You can impose any symmetry you like in your generative model. Of course it's still approximate though.

I'd be interested to hear more about your problem, send me an email.


Interesting. Coming from physics rather than an ML background, what does it mean "practically" to learn a symmetry of a system? Is it the conserved quantity from Noether's theorem being constant?


Having skimmed the research paper and done some work with both dynamics and ML, my interpretation of their statement is the following:

You want to learn a function that represents the dynamics of your system, either as a function of the system state or of some output like a picture of the system. If you just apply some NN technique directly, this is possible but will require a lot of data, since the NN doesn't have any knowledge of physics. If you use their system, you are trying to learn the Lagrangian of the system, which contains information on e.g. its symmetries and bakes physical knowledge into the learning problem at hand. As a result, less data is needed to learn the system dynamics.


I don't see why symmetries are related to the physics or the Lagrangian of the system. Could you point me to more specific explanations or some reading material?


So it boils down to using suitable biases/constraints, no?

I think more people need to learn to see the positive side of TANSTAAFL...


They are fitting a Lagrangian that doesn't depend on time, so conservation of energy is wired into the system. It's not learning a symmetry.


Conservation of energy is a symmetry.


Yup, but the computer isn't _learning_ it, it's already enforced by the fact that what the computer is learning is a time-independent Lagrangian.

(I suppose it's learning a symmetry in the following sense: just _what_ is conserved depends on what the Lagrangian is, and so as it's learning the dynamics it's also learning what the energy is that it should be conserving. But at every point in the training process, there's _some_ thing, which we might as well call "energy", which its model conserves.)
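For reference, the standard statement being invoked:

```latex
% If L has no explicit time dependence, the quantity below is conserved
% along any solution of the Euler-Lagrange equations:
E = \dot{q}\,\frac{\partial L}{\partial \dot{q}} - L,
\qquad
\frac{dE}{dt} = -\frac{\partial L}{\partial t} = 0.
```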


Yeah, I was thinking about symmetries <=> conservation laws because of Noether's theorem. Think of regular NN training as not having any symmetries...since they aren't baked into the model. But we can let the model learn symmetries <=> conservation laws by adding the Euler-Lagrange constraint to the forward pass of a NN.


As a high school student with an admiration of mathematics (and therefore of physics, ML, and whatnot :D) I must thank the author for this.

Glancing over the paper I understand little; there's too much math I don't know (yet: I'm starting uni this year, and I promise I'll get there), but the application is absolutely beautiful. I have taken physics for almost two years now, and Appendix B made my day.


Excerpt:

The Principle of Least Action

[...]

At first glance, S seems like an arbitrary combination of energies. But it has one remarkable property.

It turns out that for all possible paths between x0 and x1, there is only one path that gives a stationary value of S. Moreover, that path is the one that nature always takes.


Why aren't all scientific papers written this simply?


Great post. This is why I love Hacker News.


For God's sake please use another font. I don't understand this minimalism trend that emphasizes thin fonts. Stop it.


Counterpoint: I think the typography is wonderful and highly readable as is.


I think it may be an issue with font rendering on different operating systems, screen sizes, resolutions, etc. The webpage looks quite different on my phone vs. tablet vs. laptop. It's least readable on my tablet (Microsoft Surface 2), which has an insanely high dpi.


Author with the edgy fonts here. I think you're right, ben. This gives me motivation to support mobile/tablets/etc., so that'll happen soon.


Counterpoint: I have bad eyesight and I want to curse every time I see these very thin fonts. Or low contrast color schemes, for that matter.


Firefox has a reader mode which is a little button next to the address bar. My eyesight is fine but I use it on websites like these.


Thanks, and +1 for good faith effort.

I know about the reader mode, and will use it if necessary. But this will lose me all pictures, gif/js animations, and whatnot. I prefer to just have readable websites. I know and accept that some people don't care, even if a more readable site wouldn't hurt their enjoyment in the least. Still, I prefer sites that work appropriately when I zoom, and that abstain from such visual hostilities.



