
The computational cost of "training" a polynomial would be the same as just one iteration of the training algorithm used by typical NNs. When it comes to trig functions, the story is the same as with the exponential function exp(). When you call sin() or cos() in your favorite language, under the hood it uses Taylor series (polynomials) to compute it, plus some hacks to add precision on certain ranges of the function and to work around floating-point precision limitations.
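As an illustration, here is a minimal sketch of that idea in Python (the helper name sin_taylor and the crude range-reduction step are mine for illustration; real libm implementations use more carefully tuned polynomials, but the principle is the same):

```python
import math

def sin_taylor(x, terms=10):
    """Approximate sin(x) by its Taylor series around 0:
    sin(x) = x - x^3/3! + x^5/5! - ...
    """
    # Crude range reduction: fold x into [-pi, pi] so a few terms suffice.
    x = math.remainder(x, 2 * math.pi)
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
    return total

print(sin_taylor(100.0), math.sin(100.0))  # the two should agree closely
```

Ten terms after folding into [-pi, pi] already gets close to double precision; production implementations mostly differ in how carefully the range reduction and coefficients are done.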

The degree of polynomial needed to fit the real-world system has to be validated against data, just the same as with NNs. What does one do when an NN fit is not good enough, or too good (overfitting)? One adds or removes layers. Same with polynomials: one increases or decreases the degree.
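That workflow can be sketched on a toy system (Python/NumPy; the data-generating function, noise level, and split sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)  # toy "real-world" system

# Hold out validation data, exactly as one would when sizing an NN.
x_tr, y_tr = x[:150], y[:150]
x_va, y_va = x[150:], y[150:]

val_mse = {}
for deg in (1, 3, 5, 9, 15):
    coeffs = np.polyfit(x_tr, y_tr, deg)
    val_mse[deg] = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {deg:2d}: validation MSE = {val_mse[deg]:.4f}")
```

Sweeping the degree plays the same role as adding or removing layers: validation error drops as capacity grows, then flattens or rises once the model starts fitting noise.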

Sorry for the rant. I am not saying NNs are useless, because I do believe they are super useful for certain problems, especially for categorization. But it seems to me that nowadays there is this trend of using NNs as a hammer, and not all problems are nails. Especially when it comes to control, where lives or big economic losses are at stake, it is the responsibility of the engineer to resist the fuss and craze and use the right tool for the problem.



There is a saying in mathematics that the fastest way to a solution is through the complex plane. This was discovered because a lot of proofs become nicer by doing analytic continuation and analyzing the properties of the continued function. Complex-step differentiation is another example of this.

In some sense, something similar applies to neural networks in this context. Have you done a lot of fitting of classical basis functions inside of differential equations? They are very prone to local minima, so direct training of polynomials inside of a differential equation is rather hard. But neural networks sidestep much of this, for reasons somewhat related to [1], which essentially states that local minima are close to the global minimum in large enough neural networks. So this lets you get pretty lazy: just do local optimization to find the missing functions, and then sparsify to polynomials later, in a way where the optimization is better behaved than going directly to polynomials. The DiffEqFlux library has both approaches available, so you can try them side by side and see the difference. After years of experience doing the former, the latter is quite a breath of fresh air.

   [1] https://arxiv.org/abs/1412.0233
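The "sparsify to polynomials later" step can be sketched outside of any DiffEqFlux machinery. Below is a minimal NumPy version of sequentially thresholded least squares (the STLSQ idea from SINDy, named here as my own framing, not the library's API); the trained network is faked by sampling the true function, and the threshold 0.1 is an arbitrary choice for illustration:

```python
import numpy as np

# Suppose a neural network has already been trained inside the ODE and we
# can sample the learned function f(x). Here we fake that with the true
# f(x) = 1.5*x - 0.7*x**3 standing in for the NN's output.
x = np.linspace(-2, 2, 200)
f = 1.5 * x - 0.7 * x**3

# Polynomial library up to degree 5.
library = np.column_stack([x**d for d in range(6)])

# Sequentially thresholded least squares: fit, prune tiny terms, refit.
coef = np.linalg.lstsq(library, f, rcond=None)[0]
for _ in range(10):
    small = np.abs(coef) < 0.1
    coef[small] = 0.0
    big = ~small
    coef[big] = np.linalg.lstsq(library[:, big], f, rcond=None)[0]

print(np.round(coef, 3))  # expect ~[0, 1.5, 0, -0.7, 0, 0]
```

The point is that the hard, local-minima-prone part (finding the function at all) was already done by the easier NN optimization; the sparse regression afterwards is a well-behaved linear problem.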


> The computational cost of "training" a polynomial would be the same as just one iteration of the training algorithm used by typical NNs.

That statement depends heavily on the dimensionality of the problem. Polynomials also have huge problems with discontinuities (even ones in some higher-order derivative); smoothing out the errors around a discontinuity can require an unbounded number of polynomial terms. (Try to fit the integral of |x| with polynomials.)
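A quick numerical check of that example (Python/NumPy; a Chebyshev basis is used only to keep the least-squares fit well conditioned at high degree, the function space is the same):

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

x = np.linspace(-1, 1, 1001)
f = 0.5 * x * np.abs(x)  # the integral of |x|; its 2nd derivative jumps at 0

max_err = {}
for deg in (3, 7, 15, 31):
    p = cheb.Chebyshev.fit(x, f, deg)
    max_err[deg] = np.max(np.abs(p(x) - f))
    print(f"degree {deg:2d}: max error = {max_err[deg]:.2e}")
```

The error shrinks only algebraically with degree because of the kink at zero; for a smooth function the same fit would converge geometrically and hit machine precision almost immediately.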

Fear of NN in control is justified if the networks are poorly understood.


Not just that: they tend to blow up when one extrapolates 'too' far from the data. This can be controlled by using other basis functions, for example radial basis functions or, more generally, functions in a reproducing kernel Hilbert space (RKHS). It is best to choose the basis based on the data (as RBF and RKHS bases do) rather than independently of the data. This applies to polynomials too: choosing a polynomial basis that is orthogonal with respect to the data distribution makes computations much better behaved -- otherwise it's common to run into ill-conditioned problems that are very sensitive to noise in the data.
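The conditioning point is easy to see numerically (Python/NumPy; the degree, sample count, and interval below are arbitrary choices for illustration):

```python
import numpy as np

x = np.linspace(0, 1, 200)  # data sites
deg = 15

# Raw monomial (Vandermonde) design matrix vs. a Legendre basis
# mapped onto the data interval [0, 1].
V_mono = np.vander(x, deg + 1)
V_leg = np.polynomial.legendre.legvander(2 * x - 1, deg)

cond_mono = np.linalg.cond(V_mono)
cond_leg = np.linalg.cond(V_leg)
print(f"monomial cond: {cond_mono:.2e}")
print(f"legendre cond: {cond_leg:.2e}")
```

Both matrices span exactly the same polynomial space; only the basis differs, yet the monomial design matrix amplifies noise in the data by many orders of magnitude more.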


I certainly agree with the point that NNs are used as hammers. Until coming across the UODE concept, I was of the opinion that they were more parlour trick than anything useful. Here, though, I can see some validity.

These comments are appreciated - I think a discussion like this is lacking in the SciML docs (or at least not visible enough). Will have a chat with some of the devs and see if there's something we can add.



