> In contrast, running the neural network takes 5μs (twenty thousand times faster) with only a small loss in accuracy. This “approximate function inversion via gradients” trick is a very general one: it works not only for dynamical systems but also lies behind the fast style transfer algorithm.
Very interesting! Follow-up question -- how would you choose the network architecture?
Since the network only acts on a small portion of the entire system, we can constrain it in such a way that remarkably simple NNs work just fine.
`FastChain(FastDense(3,32,tanh), FastDense(32,32,tanh), FastDense(32,2))` (from [0]) would take three inputs from your basis, pass them through two 32-unit tanh hidden layers, and return two outputs.
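To make that concrete, here's a minimal sketch of how such a small chain could be dropped into a universal ODE, assuming the older DiffEqFlux.jl FastChain/FastDense API from [0]; the basis terms and the "known" part of the dynamics below are made up purely for illustration:

```julia
using DiffEqFlux, OrdinaryDiffEq

# 3 basis inputs -> two 32-unit tanh hidden layers -> 2 outputs
nn = FastChain(FastDense(3, 32, tanh), FastDense(32, 32, tanh), FastDense(32, 2))
p0 = initial_params(nn)   # flat parameter vector that gets trained later

# Hypothetical UDE right-hand side: known terms plus the NN correction
function ude!(du, u, p, t)
    basis  = [u[1], u[2], u[1] * u[2]]   # hand-picked basis fed to the NN
    nn_out = nn(basis, p)
    du[1] = -0.1 * u[1] + nn_out[1]      # assumed known linear decay + learned term
    du[2] = -0.2 * u[2] + nn_out[2]
end

prob = ODEProblem(ude!, [1.0, 1.0], (0.0, 10.0), p0)
sol  = solve(prob, Tsit5())
```

The point is that the NN only has to learn the two unknown terms, not the whole vector field, which is why such a small chain can be enough.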
The example in [1] also uses two hidden layers; it's one of the more complex solutions I've seen so far. To move to that complexity from a simpler chain, we first make sure our solution isn't stuck in a local minimum [2], then increase the parameter count if the NN fails to converge.
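A rough sketch of that workflow, again against the older DiffEqFlux.jl API: a few random restarts per architecture help rule out a bad local minimum, and the hidden width only grows when the loss refuses to come down. The `loss` function, restart count, and tolerance here are placeholders, not values from [1] or [2]:

```julia
using DiffEqFlux
using Flux: ADAM

# Placeholder loss: in practice this would compare the UDE solution against data
# (e.g. sum of squared errors over the measured trajectory).
loss(nn, p) = sum(abs2, nn([1.0, 0.5, 0.5], p))

# Train one architecture from several random initialisations and keep the best loss.
function best_loss(nn; restarts = 5)
    best = Inf
    for _ in 1:restarts                       # restarts guard against local minima
        res  = DiffEqFlux.sciml_train(p -> loss(nn, p), initial_params(nn),
                                      ADAM(0.01); maxiters = 300)
        best = min(best, res.minimum)
    end
    return best
end

for width in (8, 16, 32, 64)                  # widen only if the NN fails to converge
    nn = FastChain(FastDense(3, width, tanh),
                   FastDense(width, width, tanh),
                   FastDense(width, 2))
    b = best_loss(nn)
    @info "hidden width $width" b
    b < 1e-3 && break                         # assumed convergence tolerance
end
```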