People have mentioned the discrete-continuous tradeoff. One way to bridge that gap would be to use Neural ODEs (https://arxiv.org/abs/1806.07366) - they draw an equivalence between residual networks and ordinary differential equations (each ResNet block is one Euler step of an ODE), and then use a black-box ODE solver as the forward pass of the "neural net", backpropagating through it with the adjoint method (from what I remember - it's been years since that paper...).
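Roughly, the correspondence is that a residual update h = h + f(h) is Euler's method with step size 1, so shrinking the step size takes you toward a continuous-depth model. A toy numpy sketch (the dynamics function and weights here are made up for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))  # weights of the dynamics function f

def f(h):
    # dynamics of the hidden state: dh/dt = f(h), a single tanh layer here
    return np.tanh(h @ W)

def euler_integrate(h0, t1=1.0, steps=10):
    # Euler's method: each step is exactly one residual block h <- h + dt * f(h)
    h, dt = h0.copy(), t1 / steps
    for _ in range(steps):
        h = h + dt * f(h)
    return h

h0 = rng.normal(size=4)
coarse = euler_integrate(h0, steps=4)    # behaves like a 4-block ResNet
fine = euler_integrate(h0, steps=1000)   # approaches the continuous ODE solution
```

The paper's actual contribution is treating the solver as a black box (adaptive solvers, constant-memory adjoint gradients), which this fixed-step sketch doesn't show.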
Another approach might be to take an information-theoretic view with infinite-width, finite-entropy nets.