> The DNNs assumptions are "The true model can be fit by this network architecture"
Yeah, but that's tautological. What I mean is: can somebody sit down and, for a given DNN architecture, write down (at least approximately) the set of functions it can learn? Or, more importantly, the functions it cannot learn? Or at least, how many bits are assumed by the architecture and how many bits have to be learned?
I think that is what bothers people about DNNs. I personally think they are sometimes even inefficient - we are giving them many more parameters (bits) than the actual hypothesis space requires.
> What I mean is: can somebody sit down and, for a given DNN architecture, write down (at least approximately) the set of functions it can learn?
For a two-layer architecture with ReLU activations and n units in the hidden layer, this is the set of continuous piecewise linear functions with at most n kinks (for scalar input; each kink comes from one hidden unit switching on or off).
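To make that concrete, here is a minimal sketch (assuming NumPy, with made-up random weights) of a one-hidden-layer ReLU network on a scalar input. The resulting function is piecewise linear, and its only possible kink locations are the points where a unit's pre-activation crosses zero - so n hidden units give at most n kinks:

```python
import numpy as np

# Sketch: one-hidden-layer ReLU network on a scalar input x.
# f(x) = w2 . relu(w1*x + b1) + b2 is continuous piecewise linear,
# with at most n kinks (one per hidden unit's on/off boundary).

rng = np.random.default_rng(0)
n = 5                                              # hidden units
w1, b1 = rng.normal(size=n), rng.normal(size=n)    # input -> hidden
w2, b2 = rng.normal(size=n), rng.normal()          # hidden -> output

def f(x):
    return np.maximum(w1 * x + b1, 0.0) @ w2 + b2

# Candidate kink locations: where each unit's pre-activation is zero.
kinks = np.sort(-b1 / w1)
print("kink candidates:", kinks)
print("f at a few points:", [f(x) for x in (-2.0, 0.0, 2.0)])
```

Between consecutive kinks the set of active units is fixed, so f is exactly linear there; that is the whole hypothesis class this architecture can represent on scalar inputs.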