> [...] but the core concepts, that actually work, things like batch normalization, gradient descent, dropout, etc are all relatively simple.
They may be simple, but why they work is still controversial. For example, dropout isn't really used much in recent CNN architectures, and it's only - I don't know - ~5 years old? So people don't even agree on what the core concepts are ...
Sure, that's true. I just threw dropout in there without thinking much about it. The point is that even if we include techniques that have since been replaced by newer ones, the total number of techniques is small. Also, if you're learning deep learning for the first time, understanding why dropout was used, and then how batch normalization came to replace it, is key to understanding neural networks. The same goes for network architectures: tracing the evolution of CNNs from VGG16 -> ResNet, and why ResNet is better, exposes you to the vanishing gradient problem, shows how the thinking evolved, and builds intuition for the design of deep neural nets / hints at what could come next.
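To make the ResNet point concrete, here's a minimal sketch (PyTorch, assuming a simple same-channel block rather than the exact architecture from the paper) of a residual block. The identity shortcut gives gradients a direct path back through the network, which is the mechanism that eases the vanishing gradient problem in very deep stacks; note it also uses batch norm rather than dropout, as is typical in modern CNN blocks.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the skip connection lets gradients flow
    through the identity path, mitigating vanishing gradients in deep nets."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x  # identity shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity  # skip connection: gradient has a direct path to earlier layers
        return self.relu(out)

# A plain VGG-style stack is just conv -> relu repeated with no shortcut,
# so gradients must pass through every layer, shrinking with depth.
x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```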