Agreed. Deep learning has revolutionized AI and anyone hoping to contribute to A...

codingslave · on Nov 14, 2019

I actually don't think mastering deep learning is very difficult. Theres a gazillion papers and ideas floating around, but the core concepts, that actually work, things like batch normalization, gradient descent, dropout, etc are all relatively simple. Most of the complexity comes from second rate scientists pushing their flawed research out into the public in some form of a status game

protomikron · on Nov 14, 2019

> [...] but the core concepts, that actually work, things like batch normalization, gradient descent, dropout, etc are all relatively simple.

They may be simple, but it's controversial why they work. For example dropout is not really used much in recent CNN architectures, and it's just - I don't know - ~5 years old? So people don't even agree what the core concepts are ...

codingslave · on Nov 15, 2019

Sure, this is true. I just threw dropout in there without thinking much into it. The point is even if we include the techniques that have been replaced by newer ones, the total number of techniques is small. Also if youre learning deep learning for the first time, understanding why dropout was used, and then how batch normalization came to replace it is key to understanding neural networks. Same can be seen in network architectures, tracing the evolution of CNNs from VGG16 -> ResNet and why Resnet is better exposes one to the vanishing gradient problem, shows how the thought evolution happened, and gives hints to what could be next/builds intuition for the design of deep neural nets

codetrotter · on Nov 14, 2019

For anyone unfamiliar with all but the most trivial details, do you have some good papers to recommend, to save us from wading through all the rest?

codingslave · on Nov 15, 2019

Get some basics of linear algebra down. Eigenvectors, Eigenvalues. Nail down Matrix Factorization, Principal Components, and the relationship between the two.

Learn softmax, logit function, different activation functions. When to use them. Difference between classification, binary classification, multi label prediction etc. Theyre all similar, just use a few different functions in the neural net

After this, go through some optimization theory and learn the different algorithms for optimizing neural nets, i.e. Adam vs RMSProp.

Then I would just get a list of all the top network architectures, then go through their white papers. Do this chronologically. Start at ~2012. Basically all the network architectures build on each other. So take the first good working deep CNN (alexnet), find out why it worked. Then move to VGG, why did that one work? What problems were solved? then move onwards.

^Do this for computer vision, then again for NLP (Word Vectors) and transformers (BERT, XLNet, etc).

Then youre done.

Theres also GANs etc, but that stuff is extra.

From there, choose whatever specialty you wanna research, and just grab the state of the art.

0-_-0 · on Nov 14, 2019

You can start here: https://github.com/terryum/awesome-deep-learning-papers