Hacker News
Deep Learning without Backpropagation (iamtrask.github.io)
133 points by williamtrask on March 21, 2017 | hide | past | favorite | 12 comments


The Hinton reference you were trying to find was likely joint work with Timothy Lillicrap. Geoff refers to it, among other places, in the (recorded) lecture from the reminiscence symposium for David MacKay. http://divf.eng.cam.ac.uk/djcms2016/#hinton

I don't know whether there is an article out for that work yet, but the initial finding by Lillicrap et al. has been published as a preprint: https://arxiv.org/abs/1411.0247


That's the symposium!!! Thank you so much! I was really hoping someone would remember that part.


Happy to be of help. I think it's very interesting work and I'm glad to see you recognized it in your post.


Is this a little similar to the paired autoencoders/decoders in this one? https://arxiv.org/pdf/1609.03971.pdf


Nope, that paper uses backpropagation, and appears to be aimed at something entirely different.


man, that paper looks cool but i don't have enough of a neuroscience background to interpret the abstract


I loved the synthetic gradients idea when it came out, but the deafening silence in the following months has been a letdown. I was hoping synthetic gradients would be great for parallelizing models on multiple GPUs or improving RNNs.


You know what's funny, though... I don't think it's that hard to implement. Best I can tell, every framework already has what it needs.

There's just no boilerplate code for people to start with yet.
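For a sense of what's involved, here's a minimal numpy sketch of the synthetic-gradients idea (a toy linear construction of my own, not DeepMind's actual code or architecture): the first layer updates immediately from a gradient *predicted* from its own activations, while the predictor itself is regressed toward the true gradient whenever that becomes available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical example, not from the paper)
X = rng.normal(size=(256, 4))
true_w = rng.normal(size=(4, 1))
y = X @ true_w

# Two-layer linear net: h = X W1, pred = h W2
W1 = rng.normal(size=(4, 8)) * 0.1
W2 = rng.normal(size=(8, 1)) * 0.1

# Synthetic-gradient module: a linear map M predicting dL/dh from h
M = np.zeros((8, 8))

lr, sg_lr = 0.1, 0.1
init_loss = float(np.mean((X @ W1 @ W2 - y) ** 2))

for step in range(500):
    h = X @ W1                       # forward through layer 1
    # Layer 1 updates right away using the *synthetic* gradient,
    # without waiting for the rest of the forward/backward pass
    sg = h @ M
    W1 -= lr * X.T @ sg / len(X)

    pred = h @ W2                    # rest of the forward pass
    err = pred - y
    dL_dh = err @ W2.T               # true gradient w.r.t. h
    W2 -= lr * h.T @ err / len(X)

    # Regress the synthetic-gradient model toward the true gradient
    M -= sg_lr * h.T @ (sg - dL_dh) / len(X)

loss = float(np.mean((X @ W1 @ W2 - y) ** 2))
```

Nothing here goes beyond matrix multiplies and SGD updates, which is the point: any framework with autodiff and a way to stop gradients at a layer boundary already has the pieces.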


Has anyone actually managed to get these to work at ImageNet scale?


I don't think it's usable IRL (in its present state): according to the figures, it doesn't work well even on CIFAR/MNIST. Correct me if I'm wrong, but the value of this paper is that you can decouple a model and train its layers asynchronously/independently; these are just first steps toward distributed NN training.


Well, Hogwild! works well enough. It's another one for the giant "weird shit you can get away with in neural network land" bucket, I think.


In the paper, it seemed to work quite well for LSTMs (language modeling), although it is best suited to tasks with very, very large neural networks for which parallelism is beneficial.



