The Hinton reference you were trying to find was likely joint work with Timothy Lillicrap.
Geoff refers to it, among other places, in the (recorded) lecture from the reminiscence symposium for David MacKay.
http://divf.eng.cam.ac.uk/djcms2016/#hinton
I don't know that there is an article out for that work yet, but the initial finding by Lillicrap et al. has been published as a preprint: https://arxiv.org/abs/1411.0247
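For context, the core trick in that preprint ("feedback alignment") is easy to sketch: propagate the error through a fixed random matrix B instead of the transpose of the forward weights, and the network still learns. Here is a rough numpy toy just to show the mechanic; the architecture, task, and hyperparameters are my own illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 16, 1

W1 = rng.normal(0, 0.5, (n_hid, n_in))
W2 = rng.normal(0, 0.5, (n_out, n_hid))
B  = rng.normal(0, 0.5, (n_hid, n_out))   # fixed random feedback weights (never trained)

# toy regression task: predict the sum of the inputs
X = rng.normal(size=(256, n_in))
Y = X.sum(axis=1, keepdims=True)

lr = 0.01
for step in range(2000):
    # forward pass
    h = np.maximum(X @ W1.T, 0.0)   # ReLU hidden layer, (batch, n_hid)
    y = h @ W2.T                    # linear output, (batch, n_out)
    e = y - Y                       # output error

    # feedback alignment: route the error through B, not W2.T
    dh = (e @ B.T) * (h > 0)        # (batch, n_hid)

    # plain gradient-descent-style updates
    W2 -= lr * (e.T @ h) / len(X)
    W1 -= lr * (dh.T @ X) / len(X)

print("final MSE:", float((e ** 2).mean()))
```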
I loved the synthetic gradients idea when it came out, but the deafening silence in the following months has been a letdown. I was hoping synthetic gradients would be great for parallelizing models on multiple GPUs or improving RNNs.
I don't think it's usable in practice in its present state: judging by the figures, it doesn't work well even on CIFAR/MNIST. Correct me if I'm wrong, but the value of the paper is that you can decouple a model and train its layers asynchronously/independently, which is just a first step toward distributed NN training.
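To make the decoupling concrete, here is roughly what it looks like in numpy: a small module predicts the gradient at a layer boundary, so the lower layer can update immediately instead of waiting for the true backward pass. This is only a toy sketch of the idea (linear synthetic-gradient module, made-up shapes and task), not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 16, 1

W1 = rng.normal(0, 0.5, (n_hid, n_in))
W2 = rng.normal(0, 0.5, (n_out, n_hid))
M  = np.zeros((n_hid, n_hid))            # linear synthetic-gradient module: h -> predicted dL/dh

X = rng.normal(size=(256, n_in))
Y = X.sum(axis=1, keepdims=True)

lr = 0.01
for step in range(2000):
    # layer 1 forward
    h = np.maximum(X @ W1.T, 0.0)

    # layer 1 updates right away using the *predicted* gradient
    dh_hat = h @ M.T
    W1 -= lr * ((dh_hat * (h > 0)).T @ X) / len(X)

    # layer 2 forward/backward (in the real scheme this could run asynchronously)
    y = h @ W2.T
    e = y - Y
    dh_true = e @ W2                     # true dL/dh
    W2 -= lr * (e.T @ h) / len(X)

    # regress the synthetic-gradient module onto the true gradient
    M -= lr * ((dh_hat - dh_true).T @ h) / len(X)

print("final MSE:", float((e ** 2).mean()))
```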
In the paper it seemed to work quite well for LSTMs (language modelling), although it is best suited to tasks with very large neural networks for which parallelism is beneficial.