The Hinton reference you were trying to find was likely joint work with Timothy Lillicrap.
Geoff refers to it, among other places, in the (recorded) lecture from the reminiscence symposium for David MacKay.
http://divf.eng.cam.ac.uk/djcms2016/#hinton
I don't know that there is an article out for that work yet, but the initial finding by Lillicrap et al. has been published as a preprint: https://arxiv.org/abs/1411.0247
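For context, the core trick in that preprint ("feedback alignment") is easy to sketch: propagate the error through a fixed random matrix B instead of the transpose of the forward weights, and the network still learns. Here is a rough numpy toy just to show the mechanic; the architecture, task, and hyperparameters are my own illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 16, 1

W1 = rng.normal(0, 0.5, (n_hid, n_in))
W2 = rng.normal(0, 0.5, (n_out, n_hid))
B  = rng.normal(0, 0.5, (n_hid, n_out))   # fixed random feedback weights (never trained)

# toy regression task: predict the sum of the inputs
X = rng.normal(size=(256, n_in))
Y = X.sum(axis=1, keepdims=True)

lr = 0.01
for step in range(2000):
    # forward pass
    h = np.maximum(X @ W1.T, 0.0)   # ReLU hidden layer, (batch, n_hid)
    y = h @ W2.T                    # linear output, (batch, n_out)
    e = y - Y                       # output error

    # feedback alignment: route the error through B, not W2.T
    dh = (e @ B.T) * (h > 0)        # (batch, n_hid)

    # plain gradient-descent-style updates
    W2 -= lr * (e.T @ h) / len(X)
    W1 -= lr * (dh.T @ X) / len(X)

print("final MSE:", float((e ** 2).mean()))
```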
I loved the synthetic gradients idea when it came out, but the deafening silence in the following months has been a letdown. I was hoping synthetic gradients would be great for parallelizing models on multiple GPUs or improving RNNs.
I don't think it's usable in practice in its present state: judging by the figures, it doesn't work well even on CIFAR/MNIST. Correct me if I'm wrong, but the value of the paper is that you can decouple a model and train its layers asynchronously/independently, which is just a first step toward distributed NN training.
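To make the decoupling concrete, here is roughly what it looks like in numpy: a small module predicts the gradient at a layer boundary, so the lower layer can update immediately instead of waiting for the true backward pass. This is only a toy sketch of the idea (linear synthetic-gradient module, made-up shapes and task), not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 16, 1

W1 = rng.normal(0, 0.5, (n_hid, n_in))
W2 = rng.normal(0, 0.5, (n_out, n_hid))
M  = np.zeros((n_hid, n_hid))            # linear synthetic-gradient module: h -> predicted dL/dh

X = rng.normal(size=(256, n_in))
Y = X.sum(axis=1, keepdims=True)

lr = 0.01
for step in range(2000):
    # layer 1 forward
    h = np.maximum(X @ W1.T, 0.0)

    # layer 1 updates right away using the *predicted* gradient
    dh_hat = h @ M.T
    W1 -= lr * ((dh_hat * (h > 0)).T @ X) / len(X)

    # layer 2 forward/backward (in the real scheme this could run asynchronously)
    y = h @ W2.T
    e = y - Y
    dh_true = e @ W2                     # true dL/dh
    W2 -= lr * (e.T @ h) / len(X)

    # regress the synthetic-gradient module onto the true gradient
    M -= lr * ((dh_hat - dh_true).T @ h) / len(X)

print("final MSE:", float((e ** 2).mean()))
```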
In the paper it seemed to work quite well for LSTMs (language modelling), although it is best suited to tasks with very large neural networks for which parallelism is beneficial.