williamtrask on March 21, 2017, on: Deep Learning without Backpropagation

In the paper, it seemed to work quite well for LSTMs (language modeling), although it is best suited to very, very large neural networks for which parallelism is beneficial.