Not sure about all of the details, but this is an interesting idea focusing on h...

Not sure about all of the details, but this is an interesting idea focusing on how auto-regressive models can be thought of as learning how to split a difficult task into a series of simpler tasks.

Makes me wonder if that's the magic in denoising autoencoders, too, since they are trained basically to learn how to build an image auto-regressively from more to less noise.