I'm not sure about all of the details, but this is an interesting idea: auto-regressive models can be thought of as learning to split a difficult task into a series of simpler tasks.
It makes me wonder whether that's also the magic in denoising autoencoders, since they are basically trained to build an image step by step, auto-regressively, going from more noise to less.
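
For what it's worth, the "series of simpler tasks" framing can be read as the chain rule of probability: a joint distribution over a whole sequence equals the product of one-step conditionals. Here is a rough toy sketch of that reading, not anything from the original work. The three-variable joint distribution and the helper names (`prefix_prob`, `conditional`, `autoregressive_prob`) are made up for illustration, and it says nothing about whether the denoising analogy actually holds.

```python
import itertools
import random

random.seed(0)

# Hypothetical toy joint distribution over 3 binary variables (the "hard" task).
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {x: w / total for x, w in zip(outcomes, weights)}


def prefix_prob(prefix):
    """Marginal probability that a sequence starts with `prefix`."""
    return sum(p for x, p in joint.items() if x[:len(prefix)] == prefix)


def conditional(prefix, value):
    """p(x_t = value | x_<t = prefix): one of the 'simple' per-step tasks."""
    return prefix_prob(prefix + (value,)) / prefix_prob(prefix)


def autoregressive_prob(x):
    """Multiply the per-step conditionals to rebuild the joint probability."""
    prob = 1.0
    for t in range(len(x)):
        prob *= conditional(x[:t], x[t])
    return prob


# Compare the joint probability with the product of conditionals.
for x in outcomes:
    print(x, joint[x], autoregressive_prob(x))
```

The two printed columns should agree up to floating-point error. Each conditional only has to pick one binary value given a prefix, which is the sense in which every step is simpler than modelling all eight outcomes at once.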