
The major ML conferences all have pretty tight page limits, so expository sentences are usually the first to get cut. This also means that papers typically only explain how their work differs from previous work, assuming you are either familiar with the papers they cite or willing to go read them.

This means that people with up-to-date knowledge of a given subfield can quickly get a lot out of a new paper. Unfortunately, it also means that it usually takes a pretty decent stack of papers to get up to speed on a new subfield, since you have to read the important segments of the commonly cited papers in order to gain the common knowledge that new papers are being diffed against.

Traditionally, this gap is filled by textbooks, since the base set of ideas in a given field or subfield is pretty stable. But ML has been moving fast in recent years, so there is still a sizable gap between the base knowledge required for productive paper reading and what you can get out of a textbook. For example, Goodfellow et al. [1] is a great intro to the core ideas of deep learning, but it was published before transformers were invented, so it doesn’t mention them at all.

[1] https://www.deeplearningbook.org/


