Less than you might think! Some of the frontier-advancing training-on-model-outputs ('synthetic data') work just uses other models & automated checkers to select suitable prompts and to keep only the desirable subset of generations.
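To make that concrete, here's a minimal sketch of the kind of filter-then-train loop I mean. The names (`generate_solutions`, `passes_check`) are hypothetical stand-ins for a model sampler and an automated checker, not any particular lab's pipeline:

```python
# Minimal sketch of generation filtering for synthetic-data training.
# `generate_solutions` and `passes_check` are hypothetical stand-ins for
# a model sampling call and an automated checker (e.g. unit tests or a
# grader model); real pipelines differ in the details.

from typing import Callable, Iterable

def build_synthetic_dataset(
    prompts: Iterable[str],
    generate_solutions: Callable[[str, int], list[str]],  # model sampler (hypothetical)
    passes_check: Callable[[str, str], bool],              # automated checker (hypothetical)
    samples_per_prompt: int = 8,
) -> list[tuple[str, str]]:
    """Keep only (prompt, generation) pairs that the automated checker accepts."""
    dataset = []
    for prompt in prompts:
        for candidate in generate_solutions(prompt, samples_per_prompt):
            if passes_check(prompt, candidate):
                dataset.append((prompt, candidate))
    return dataset

# The surviving pairs would then go through ordinary supervised fine-tuning,
# so the "new" signal comes entirely from the selection step, not from new data.
```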
I find it (very) vaguely analogous to how a person can improve at a sport or an instrument without an expert guiding every step, just by drilling certain behaviors in a sufficiently well-structured way. Training on synthetic data seems to extract a similar iterative improvement in particular directions without requiring any more natural data. It somehow succeeds in using extra compute to squeeze yet more value out of the entropy already present in the original, non-synthetic training data.