I hope that all these insights about diffusion model training explored in the last few years will be used by Stability AI to train their large text-to-image models, because so far they have used the most basic pipeline you can imagine, with plenty of problems that get "solved" by workarounds. For example, to train SDXL they used the scheduler from the DDPM paper (2020), the epsilon objective, and noise offset, an ugly workaround created when people realized that SD v1.5 couldn't generate images that were very dark or very bright. That limitation comes from the epsilon objective, which pushes the model to always generate images with a mean close to 0 (the same as the Gaussian noise).
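A minimal sketch of the noise-offset workaround, written in numpy for illustration (the function name and the offset scale of 0.1 are my assumptions, roughly matching what community fine-tuning scripts do): instead of noising latents with pure zero-mean Gaussian noise, a small per-sample, per-channel random constant is added, so the training noise no longer has mean exactly 0 and the model can learn to shift overall brightness.

```python
import numpy as np

def sample_noise_with_offset(shape, noise_offset=0.1, rng=None):
    """Sketch of the noise-offset trick (names/values are illustrative).

    shape: latent tensor shape (batch, channels, height, width).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Standard zero-mean Gaussian noise, as in plain epsilon-objective training.
    noise = rng.standard_normal(shape)
    # The workaround: add a random constant per sample and per channel,
    # broadcast over the spatial dimensions. This breaks the "mean close
    # to 0" bias without changing anything else in the pipeline.
    offset = noise_offset * rng.standard_normal((shape[0], shape[1], 1, 1))
    return noise + offset
```

Note that this only papers over the symptom: the per-channel mean of the noise is now randomly shifted, but the underlying objective is unchanged.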
A few people have finetuned Stable Diffusion models with the v-objective, which solves the problem at the root.
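For reference, the v-objective (from Salimans and Ho's progressive distillation work) has the model predict v = alpha_t * eps - sigma_t * x0 instead of eps, so the target is no longer pure zero-mean noise. A minimal numpy sketch (function names are mine; alpha_t and sigma_t are the usual signal/noise coefficients with alpha_t^2 + sigma_t^2 = 1):

```python
import numpy as np

def v_target(x0, eps, alpha_t, sigma_t):
    # v-objective regression target: a mix of noise and clean data,
    # v = alpha_t * eps - sigma_t * x0
    return alpha_t * eps - sigma_t * x0

def x0_from_v(x_t, v, alpha_t, sigma_t):
    # Given the noised sample x_t = alpha_t * x0 + sigma_t * eps and a
    # v-prediction, the clean sample is recovered exactly as
    # x0 = alpha_t * x_t - sigma_t * v
    # (expand the terms and use alpha_t^2 + sigma_t^2 = 1 to check).
    return alpha_t * x_t - sigma_t * v
```

Because the target depends on x0 itself (not just on zero-mean noise), a v-trained model has no built-in pull toward mid-brightness images, so no noise-offset hack is needed.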