
Some text diffusion models use a continuous latent space, but historically those haven't done that well. Most of the ones we're seeing now are trained to predict actual token output that's fed forward into the next timestep. The diffusion property comes from their ability to modify previous timesteps to converge on the final output.
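
Very roughly, the decode loop looks something like the sketch below. This is just an illustration of the idea, not Mercury's actual implementation: the stub "model", the toy vocabulary, and the confidence-based remasking schedule are all made-up assumptions.

    # Toy sketch of an iterative text-diffusion decode loop (illustrative only).
    import random

    VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
    MASK = "<mask>"

    def model(tokens):
        # Stand-in for a trained denoiser: returns a (token, confidence)
        # guess for every position, conditioned on the whole sequence.
        return [(random.choice(VOCAB), random.random()) for _ in tokens]

    def diffusion_decode(length=8, steps=4):
        seq = [MASK] * length                    # start fully "noised" (all masks)
        for step in range(1, steps + 1):
            guesses = model(seq)                 # re-predict every position each step
            # Keep only the most confident tokens this step and remask the rest,
            # so earlier choices can still be revised as the output converges.
            keep = int(length * step / steps)
            ranked = sorted(range(length), key=lambda i: -guesses[i][1])
            seq = [MASK] * length
            for i in ranked[:keep]:
                seq[i] = guesses[i][0]
        return seq

    print(" ".join(diffusion_decode()))

Each pass predicts tokens for the whole sequence at once, commits to the ones it's most confident about, and leaves the rest open for revision on the next pass, which is where the "diffusion" behavior comes from.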

I have a write-up on one of these recent architectures, which seems similar to what Mercury is doing under the hood: https://pierce.dev/notes/how-text-diffusion-works/



Oh neat, thanks! The OP is surprisingly light on details about how it actually works and is mostly benchmarks, so this is very appreciated :)



