Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

https://the-decoder.com/gpt-4-architecture-datasets-costs-an...

Not op, but this is where a cheeky google got me.



"The idea is nearly 30 years old and has been used for large language models before, such as Google's Switch Transformer."

Innovation! :)




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: