
What are the geohot leaks?


I assume they mean comments like this one: "GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference."

https://twitter.com/soumithchintala/status/16712671501017210... https://archive.li/rfFlW

I'm not sure which is the most canonical paper on mixture of experts, but here's one possibility:

https://arxiv.org/pdf/1701.06538.pdf
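
For anyone unfamiliar with the idea in that paper, here's a minimal sketch of a sparsely-gated top-k mixture-of-experts layer. It's purely illustrative: the layer sizes, the choice of 8 experts with top-2 routing, and the dense per-expert loop are assumptions for clarity, not anything claimed about GPT-4.

    # Minimal sketch of a sparsely-gated MoE layer in the spirit of arXiv:1701.06538.
    # Sizes and routing choices are illustrative, not from any GPT-4 leak.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        def __init__(self, d_model=512, d_hidden=2048, n_experts=8, k=2):
            super().__init__()
            self.k = k
            # Each expert is an ordinary feed-forward block.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )
            # The gating network scores every expert for every token.
            self.gate = nn.Linear(d_model, n_experts)

        def forward(self, x):                      # x: (batch, seq, d_model)
            logits = self.gate(x)                  # (batch, seq, n_experts)
            topk_val, topk_idx = logits.topk(self.k, dim=-1)
            weights = F.softmax(topk_val, dim=-1)  # renormalise over the chosen k
            out = torch.zeros_like(x)
            # Dense loop over experts for clarity; real systems dispatch tokens sparsely.
            for e, expert in enumerate(self.experts):
                mask = (topk_idx == e)             # which tokens picked expert e
                if mask.any():
                    w = (weights * mask).sum(dim=-1, keepdim=True)  # (batch, seq, 1)
                    out = out + w * expert(x)
            return out

    x = torch.randn(2, 16, 512)
    print(SparseMoE()(x).shape)  # torch.Size([2, 16, 512])

Only the selected experts contribute per token, which is how these models get a large parameter count without a proportional increase in per-token compute.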


I think when people refer to MoE they're generally referring to the Google GLaM paper, actually.


https://the-decoder.com/gpt-4-architecture-datasets-costs-an...

Not OP, but this is where a cheeky Google search got me.


"The idea is nearly 30 years old and has been used for large language models before, such as Google's Switch Transformer."

Innovation! :)
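
For contrast with the top-k sketch above, here's what the Switch-Transformer-style routing mentioned in that quote roughly looks like: each token goes to a single expert (top-1), with the output scaled by that expert's router probability. Again a sketch under assumed shapes, not a description of any production system.

    # Rough sketch of Switch-Transformer-style top-1 routing. Illustrative only;
    # the gate/expert shapes are made up for the example.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def switch_route(x, gate, experts):
        # x: (batch, seq, d_model); gate: Linear(d_model -> n_experts)
        probs = F.softmax(gate(x), dim=-1)       # router probabilities per token
        top_p, top_idx = probs.max(dim=-1)       # single best expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(experts):
            mask = (top_idx == e).unsqueeze(-1)  # tokens assigned to expert e
            out = out + mask * top_p.unsqueeze(-1) * expert(x)
        return out

    d_model, n_experts = 512, 8
    gate = nn.Linear(d_model, n_experts)
    experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
    print(switch_route(torch.randn(2, 16, d_model), gate, experts).shape)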


George Hotz (aka geohot), in his recent interview with Lex Fridman, gave some info on the probable structure of GPT-4.



