
What are the geohot leaks?


I assume they mean comments like this one: "GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference."

https://twitter.com/soumithchintala/status/16712671501017210... https://archive.li/rfFlW

I'm not sure which is the most canonical paper on mixture of experts, but here's one possibility:

https://arxiv.org/pdf/1701.06538.pdf
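
For anyone unfamiliar with the idea in that paper, here's a minimal sketch of a sparsely-gated top-k mixture-of-experts layer. It's purely illustrative: the layer sizes, the choice of 8 experts with top-2 routing, and the dense per-expert loop are assumptions for clarity, not anything claimed about GPT-4.

    # Minimal sketch of a sparsely-gated MoE layer in the spirit of arXiv:1701.06538.
    # Sizes and routing choices are illustrative, not from any GPT-4 leak.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        def __init__(self, d_model=512, d_hidden=2048, n_experts=8, k=2):
            super().__init__()
            self.k = k
            # Each expert is an ordinary feed-forward block.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )
            # The gating network scores every expert for every token.
            self.gate = nn.Linear(d_model, n_experts)

        def forward(self, x):                      # x: (batch, seq, d_model)
            logits = self.gate(x)                  # (batch, seq, n_experts)
            topk_val, topk_idx = logits.topk(self.k, dim=-1)
            weights = F.softmax(topk_val, dim=-1)  # renormalise over the chosen k
            out = torch.zeros_like(x)
            # Dense loop over experts for clarity; real systems dispatch tokens sparsely.
            for e, expert in enumerate(self.experts):
                mask = (topk_idx == e)             # which tokens picked expert e
                if mask.any():
                    w = (weights * mask).sum(dim=-1, keepdim=True)  # (batch, seq, 1)
                    out = out + w * expert(x)
            return out

    x = torch.randn(2, 16, 512)
    print(SparseMoE()(x).shape)  # torch.Size([2, 16, 512])

Only the selected experts contribute per token, which is how these models get a large parameter count without a proportional increase in per-token compute.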


I think when people refer to MoE they're generally referring to the Google GLaM paper, actually.


https://the-decoder.com/gpt-4-architecture-datasets-costs-an...

Not OP, but this is where a cheeky Google search got me.


"The idea is nearly 30 years old and has been used for large language models before, such as Google's Switch Transformer."

Innovation! :)
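
For contrast with the top-k sketch above, here's what the Switch-Transformer-style routing mentioned in that quote roughly looks like: each token goes to a single expert (top-1), with the output scaled by that expert's router probability. Again a sketch under assumed shapes, not a description of any production system.

    # Rough sketch of Switch-Transformer-style top-1 routing. Illustrative only;
    # the gate/expert shapes are made up for the example.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def switch_route(x, gate, experts):
        # x: (batch, seq, d_model); gate: Linear(d_model -> n_experts)
        probs = F.softmax(gate(x), dim=-1)       # router probabilities per token
        top_p, top_idx = probs.max(dim=-1)       # single best expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(experts):
            mask = (top_idx == e).unsqueeze(-1)  # tokens assigned to expert e
            out = out + mask * top_p.unsqueeze(-1) * expert(x)
        return out

    d_model, n_experts = 512, 8
    gate = nn.Linear(d_model, n_experts)
    experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
    print(switch_route(torch.randn(2, 16, d_model), gate, experts).shape)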


George Hotz (aka geohot), in his recent interview with Lex Fridman, gave some info on the probable structure of GPT-4.



