
I assume they're referring to comments like this one: "GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference."

https://twitter.com/soumithchintala/status/16712671501017210... https://archive.li/rfFlW

I'm not sure which is the most canonical paper on mixture of experts, but here's one possibility:

https://arxiv.org/pdf/1701.06538.pdf
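
For anyone who hasn't read it: the core idea in that paper is a sparsely-gated layer where a small gating network routes each token to only the top-k of many expert feed-forward networks and mixes their outputs. Here's a minimal sketch of that idea, assuming PyTorch; the class name, sizes, and routing loop are illustrative, not taken from the paper or from any claims about GPT-4:

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class SparseMoE(nn.Module):
      """Toy sparsely-gated mixture-of-experts layer (top-k routing)."""
      def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
          super().__init__()
          # Each expert is an independent feed-forward network.
          self.experts = nn.ModuleList(
              nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                            nn.Linear(d_hidden, d_model))
              for _ in range(n_experts)
          )
          # The gating network scores every expert for every token.
          self.gate = nn.Linear(d_model, n_experts)
          self.top_k = top_k

      def forward(self, x):                      # x: (n_tokens, d_model)
          scores = self.gate(x)                  # (n_tokens, n_experts)
          top_vals, top_idx = scores.topk(self.top_k, dim=-1)
          weights = F.softmax(top_vals, dim=-1)  # renormalize over chosen experts
          out = torch.zeros_like(x)
          # Send each token only to its top-k experts and mix the outputs.
          for slot in range(self.top_k):
              for e, expert in enumerate(self.experts):
                  mask = top_idx[:, slot] == e
                  if mask.any():
                      out[mask] += weights[mask, slot:slot+1] * expert(x[mask])
          return out

  tokens = torch.randn(16, 512)
  print(SparseMoE()(tokens).shape)   # torch.Size([16, 512])

The point of the sparsity is that compute per token scales with top_k, not with the total number of experts, which is why the parameter count can be huge while inference stays tractable.
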



I think when people refer to MoE these days they are actually generally referring to the Google GLaM paper.



