Rumours are that the original base model was a 16-way mixture of experts, over 1 TB in size. With quantisation, that would fit into 8x 80 GB cards (640 GB total), which is similar in size to competing models. There's also a practical benefit to being able to use 100% of a common platform such as a DGX server: splitting a model across multiple servers is much more complex and less efficient.
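As a rough back-of-the-envelope check (a sketch only; the ~1.1 TB fp16 figure and the bit widths below are assumptions based on the rumour, not confirmed numbers):

```python
# Back-of-the-envelope VRAM arithmetic for the rumoured configuration.
# The ~1.1 TB fp16 size and quantisation widths are assumptions, not specs.

GB = 1e9

model_fp16_bytes = 1.1e12            # "over 1 TB" of 16-bit weights (assumed)
params = model_fp16_bytes / 2        # 2 bytes per parameter at fp16

for bits in (16, 8, 4):
    weight_bytes = params * bits / 8
    print(f"{bits:>2}-bit weights: {weight_bytes / GB:6.0f} GB")

# A DGX-style box with 8x 80 GB cards:
total_vram = 8 * 80 * GB
print(f"available VRAM: {total_vram / GB:6.0f} GB")
# At 8 bits the weights (~550 GB) fit in 640 GB with headroom left for the
# KV cache and activations; at 16 bits they would not.
```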
No, but different experts are invoked for each token, so typically they'd all be preloaded into VRAM. Also, because of batching, different users would be using different experts concurrently.
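To illustrate why, here's a minimal sketch of top-k expert routing in a mixture-of-experts layer (the expert count, top-k value and dimensions are illustrative assumptions, not GPT-4's actual configuration):

```python
import numpy as np

# Toy top-2 mixture-of-experts layer: every token picks its own experts, so a
# batch drawn from many users touches essentially all of them, which is why
# they all need to stay resident in VRAM. Sizes here are illustrative only.

rng = np.random.default_rng(0)

n_experts, d_model, top_k = 16, 64, 2
batch_tokens = 256                                # tokens from many users, batched together

router_w = rng.standard_normal((d_model, n_experts))
expert_w = rng.standard_normal((n_experts, d_model, d_model))  # one weight matrix per expert

x = rng.standard_normal((batch_tokens, d_model))

# The router scores each token against every expert and keeps the top-k.
logits = x @ router_w                             # (tokens, experts)
chosen = np.argsort(logits, axis=1)[:, -top_k:]   # per-token expert indices
gates = np.exp(logits - logits.max(axis=1, keepdims=True))
gates /= gates.sum(axis=1, keepdims=True)         # gate weights (simplified: softmax over all experts)

out = np.zeros_like(x)
for e in range(n_experts):
    mask = (chosen == e).any(axis=1)              # tokens routed to expert e
    if mask.any():
        out[mask] += gates[mask, e][:, None] * (x[mask] @ expert_w[e])

print("experts touched by this batch:", np.unique(chosen).size, "of", n_experts)
```

With a realistic batch, essentially every expert is hit, so there's no saving from keeping only a subset loaded.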
There’s no detailed information about how GPT-4 is hosted, but we can make an educated guess from competing models and the leaked info.