
Rumours are that the original base model was a 16-way mixture of experts over 1 TB in size. With quantisation that would fit into 8 × 80 GB cards = 640 GB total. That is similar in size to competitive models. There's also a practical benefit of being able to use 100% of a common platform such as a DGX server. Splitting across multiple servers is much more complex and less efficient.
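A back-of-envelope sketch of that arithmetic, taking the rumoured 1 TB figure as a 16-bit weight footprint (an assumption; none of these numbers are confirmed specs):

```python
# Back-of-envelope VRAM check for the rumoured model. All figures are
# assumptions from the comment above, not confirmed specs.
TB = 1000**4  # decimal terabyte, in bytes
GB = 1000**3

model_bytes_fp16 = 1 * TB           # rumoured size, assumed 16-bit weights
cards = 8
vram_per_card = 80 * GB             # e.g. an A100/H100 80 GB card
total_vram = cards * vram_per_card  # 640 GB on one DGX-class box

def quantised_size(bytes_fp16: int, bits: int) -> int:
    """Scale a 16-bit weight footprint down to a lower bit width."""
    return bytes_fp16 * bits // 16

for bits in (16, 8, 4):
    size = quantised_size(model_bytes_fp16, bits)
    print(f"{bits}-bit: {size / GB:.0f} GB, fits in 640 GB: {size <= total_vram}")
```

At 16 bits the rumoured model (1000 GB) would not fit on one 8-card server, but at 8 bits (500 GB) it does, which is consistent with the claim above.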


Do you need to run all experts at once?


No, but different experts are invoked for each token, so typically they'd all be preloaded into VRAM. Also, because of batching, different users would use different experts concurrently.
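A toy sketch of why that preloading is needed, using top-k gating (the standard mixture-of-experts routing scheme; GPT-4's actual router is not public, and the gate here is random stand-in scores rather than a learned network):

```python
import random

# Illustrative mixture-of-experts routing: a gate scores every expert
# for each token, and only the top-k experts process that token.
NUM_EXPERTS = 16  # matches the rumoured 16-way mixture
TOP_K = 2         # assumed; the real top-k for GPT-4 is not public

def route_token(gate_scores):
    """Return the indices of the top-k highest-scoring experts for one token."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return ranked[:TOP_K]

random.seed(0)
experts_touched = set()
for _ in range(32):  # one small batch of tokens
    scores = [random.random() for _ in range(NUM_EXPERTS)]  # stand-in for a learned gate
    experts_touched.update(route_token(scores))

# Only TOP_K experts run per token, but across a batch nearly every
# expert gets hit -- which is why all of them stay resident in VRAM.
print(f"experts used across batch: {len(experts_touched)} of {NUM_EXPERTS}")
```

Per-token compute stays at k experts, but because routing varies token to token (and user to user under batching), the working set quickly spans all the experts.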

There’s no detailed public information about how GPT-4 is hosted, but we can guess from competing models and the leaked info.



