In general yes, you can (and do) shard the model over multiple GPUs. If you want to do that yourself, look at DeepSpeed or FSDP. There is a communication overhead though, and the speed at which the GPUs can communicate is key. That's where NVLink comes in, btw.
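Just to make the idea concrete, here's a minimal sketch of sharding with PyTorch's FSDP (not a production recipe; the toy model, sizes, and hyperparameters are placeholders, and it assumes you launch it with torchrun so the process group env vars are set):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # NCCL collectives carry the inter-GPU traffic; NVLink is what makes them fast
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # stand-in for a real transformer
    model = nn.Sequential(
        nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # gathering full weights only around each layer's forward/backward pass
    sharded = FSDP(model)
    opt = torch.optim.AdamW(sharded.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device="cuda")
    loss = sharded(x).sum()
    loss.backward()  # reduce-scatter of gradients happens here -- that's the comms overhead
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

You'd run something like `torchrun --nproc_per_node=8 train.py` and each GPU only ever holds its shard plus whatever layer is currently being gathered.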
So yes, it's what you can and do do in practice. However, it limits how quickly you can iterate on the models, and from what I've read, a lot of the time the foundational labs throw out their models because by the time training finishes they're already outdated.