
Anyone with experience running 2 linked consumer GPUs want to chime in on how well this works in practice?


You get a fast link between the GPUs, which should help when you’ve got a model split between them.

However, that split isn't automatic. You can't expect to just run a 40GB model across them, unless perhaps the software has been built to split it, the way llama.cpp can split a model between the GPU and CPU, for instance.
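
Rough illustration of that kind of split, assuming the llama-cpp-python bindings built with CUDA support; the model path and layer count are placeholders:

    # minimal sketch: offload part of a model to VRAM, keep the rest on the CPU
    from llama_cpp import Llama

    llm = Llama(
        model_path="/models/example-40b.Q4_K_M.gguf",  # hypothetical file
        n_gpu_layers=40,  # layers offloaded to the GPU(s); the rest stay in system RAM
        n_ctx=4096,
    )
    out = llm("Q: What does NVLink do? A:", max_tokens=64)
    print(out["choices"][0]["text"])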

What you can do without trouble is keep more models loaded, do more things at the same time, and occasionally run the same model at double speed if it batches well.
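
For the "same model at double speed" case, a minimal sketch using PyTorch's nn.DataParallel (DistributedDataParallel is what you'd use for real workloads, but this is the shortest way to show the batch being split across the cards):

    # minimal sketch: replicate one model on both GPUs and split each batch between them
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
    model = nn.DataParallel(model.to("cuda:0"), device_ids=[0, 1])

    x = torch.randn(64, 1024, device="cuda:0")  # 32 samples end up on each card
    with torch.no_grad():
        y = model(x)
    print(y.shape)  # torch.Size([64, 1024])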


CUDA multi-GPU with NVLink is pretty well tested, including with a shared memory space. You still want to use NCCL to optimize the communication between the cards, but many CUDA-aware libraries (and the ML tools built on top of them) are capable of using it.
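
If you want to sanity-check what the driver actually exposes between the two cards, here's a small sketch using PyTorch's CUDA utilities (NCCL itself would come in via torch.distributed in a multi-process setup):

    # minimal sketch: check peer-to-peer access and do a direct GPU-to-GPU copy
    import torch

    assert torch.cuda.device_count() >= 2
    print("P2P 0->1:", torch.cuda.can_device_access_peer(0, 1))
    print("P2P 1->0:", torch.cuda.can_device_access_peer(1, 0))

    src = torch.randn(64 * 1024 * 1024, device="cuda:0")  # ~256 MB of float32
    dst = src.to("cuda:1")  # goes over NVLink/P2P when available, otherwise via the host
    torch.cuda.synchronize("cuda:0")
    torch.cuda.synchronize("cuda:1")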


This is incorrect if you are talking about a 3090 or 3090 Ti using NVLink.


You mean those would work like a single virtual GPU with 48GB of VRAM?


No. But PyTorch will automatically make use of both GPUs and an NVLink bridge if you use its model parallel and distributed data parallel approaches.
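
A minimal sketch of the model-parallel half of that, along the lines of PyTorch's model parallel tutorial (the layer sizes here are made up):

    # minimal sketch: naive model parallelism, half the layers on each GPU
    import torch
    import torch.nn as nn

    class TwoGPUNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
            self.part2 = nn.Linear(4096, 1024).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            # activations hop between the cards here; this is where NVLink helps
            return self.part2(x.to("cuda:1"))

    net = TwoGPUNet()
    out = net(torch.randn(8, 1024))
    print(out.shape)  # torch.Size([8, 1024])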


I think you need enterprise-grade cards to make it work. If I remember correctly, consumer cards with NVLink can't pool their VRAM to host a 40GB model.



