You get a fast link between the GPUs, which should help when you’ve got a model split between them.
However, that split isn’t automatic. You can’t expect to just run a 40GB model across the two cards unless the software has been written to shard it, the way llama.cpp can split a model between the GPU and CPU, for instance.
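For example, with the llama-cpp-python bindings you can tell it how to spread a model over two cards. A minimal sketch, assuming the model path and split ratios are placeholders and the parameter names are as I remember them from that library:

```python
from llama_cpp import Llama

# Offload every layer to GPU and split tensors roughly evenly
# across two devices (path and ratios are placeholders).
llm = Llama(
    model_path="models/model-40b.gguf",   # hypothetical 40GB-class GGUF file
    n_gpu_layers=-1,          # -1 = offload all layers to the GPUs
    tensor_split=[0.5, 0.5],  # fraction of the model to place on each GPU
)

out = llm("Q: What does NVLink give you? A:", max_tokens=64)
print(out["choices"][0]["text"])
```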
What you can do without trouble is keep more models loaded, do more things at the same time, and sometimes run the same model at close to double the throughput if the workload batches well.
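The "same model, two copies, split the batch" version is pretty simple in PyTorch. A rough sketch with a toy model (in practice you’d load the same checkpoint into both replicas and use a real serving stack for batching):

```python
import torch

# Toy stand-in for a real model; in practice load the same
# checkpoint into both replicas.
def make_model():
    return torch.nn.Linear(512, 512)

model0 = make_model().to("cuda:0").eval()
model1 = make_model().to("cuda:1").eval()

# One incoming batch, half handled by each GPU.
batch = torch.randn(32, 512)
with torch.no_grad():
    out0 = model0(batch[:16].to("cuda:0"))
    out1 = model1(batch[16:].to("cuda:1"))
result = torch.cat([out0.cpu(), out1.cpu()])
```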
CUDA multi-GPU with NVLink is well supported, including peer-to-peer access between the cards’ memory. You still want NCCL to handle the inter-GPU communication efficiently, but most CUDA-aware libraries (and the ML frameworks built on top of them) can take advantage of it.
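You can sanity-check the peer-to-peer path from PyTorch. A quick sketch; a device-to-device copy like this rides the peer path (NVLink when the bridge is present) if peer access is available, otherwise it bounces through host memory:

```python
import torch

# True if GPU 0 can address GPU 1's memory directly (peer access,
# which goes over NVLink when the bridge is present and enabled).
print(torch.cuda.can_device_access_peer(0, 1))

# A direct device-to-device copy; with peer access it stays on the
# peer path instead of staging through the host.
x = torch.randn(4096, 4096, device="cuda:0")
y = x.to("cuda:1")
torch.cuda.synchronize()
```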
I think you need enterprise-grade cards to make that work. If I remember correctly, consumer cards with NVLink can’t pool their VRAM into one memory space to host a 40GB model.