Assessments that restrict themselves to a tiny single-GPU workload are increasingly off-base.
GPU compute stacks are increasingly geared towards multi-GPU/multi-node and streaming workloads, esp. given the crazy bandwidth they're now built for (~2.4TB/s bisection bandwidth on a DGX-2 node). Likewise, per-GPU memory and per-node GPU memory are going up nicely each year (16-24GB/GPU, and 100GB-512GB of GPU memory per node, with TBs of host RAM attached to the same node). If you saturate that, the network is more likely to become the bottleneck than your DB :)
Though in practice I mostly do single-GPU streaming, because I like not having to think about multi-node and single GPUs are pretty cheap now :) Roughly what I mean by that, sketched below.
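A minimal CUDA sketch of the single-GPU streaming pattern, under stated assumptions: the kernel, chunk counts, and buffer sizes are all made up for illustration, and error checking is omitted. The idea is just to push chunks through pinned host memory on a couple of streams so the copy for chunk i+1 overlaps compute on chunk i.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy stand-in for per-chunk work; any element-wise op would do here.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int kChunks = 8;            // hypothetical: chunks in the "stream"
    const int kChunkElems = 1 << 20;  // hypothetical: 1M floats per chunk
    const size_t kChunkBytes = kChunkElems * sizeof(float);

    // Pinned host memory so async copies can actually overlap with compute.
    float *host;
    cudaMallocHost((void **)&host, kChunks * kChunkBytes);
    for (size_t i = 0; i < (size_t)kChunks * kChunkElems; ++i) host[i] = 1.0f;

    // Double buffering: two streams, each with its own device buffer, so the
    // H2D copy for the next chunk runs while the current chunk computes.
    const int kStreams = 2;
    cudaStream_t streams[kStreams];
    float *dev[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc((void **)&dev[s], kChunkBytes);
    }

    for (int c = 0; c < kChunks; ++c) {
        int s = c % kStreams;
        float *chunk = host + (size_t)c * kChunkElems;
        // In-stream ordering guarantees dev[s] is free to reuse: the previous
        // chunk on this stream finished its D2H copy before this H2D starts.
        cudaMemcpyAsync(dev[s], chunk, kChunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(kChunkElems + 255) / 256, 256, 0, streams[s]>>>(
            dev[s], kChunkElems, 2.0f);
        cudaMemcpyAsync(chunk, dev[s], kChunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    printf("first element after streaming: %f\n", host[0]);  // expect 2.0

    for (int s = 0; s < kStreams; ++s) {
        cudaFree(dev[s]);
        cudaStreamDestroy(streams[s]);
    }
    cudaFreeHost(host);
    return 0;
}
```

Two streams is usually enough to hide transfer latency for bandwidth-bound work like this; if the kernel were compute-heavy relative to the copies, more streams or buffers wouldn't buy much.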