SIMD coders are doing DirectX / HLSL and/or Vulkan in practice. Or CUDA, of course.
AVX512 is nice, but 4000+ shaders on a GPU is better. CPUs sit at an awkward point: you need a dataset small enough that the CPU/GPU transfer penalty dominates. Too large, and GPU RAM works fine as a backing store. Too small, and no one notices the difference.
> CPUs sit at an awkward point: you need a dataset small enough that the CPU/GPU transfer penalty dominates.
Or a dataset so large that it won't fit in the memory of any available GPU, which is the main reason why high-end production rendering (think Pixar, ILM, etc) is still nearly always done on CPU clusters. Last I heard their render nodes typically had 256GB RAM each, and that was a few years ago so they might be up to 512GB by now.
Yes, but GPU clusters with 256+ GB of pooled VRAM are possible, and superior in bandwidth thanks to NVSwitch.
I'd say CPUs still have the RAM advantage, but closer to the 1TB+ range, beyond what NVSwitch scales to. CPUs with 1TB of RAM are a fraction of the cost too, so price/performance deserves a mention.
------
Even then, PCIe is approaching the bandwidth of RAM (latency remains a problem, of course): a PCIe 5.0 x16 link is roughly 64 GB/s each way, which is in the same ballpark as dual-channel DDR5.
For Raytracing in particular, certain objects (bigger background objects or skymaps) have a higher chance of being hit.
There are also octree schemes where rays bounce inside an 8GB chunk (only that chunk is loaded into GPU RAM), and rays are only reorganized when they leave the chunk.
So even Pixar-esque scenes can be rendered quickly in 8GB chunks. In theory, of course; I read a paper on it but I'm not sure if this technique is commercial yet.
But basically: raytrace until a ray leaves your chunk. When it does, collate it into a queue for the chunk it's heading to and handle it in a later pass (rough sketch below). At the scale of millions of rays (as in Pixar movies), enough of them group up per chunk that rendering speeds up while GPU VRAM usage stays bounded.
Between caching common objects and this octree / chunking technique, I think raytracing can move to pure GPU. Whenever Pixar feels like spending a $Billion on the programmers, of course.
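Rough CPU-side sketch of that collate-and-requeue loop, in case it's clearer in code. This is my own illustration, not the paper's actual algorithm: every name here (Ray, ChunkId, traceInsideChunk, renderChunked) is made up, and a real renderer would do the per-chunk tracing on the GPU instead of the stub below.

    #include <cstdint>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct Ray {
        float origin[3];
        float dir[3];
        float throughput[3];  // accumulated color weight
    };

    // A chunk = one octree-node-sized piece of the scene, small enough to fit in VRAM.
    using ChunkId = uint32_t;

    struct TraceResult {
        bool leftChunk;     // true if the ray exited the chunk without terminating
        ChunkId nextChunk;  // chunk the ray entered (only valid if leftChunk)
        Ray continuation;   // the ray to keep tracing there
    };

    // Stub: a real renderer would upload this chunk's geometry/BVH to the GPU
    // and trace the whole batch there. Here every ray just terminates.
    TraceResult traceInsideChunk(ChunkId chunk, const Ray& ray) {
        return {false, chunk, ray};
    }

    void renderChunked(std::vector<Ray> cameraRays, ChunkId startChunk) {
        // Per-chunk queues of rays waiting to be traced in that chunk.
        std::unordered_map<ChunkId, std::vector<Ray>> pending;
        pending[startChunk] = std::move(cameraRays);

        while (!pending.empty()) {
            // Pick the chunk with the most queued rays, so each VRAM upload
            // is amortized over as many rays as possible.
            auto busiest = pending.begin();
            for (auto it = pending.begin(); it != pending.end(); ++it)
                if (it->second.size() > busiest->second.size()) busiest = it;

            ChunkId chunk = busiest->first;
            std::vector<Ray> batch = std::move(busiest->second);
            pending.erase(busiest);

            // Trace each ray until it terminates or exits the chunk; exiting
            // rays are collated into the queue of the chunk they entered.
            for (const Ray& ray : batch) {
                TraceResult r = traceInsideChunk(chunk, ray);
                if (r.leftChunk)
                    pending[r.nextChunk].push_back(r.continuation);
            }
        }
    }

    int main() {
        std::vector<Ray> rays(1000, Ray{{0, 0, 0}, {0, 0, 1}, {1, 1, 1}});
        renderChunked(std::move(rays), /*startChunk=*/0);
        return 0;
    }

Only one chunk ever needs to be resident at a time, and the chunk swap cost gets amortized over however many rays have piled up in its queue, which is the whole point of the technique as I understood it.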