
vidarh 12 days ago, on: An analysis of DeepSeek's R1-Zero and R1

The CUDA moat is largely irrelevant for inference. The code needed for inference is small enough that there are, e.g., bare-metal CPU-only implementations. That isn't what's limiting people from moving fully off Nvidia for inference. And you'll note that almost "everyone" in this game is in the process of developing their own chips.