The CUDA moat is largely irrelevant for inference. The code needed for inference is small enough that there are, for example, bare-metal, CPU-only implementations. CUDA isn't what's keeping people from moving fully off Nvidia for inference. And you'll note that almost "everyone" in this game is in the process of developing their own chips.
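To give a rough sense of how little code the core of inference needs, here's a minimal sketch of one single-head attention step with a KV cache, in plain Python with no GPU and no libraries beyond `math`. The function names and the toy identity weights are made up for illustration, and this isn't taken from any particular implementation. A real model adds multiple heads, feed-forward layers, positional encoding, and a tokenizer, but it's more of the same kind of arithmetic.

```python
import math

def matvec(w, x):
    # w is a list of rows; returns the matrix-vector product w @ x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rmsnorm(x, eps=1e-5):
    # Scale x by the inverse of its root-mean-square (no learned gain here).
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]

def attention_step(x, wq, wk, wv, k_cache, v_cache):
    """Process one new token vector x, appending its key/value to the caches."""
    h = rmsnorm(x)
    q = matvec(wq, h)
    k_cache.append(matvec(wk, h))
    v_cache.append(matvec(wv, h))
    # Scaled dot-product scores of the new query against every cached key.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_cache]
    weights = softmax(scores)
    # Weighted sum of the cached values.
    dim = len(v_cache[0])
    out = [sum(w * v[i] for w, v in zip(weights, v_cache)) for i in range(dim)]
    # Residual connection back onto the input.
    return [xi + oi for xi, oi in zip(x, out)]

# Toy usage: 2-dimensional vectors and identity projections (illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]
keys, values = [], []
for token_vec in ([1.0, 0.0], [0.0, 1.0]):
    y = attention_step(token_vec, identity, identity, identity, keys, values)
```

Nothing in there depends on a particular vendor's stack; the hard part in practice is making these loops fast on a given chip, not writing them.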