I’ve found it to be pretty terrible compared to CUDA, especially with Hugging Face transformers. There’s no technical reason why it has to be terrible there though. Apple should fix that.
MLX will probably be even faster than that, if the model is already ported. Faster startup time too. That’s my main pet peeve though: there’s no technical reason why PyTorch couldn’t be just as good. It’s just underfunding and neglect.