In my experience, it's not quite as fast for fully tuned code, but the difference is small; given the same project deadline, the PyTorch version will probably end up faster.
Properly tuned (DistributedDataParallel + mixed precision), it will train faster and use a lot less memory, which lets you use larger batches and a higher learning rate (LR).