Two metrics stood out in my experiments with CNNs back in May 2017:
1) Compilation speed for a jumbo CNN architecture: Tensorflow took 13+ minutes to start training every time the network architecture was modified, while PyTorch started training in just over a minute. PyTorch's define-by-run style suits my interactive coding far better (see the sketch after this list).
2) Memory footprint: I could fit a 30% larger batch size with PyTorch than with Tensorflow on Titan X cards, with the exact same CNN architecture.
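To illustrate the interactive-coding point in item 1: PyTorch builds the graph as the forward pass runs, so there is no separate compile step between editing the architecture and the first training iteration. The sketch below is a toy CNN with made-up layer sizes, not my actual jumbo architecture.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class SmallCNN(nn.Module):
    """Toy stand-in for the real (much larger) CNN."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 16 * 16, 10)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.view(x.size(0), -1))

model = SmallCNN()
opt = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch of 32x32 images: the first training step runs immediately,
# with no graph-compilation phase after a model edit.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```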
Both frameworks have had major releases since May, so these numbers may well have changed by now. Either way, I ended up adopting PyTorch for my project.
Google is actually very worried about PyTorch, since so many new AI research papers are using it. That means in a year or two, most companies will be training in PyTorch and deploying with Caffe2. Tensorflow doesn't even have an official implementation of DenseNet (the CVPR 2017 best-paper architecture), but PyTorch does!
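For what it's worth, the PyTorch side really is a one-liner: torchvision (PyTorch's model zoo) ships DenseNet variants, and pretrained weights are optional.

```python
import torchvision.models as models

# densenet121/169/201/161 are all available; pretrained weights are optional.
densenet = models.densenet121(pretrained=True)
```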
I have sources on the Tensorflow team saying they are scrambling to build yet another higher-level wrapper for Tensorflow that reaches PyTorch's level of usability.
Jumbo CNNs are not the battleground.
The real battleground is distribution. The first framework that scales out without placing much onus on the programmer will win, IMO. Facebook has already shown that Caffe2 scales to 256 GPUs for ImageNet training. Tensorflow needs to show it can scale as well. PyTorch needs to work on usability around deployment - model serving, integration into ecosystems like Hadoop, etc.
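To make the "onus on the programmer" point concrete, here is a minimal sketch of single-node multi-GPU data parallelism in PyTorch; the model is a placeholder, and multi-node runs like Facebook's 256-GPU ImageNet result additionally require the torch.distributed machinery, which isn't shown here.

```python
import torch
import torch.nn as nn

# Placeholder model; any nn.Module works the same way.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)

# Wrapping the model is essentially the only change the programmer makes:
# each batch is split across the visible GPUs automatically.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

# The training loop itself is unchanged: forward, loss, backward, step.
```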