Effective Tensorflow (github.com/vahidk)
372 points by adamnemecek on Aug 13, 2017 | 49 comments


But _why_ use TF when you have PyTorch, which is just as powerful, runs noticeably faster for most workloads, _and_ is easy to understand? What are you gaining by using TF these days?


I think at the moment, TensorFlow has a better tooling and deployment story with tools like TensorBoard[1], Tensor Processing Units in Google Cloud[2], distributed training[3], and deploying to mobile apps[4].

As time passes, PyTorch will probably get better at all of these things, but the core team and surrounding community need to make it a priority.

Meanwhile, Google's investment in TensorFlow will continue, and there's no sign that it will be a bad bet for large-scale deployments... even if PyTorch becomes the favored system for prototyping.

[1]: https://www.tensorflow.org/get_started/summaries_and_tensorb...

[2]: https://www.servethehome.com/google-cloud-tpu-details-reveal...

[3]: https://www.tensorflow.org/deploy/distributed

[4]: https://www.tensorflow.org/mobile/


TPU-like hardware will be everywhere next year in the form of the NVIDIA Tesla V100. And FB is clearly getting much better ROI. It's not even close.


Google's TPU and Nvidia's GPUs are competing products.

Source: works at NVidia.


Why the downvotes? V100 does have a TPU-like unit in it.


While I’ve observed PyTorch running faster for my (convolution-based) research, it’s within a few milliseconds of TensorFlow and Keras. If this type of difference mattered (it might for some uses), I would imagine you’d use cuDNN directly. I guess that’s the point: these libraries are all wrapping the same underlying library. It’s like measuring the IO performance of different programming languages’ standard libraries (they should all be close to the speed of the underlying system call).


We just plain can't do data augmentation quickly enough with TF. Queues, schmueues; it doesn't matter. It still tops out at about 35 MB/s on MS COCO and starves even a single Titan Xp. On the same hardware, with the same data augmentation steps, PyTorch gets ~50 MB/s and saturates the GPU, since it never has to wait for data. In fact it can read even faster than that, and it automatically parallelizes the forward pass across several GPUs while you still retain full control over placement. Super slick.
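Roughly, the PyTorch side of that pipeline looks like the sketch below (simplified; FakeData stands in for MS COCO, and the batch size and worker count are placeholders, not our actual settings):

    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # Augmentation runs inside DataLoader worker processes, so it overlaps
    # with GPU compute instead of stalling it.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])

    # FakeData stands in for MS COCO so the sketch runs anywhere.
    dataset = datasets.FakeData(size=1000, image_size=(3, 480, 640),
                                transform=augment)

    loader = DataLoader(dataset, batch_size=32, shuffle=True,
                        num_workers=8,    # 8 CPU processes doing augmentation
                        pin_memory=True)  # faster host-to-GPU copies

    for images, labels in loader:
        if torch.cuda.is_available():
            images = images.cuda(non_blocking=True)
        # forward/backward pass goes here
        break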


It's nice to have the Keras front-end for TF - less boilerplate etc.

I don't see any reason Keras couldn't sit on top of PyTorch, but I'm not aware of any work on this.

Keras is very productive unless you are doing NN research. You can also run it on CNTK if you want to speed up RNNs, which are slower on TF than on CNTK.
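If you want to try that, switching the Keras backend is a one-line configuration change; a minimal sketch (assuming CNTK is installed):

    import os
    os.environ["KERAS_BACKEND"] = "cntk"  # or edit ~/.keras/keras.json

    import keras  # prints "Using CNTK backend" on import if CNTK is available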


I use Keras for machine learning research and find that it works extremely well. The one advantage of PyTorch is more flexible broadcasting, but that introduces other problems.

A major advantage of Keras is that if you’re using a standard training scheme (e.g. training a convolutional neural network for image classification), your research can focus entirely on the underlying architecture (which is easily summarized and serialized), your custom layers (my favorite abstraction for convolutional neural networks), and your losses. Keras’ abstractions eliminate stuff like IO that I find extremely distracting.
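As a made-up illustration of that custom-layer abstraction, a minimal learned scale-and-shift layer in Keras 2-style code might look like this (ScaleShift is a hypothetical name, not a built-in):

    import keras
    from keras.layers import Layer

    class ScaleShift(Layer):
        """Learns a per-feature scale (gamma) and shift (beta)."""

        def build(self, input_shape):
            dim = int(input_shape[-1])
            self.gamma = self.add_weight(name="gamma", shape=(dim,),
                                         initializer="ones", trainable=True)
            self.beta = self.add_weight(name="beta", shape=(dim,),
                                        initializer="zeros", trainable=True)
            super(ScaleShift, self).build(input_shape)

        def call(self, inputs):
            return inputs * self.gamma + self.beta

        def compute_output_shape(self, input_shape):
            return input_shape

    # The custom layer drops into a model like any built-in layer.
    model = keras.models.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        ScaleShift(),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")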


Can you please expand on what you mean by custom layers and abstractions and stuff? Thanks!


Nice video explaining a bit of this: https://www.youtube.com/watch?v=bvZnphPgz74


If you want to run in production.

Even FB doesn't use PyTorch in production, and instead uses Caffe2.


PyTorch was certainly used in production at Facebook (and elsewhere).

It might’ve changed in the past few months as Caffe 2 matured, but there were a few internal applications that used PyTorch.


Fwiw this is what Soumith has said: "Internally at Facebook, we have a unified strategy. We say PyTorch is used for all of research and Caffe 2 is used for all of production."

https://www.oreilly.com/ideas/why-ai-and-machine-learning-re...

It's not exactly a secret that PyTorch's trade-offs favor research over production.


Totally. And I think it’s a solid strategy. However, there’s certainly an interest (internally and externally) in providing a better inter-operability story between the two. I imagine something like using Keras for model creation (and possibly training) and running (either on mobile or the cloud) on some Caffe 2 deployment.


It's a good strategy, but it's no silver bullet either. If you're exporting to a "static graph" platform, you're losing a major benefit of PyTorch. If you mostly just care about shipping to production, a case can be made for just using TF/Caffe2/MXNet etc. from the start.

While PyTorch is extremely cool, the fanboyism is out of hand: people assume that what's good for their corner of the universe must be awesome for every use case, and therefore that TF is an overcomplex turd. It's not like the people designing these systems are stupid.


I agree. PyTorch’s dynamism is fantastic. However, I have no idea how you’d manage to recompile PyTorch code to Caffe 2 in a satisfying way. If something is released, I suspect it’d be limited to a subset of PyTorch features (I’d also bet that subset doesn’t include the features that make PyTorch compelling).


Some people (like me) use Windows. PyTorch does not currently support Windows.


Spin up a virtual machine? It's worth getting into Linux for machine learning.


Passing GPUs to VMs is tricky, especially with nvidia consumer GPUs. It's certainly not "just spin up a VM".


I think there are two main reasons: TF is backed by Google, and you can use it for research and production (server and mobile) with minor changes. You can also train your TF models in the cloud without having to set up anything (as on Google Cloud[1] or TensorPort[2]).

[1] https://cloud.google.com/ml-engine/

[2] https://tensorport.com/


I really don't think the speed difference is as clear-cut as you make it out to be.


TensorFlow is more documented and talked about right now, which is why more people will be interested in it.


I'm not actually sure it's "more documented". It has more documentation, that's true, but that's not the same thing as better documentation, and after some point the mountain of documentation becomes a daunting obstacle. TF IMO is far beyond that point.


TF documentation is excellent. PyTorch, on the other hand, felt almost entirely undocumented. A month ago I tried to implement something fairly trivial (Karpathy's character RNN model), and there was no documentation at all on how to run it on multiple GPUs. I had to dig through unrelated model code to understand how this could be done (using DataParallel). This might have improved since.

Also compare the number of TF and PT questions on SO.
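For reference, the multi-GPU pattern I eventually pieced together is basically a one-line wrapper; a rough sketch (with a plain LSTM standing in for the char-RNN model):

    import torch
    import torch.nn as nn

    # Stand-in model; the actual char-RNN would go here.
    model = nn.LSTM(input_size=128, hidden_size=512, num_layers=2,
                    batch_first=True)

    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # splits each batch across the GPUs
    if torch.cuda.is_available():
        model = model.cuda()

    x = torch.randn(64, 100, 128)  # (batch, seq_len, features)
    if torch.cuda.is_available():
        x = x.cuda()
    output, _ = model(x)  # forward pass runs on all visible GPUs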


I find the documentation daunting but I had assumed that was because I didn't really understand how AI works and that as my understanding grew the docs would start to make more sense.


Take a look at PyTorch. Much easier to understand, fewer bugs, much less obscure all around. I've spent years working with TF and have come to dislike its inflexible, verbose, and convoluted way of doing things. I'm 80% on PyTorch now.


> I've spent years working with TF

Ha, TF was only released in Nov 2015 but sitting around waiting for networks to finish training makes it feel longer than that!


I worked with it _before_ it was released, and with DistBelief before that. For a long time I considered TF to be the best option for the class of problems I work on. I was also extremely skeptical of PyTorch, but it won me over. Now that I'm no longer on a steady diet of Google Kool-Aid, I have the luxury of picking the best tool for the job. For a number of things that tool is made by Google (protocol buffers, gtest, gmock, grpc, some other things). But not for ML.


Based on your experience I'll definitely give PyTorch a go. Thanks.


I like and use both PyTorch and TensorFlow. You’re entirely correct that TensorFlow is far more verbose than PyTorch, but Keras remedies this (for many problems). I’d also add that PyTorch has a narrower scope than TensorFlow, and because of this it introduces problems when scaling or distributing (PyTorch’s flexible broadcasting isn’t without trade-offs). Maybe there’ll be a convergence of Caffe 2 and PyTorch to remedy that problem.


For me, Keras is the least verbose of the three, but I find it too sparse: there's too much "magic" happening under the nicely done API. I have found PyTorch to be the happy halfway point between TensorFlow and Keras in terms of verbosity.


I’m curious, what parts do you find too magical? I’ve been trying to help with this problem (clarifying or surfacing features that are too magical without relinquishing Keras’ commitment to “convention over configuration”), so your (or others) feedback is welcome!


TensorFlow was the first modern tool supported by a large AI company, so probably mostly historical reasons. Also, Google is likely still the company with the largest AI departments, and the company that gives them the most visibility.


Is PyTorch actually faster than TensorFlow with the C++ static compilation?


I've tried XLA and observed no measurable benefit for most models, a small benefit for some, and worse performance for others. That was a few months ago, so maybe things have changed since then. It won't be a huge difference no matter what they do, because the vast majority of the time is spent in cuDNN and gemm/gemv anyway.
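(For anyone who wants to try it, this is roughly how the XLA JIT was switched on in a TF 1.x session; whether it helps is model-dependent, as above.)

    import tensorflow as tf

    config = tf.ConfigProto()
    # Turn on XLA JIT compilation for the whole graph.
    config.graph_options.optimizer_options.global_jit_level = \
        tf.OptimizerOptions.ON_1

    x = tf.random_normal([1024, 1024])
    y = tf.matmul(x, x)

    with tf.Session(config=config) as sess:
        sess.run(y)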


What I'm interested in is AMD support, actually, w.r.t. GPU performance.


There's currently an open PR on GitHub for AMD GPU support https://github.com/pytorch/pytorch/pull/2365


"The most striking difference between Tensorflow and other numerical computation libraries such as numpy is that operations in Tensorflow are symbolic. This is a powerful concept that allows Tensorflow to do all sort of things (e.g. automatic differentiation)"

Sorry if this is a stupid question, but can someone explain how symbolic operations allow automatic differentiation or link me to a good explanation?


You can easily differentiate a set of (computer-defined) operations formed by composing standard operations (sums, products, etc.): that is, you can define another set of computer operations that is the exact, precise derivative of the first. Wikipedia is a bit dense, but it's explained here: https://en.wikipedia.org/wiki/Automatic_differentiation


"Symbolic operations" is a fancy way of saying you build up the full mathematical equations with operator overloading instead of computing the results at each individual steps. Having the full equations then allows to differentiate with regards to a variable of your choice included in that equation


Something like lazily calculating/evaluating an operation, right?


Yep (from a compilation perspective)


That makes sense. Thank you!


This is very good documentation. In my technical writing activities I have learned that there are two kinds of bad documentation: Math textbooks ("Prove X. Lemma: ..." [Why are we proving this anyway?]) and cooking recipes ("Do a, b, c, the end" [What if I need to cook something else?]) - too little practice or too little theoretical foundation. Sometimes both are present but too disjoint, or the theory is badly structured. This one does a great job at introducing the foundations, in the right order, and immediately showing what they mean to you in practice.


For those who use Tensorflow regularly:

What 'type' of TensorFlow do you recommend for a project starting today -- tf.slim, tf.contrib.keras, 'raw' TensorFlow, or Keras on top of TensorFlow?

I am building a production model, so I was probably going to use TensorFlow because I like the tooling (TensorBoard) and the ability to write once for both research and production.


Is anyone using TensorFlow or Caffe2 on mobile? We are trying to build something on Android, but it seems there are no real-life deployments using Caffe2 or TensorFlow on mobile.


There are clear instructions [1] for mobile (iOS and Android) in the official TF Repository.

[1] https://github.com/tensorflow/tensorflow/tree/master/tensorf...


I am curious why someone wouldn't use IBM Watson for this. It seems better suited for production and serious use cases than PyTorch or TF.



