As someone who uses tensorflow a lot, I predict an enormous clusterfuck of a transition. Tensorflow has turned into a multiheaded monster, supporting many things and approaches but none of them very well.
I mean, when relying on third party code, things like `tf.enable_control_flow_v2()` and `tf.disable_control_flow_v2()` can and will go horribly wrong. Some operations change behaviour depending on whether a global flag is set. And not just any operations, but control flow operations! That will lead to bugs that are very hard to track down.
In my opinion there are architectural problems with TF that have not been addressed in this update. There is still global state in TF2. There is still a difference in behaviour between eager and non-eager mode. Control flow is still a second-class citizen.
If you need to transition from TF1 to TF2, consider doing the TF1 to PyTorch transition instead.
Not only is upgrading hard, but so is installation (on Windows at least). Each TensorFlow version needs a specific Python version, a specific CUDA version, a specific tensorflow-gpu version, and many other easy-to-get-wrong pieces. The problem is not the requirements themselves, but that it's very hard to know which versions are compatible. There are endless threads on GitHub of people failing to use TensorFlow after spending days trying to install it.
Try using these containers[1] from my peer team at IBM. They run on a variety of architectures (x86, ppc64le, with and without GPUs in most cases).
In addition if you don't want to fiddle with the containers, there's also a conda channel[2] that lines stuff up. I work on a peer team in machine vision, and use these for personal and professional projects.
Tensorflow sounds like an ideal candidate for running in a container. List out all your approved compatible versions in the Dockerfile and distribute it with your source code and anyone can reproduce your results with the exact same setup.
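As a sketch of that idea (the versions and file names here are illustrative, not a blessed combination), a Dockerfile pinning everything might look like:

```dockerfile
# Base image pins TF + CUDA + cuDNN + Python together; pick the tag
# you actually validated (this one is just an example).
FROM tensorflow/tensorflow:1.15.0-gpu-py3

WORKDIR /app

# Freeze everything else too, so the environment is reproducible.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "train.py"]
```

The official `tensorflow/tensorflow` images already bundle a known-good CUDA/cuDNN pairing, which sidesteps most of the version-matching misery.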
Yes, but for personal use, unless someone else has already made those containers, I still have to go through the trial-and-error process of finding the right combination of versions myself. Sure, the second installation will be easier, but if I just want it on my PC it doesn't really help.
You can `docker pull username/mytensorflowcontainer` and start from someone else's work. Looks like Tensorflow has a how-to on the site: https://www.tensorflow.org/install/docker including working cpu-only and gpu-enabled examples.
Anaconda may work well as a virtual environment for some ml projects, but it is by no means a solution for getting a gpu-working installation of tensorflow on Windows.
As a software engineer and non-data scientist, I hate Anaconda because it feels like it's a tool that tries to be the be-all, end-all package management tool for everyone in the data science field, yet it feels like a sloppily built, bloated whale. It's even managed to overwrite PATH on some of my Linux machines, which is where I drew the line.
I vastly prefer creating hermetic environments with either venv or Docker. They're much cleaner and easier to work with. I wish data scientists would adopt these tools instead.
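For anyone unfamiliar, the venv workflow is only a few commands (names and paths here are illustrative):

```shell
# Per-project environment with venv; everything lives under ./.venv
python3 -m venv .venv
. .venv/bin/activate              # subsequent python/pip use the venv

python -c 'import sys; print(sys.prefix)'   # confirms we're inside .venv

# After installing your pinned deps (pip install -r requirements.txt),
# record the exact resolved versions so others can reproduce the env:
pip freeze > requirements.lock
```

Committing the lock file alongside the code is what makes the environment reproducible for the next person.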
Sadly, many of the ML models I investigate on Github don't even have their package requirements frozen. It's an uphill battle...
> I vastly prefer creating hermetic environments with either venv or Docker. They're much cleaner and easier to work with. I wish data scientists would adopt these tools instead.
I suspect you have a lot of time on your hands. But for me the 'batteries included' approach really nails it: why repeat the headache over and over when a single entity can take care of it in such a way that incompatibilities are almost impossible to create? The hardest time I've had was re-creating an environment that ran some Python code from a while ago; with Anaconda it was super easy.
I'm sure it has its limitations and just like every other tool there are situations where it is best to avoid it but for now it suits me very well.
Have you ever used any of those features of tensorflow though? They're all, er, idiosyncratic. If you're a decent software engineer and are following the maths from a book, I couldn't recommend JAX highly enough. (I work on big tf RL projects every day.)
That’s true. I’m still a student and haven’t really shipped a large ML project. Most of my ETL is done using scrapers, manual Pandas transformations, and storage in flat files. But yeah, I see your point, and frankly I don’t think there’s too big of a user base around these specialized features. Not to mention that some can be, erm, difficult to use. A friend tried using the YouTube-8M dataset in TFRecords format for a project and was extremely annoyed at the complexity.
> At first glance, it seems to be a GPU/TPU based NumPy?
Yes, with a compiler to make this fast.
> The thing is, TF has more than tensor ops. It has pre-defined NN layers, data loading/serialization, distributed training, metrics, and model serving.
Yes, it is a simpler and smaller API.
For things like data loading, you can use the tool of your choice -- TF, pytorch, whatever. For pre-defined NN layers, there are libraries that build this as a very thin wrapper around JAX's low-level API; see e.g. stax, which is included in JAX.
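The "NumPy plus a compiler" point fits on one screen. A minimal sketch, assuming `jax` is installed, showing the composable `grad` and `jit` transforms on plain NumPy-style code:

```python
import jax
import jax.numpy as jnp

# Ordinary NumPy-style code...
def loss(w, x, y):
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)

# ...made differentiable and XLA-compiled by composing transforms.
grad_loss = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x = jnp.ones((8, 3))
y = jnp.ones(8)
print(grad_loss(w, x, y))   # gradient of the MSE w.r.t. w
```

There's no graph-building API and no session: `jax.grad` and `jax.jit` are just function transformations you stack on regular Python functions.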
I know. I don't want to go that far, but if I had to choose, I would also go for JAX and help make it feature-complete. It isn't there yet, though, and thus probably not as useful for everyone yet.
Agreed. Tensorflow kinda reminds me of OpenGL in that its dependence on global flags causes some really annoying bugs, especially when you're using third-party libraries or pre-trained models. `enable_eager_execution`, `enable_tensor_equality`, and `enable_v2_tensorshape` have all completely broken my code at one point or another.
Is it production ready for serving PyTorch models? How about if I wanted to use something like Go to serve those models? That's fairly straightforward with Keras (Python) trained TF models.
On our team, we serve PyTorch models in production using libtorch, the C++ library for loading models. You can easily call the C++ code from Go if you wrap it in a C interface.
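The Python side of that pipeline is just TorchScript export. A minimal sketch (the model and file name are placeholders) of producing an artifact that libtorch's `torch::jit::load` can read:

```python
import torch
import torch.nn as nn

# Placeholder model; substitute your trained network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Trace with a representative input, then serialize for libtorch.
example = torch.randn(1, 4)
traced = torch.jit.trace(model, example)
traced.save("model.pt")   # load in C++ with torch::jit::load("model.pt")
```

The saved `model.pt` carries both weights and the traced graph, so the C++ (and hence Go-wrapped) side needs no Python at all.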
Last I checked there was basically zero serving story for PyTorch. The trade-off seems all too common: TensorFlow optimizes for enormous production applications first, while PyTorch optimizes for developer ease first.
If you are trying to apply one of these libraries to a production system that doesn't get a lot of throughput, you probably shouldn't be using them (try a linear model first). If you have a high-throughput application you probably want tensorflow, and just deal with the shittiness.
If you have any existing ML/DL experience, picking up PyTorch is a breeze. You could get a pretty solid understanding of the framework with an MNIST handwritten digit recognition model over the course of an afternoon, so don’t sweat looking for the “right” tutorial.
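To back that up: a whole supervised training step fits on one screen. A sketch with random tensors standing in for MNIST batches (shapes match flattened 28x28 digits; everything else is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny classifier; 784 inputs stand in for flattened 28x28 MNIST images.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 784)          # fake batch of images
y = torch.randint(0, 10, (32,))   # fake labels

for step in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()               # eager autograd: just call it
    opt.step()

print(loss.item())                # overfits the fixed batch, so it drops fast
```

Swap the fake tensors for a `DataLoader` over `torchvision.datasets.MNIST` and you have the afternoon tutorial.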
Highly recommend it! I love pytorch so much, it's basically numpy with automatic backprop and CUDA support. It evaluates eagerly by default, which makes debugging a lot easier since you can just print your tensors, and IMO it's much simpler to jump between high-level and low-level details in pytorch than in tensorflow+keras. Just as one example, activation functions in pytorch are applied by calling a python function on your layer, instead of passing a string argument with the function name to the layer constructor, so you write
layer = F.relu(nn.Linear(128, 64)(input))
instead of
layer = Dense(64, activation='relu')(input)
As a result, it's a lot more straightforward to try out custom activation functions in PyTorch.
The problem is that tensorflow is an umbrella name for a bunch of related technologies: a matrix calculation engine, a graph definition language, a distributed graph calculation engine, ML algorithm libraries, ML training libraries. On top of that it's extremely poorly documented. At the end of the day, anything beyond the most trivial usage turns out to be mutually incompatible (this operation is not implemented for TPUs or GPUs, this API doesn't work with that API), and most development is cut-and-paste trial and error. Then you go to read the source, but creative Python renaming and importing leads you on a multi-hour wild goose chase.
If you switch to PyTorch, what are you going to use for prod deployment? Is there any way to use TPUs?
> This is also the last major release of multi-backend Keras. Going forward, we recommend that users consider switching their Keras code to tf.keras in TensorFlow 2.0
Cannot but agree. On top of that, the 1.14 documentation was simply deleted from the tensorflow website, and now we are scratching our heads over what to do when it comes to model maintenance.
We serve our models via the TF Java API since our system is Java/Scala based. We can't even update the existing TF Java API because it is incompatible with anything prior to 1.15. It's an utter mess.
There's absolutely no evidence that MXNet is faster than TF. At the high end, all three (TF/PyTorch/MXNet) are similarly performant. The reality is that implementation matters more than framework when you are talking about performance.
In terms of features and functionality, they are very much interchangeable. A new project could be written in any of the three major frameworks and be equally good. The only standout feature I'm aware of is that TF has the best support for doing inference on devices, but that won't be true forever. In terms of actually migrating a codebase from one to the other, the APIs are different enough in small ways that it would be a large amount of effort.