The only thing that stops me from using notebooks full time is that their intellisense is horrible compared to an IDE's. I like using them for demos/presentations, but I can't imagine coding in one primarily. Especially when it comes to tracking results.
How do people cope with this? Do you supplement it with other tools? I spend a lot of my time in an IDE and then just paste some of the code into cells. That seems easier.
I do the opposite. My job is a kind of bad data engineer/scientist/ETL minion hybrid, so it's a lot of dataframes.
Work (and often debug) in Jupyter -> once the notebook has some completed thoughts, open it from PyCharm and move the code into a Python module + test module, tidying up and adding type annotations.
Sometimes I do that multiple times, so the notebook ends up importing from modules that were originally pulled out of it.
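A sketch of the end state, with invented names: a typed function pulled out of the notebook plus a matching test:

    # analysis.py -- pulled out of the notebook, tidied and annotated
    import pandas as pd

    def sessions_per_user(events: pd.DataFrame) -> pd.Series:
        """Count distinct sessions per user in a raw events frame."""
        return events.groupby("user_id")["session_id"].nunique()

    # test_analysis.py
    import pandas as pd
    from analysis import sessions_per_user

    def test_sessions_per_user() -> None:
        events = pd.DataFrame({"user_id": [1, 1, 2],
                               "session_id": ["a", "b", "a"]})
        assert sessions_per_user(events).to_dict() == {1: 2, 2: 1}

The notebook then just imports sessions_per_user from analysis instead of carrying the implementation around.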
It sucks having to use two tools but I don't think there's any one tool that can do both as well as pycharm/jupyter, short of me getting a lot better at emacs or writing a lot of custom Atom extensions (I think).
I am very hopeful that JupyterLab will get support for the Language Server Protocol sometime soon. That would make all the difference in the world for me. I'd still have to use a terminal to build and run tests, but I wouldn't be surprised if a test runner comes along fairly quickly after that.
Data frame rendering in the various notebooks (Beaker, Jupyter, Zeppelin, ...) is wonderful.
Your workflow sounds closest to what I do. If I want to visualize something I tend to compile my thoughts/imports and organize things in an editor first and put it in a notebook in parallel. It helps with version control as well.
I am a Spark data engineer and spend a lot of time in Scala / Python IDEs & browser notebooks. Databricks lets you package code as JAR / wheel files & attach the binaries to the cluster. I write all the complicated code in tested projects that are checked into GitHub & use the notebooks to invoke the functions and visualize results.
Folks that try to do all programming in notebooks typically drown in complexity and suffer.
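The notebook side stays thin; roughly like this (my_etl_lib and build_daily_summary are invented names, while spark and display are what Databricks provides inside a notebook):

    # Notebook cell: the heavy lifting lives in a wheel attached to the cluster
    from my_etl_lib import build_daily_summary   # hypothetical packaged function

    summary_df = build_daily_summary(spark, date="2018-11-01")
    display(summary_df)   # Databricks' built-in table/chart rendering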
Yeah I agree. We do something similar if we're using zeppelin or beaker. I organize it, put an uber jar in there and then run everything from there. That's a ton easier.
When you're processing a lot of data, it can be expensive to keep re-running your whole script every time you make a change. The notebook keeps the results of your earlier steps in memory when you want to change and re-run a later step.
This is a trade-off between how much code you're writing and how much data you're processing. If you're writing maybe 20 lines of code but you have enough input that it takes several minutes to run, the notebook becomes a clear win for your development process.
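The pattern is basically: pay for the expensive step once, then iterate on the cheap step freely. A minimal sketch, with file and column names invented:

    # Cell 1: the expensive part -- run once, result stays in kernel memory
    import pandas as pd
    df = pd.read_csv("events.csv.gz")          # several minutes on real data

    # Cell 2: the part you're iterating on -- edit and re-run freely
    daily = df.groupby("date")["amount"].sum()
    daily.plot()                               # inline plot in the notebook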
So does the standard terminal REPL in Python. You can achieve the same workflow with a plain old Python file plus your favorite editor's "send block of code to console" function. That way you retain your editor's functionality while working just as interactively as in a notebook.
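For example, some editors (Spyder and VS Code among them) recognize lightweight cell markers in a plain .py file, so "send block to console" works block by block (filename invented):

    # exploration.py -- a plain file, but editors that understand "# %%"
    # markers let you send each block to an attached IPython console
    # %%
    import pandas as pd
    df = pd.read_csv("data.csv")

    # %%
    df.describe()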
You can generally persist the results yourself to disk though, especially since a lot of things end up being numpy arrays. So you run one script that saves all the results, and another that loads them and runs just the part of your workflow you want. Bonus: it's persisted to disk on top of that! I know things get more complicated than that, but I'd say the compelling use case for notebooks isn't the state saving but the whole package in one place (state persistence, visualization, interactive REPL, ...).
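Something like this, with the script and file names invented:

    # step1_compute.py -- run the expensive part once and persist the result
    import numpy as np

    def expensive_step() -> np.ndarray:        # stand-in for the slow pipeline
        return np.random.rand(1_000_000, 10)

    np.save("features.npy", expensive_step())

    # step2_model.py -- iterate here without paying for step 1 again
    import numpy as np

    features = np.load("features.npy")
    print(features.mean(axis=0))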
Is your plan to always offer this as a hosted service?
Is it possible to use it as a drop-in replacement for Jupyter notebooks?
We have more data than I think would make sense to transfer out of our clusters/datacenter, and privacy issues would probably be raised, but I would love to use something like this.
I find this odd because I am the opposite - one of my primary use cases for Jupyter/ipython in general is the ease with which I can get 'live' code introspection and intellisense. It's often my prototyping sandbox for python code that I then move into my IDE once it's close to being ready.
I also notice that developing this way encourages me to create smaller, more testable functions that I can easily work with inside a single notebook cell.
It's not about writing code as much as it is about exploring the data.
If you're writing a lot of code in them, it's probably better to put that code into libraries that get imported and reused.
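And if you pair that with IPython's autoreload extension, edits to the library show up on re-run without restarting the kernel (the module and function here are made up):

    # First notebook cell
    %load_ext autoreload
    %autoreload 2

    from mylib.features import clean_events   # hypothetical shared module
    df = clean_events(raw_df)                 # picks up edits to mylib on re-run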
And I do agree that the default code environment is unbearable. Particularly the auto-insertion of closing quotation marks, which has me continually fighting the editor to get correct code into a tiny web text box.
Oh, I won't argue with you there. I just find myself rotating between the two quite a bit, because I have to do deployment as well as write code for experimentation.
What I'm specifically talking about is even that kinda hacky experiment code you end up writing. I don't try to implement whole projects in there, but even just "train this model" type code ends up being a hassle because of how bad the editors are.
My above comment was more referencing wishing I could spend more time writing experiment code in jupyter without copying and pasting all the time.
That's surprising, because I have the opposite experience! Since my first cell imports all the libraries I want to use into memory, intellisense works without fail, regardless of how big the libraries are. Compare that with my VS Code experience, where using intellisense to pull up a function's docstring takes an age for all but the built-in Python libraries.
Hey there! I'm trying to solve this right now in VS Code's built-in editor: https://github.com/pavanagrawal123/VSNotebooks . It's a fork of an extension somebody already built whose activity is dead, so I'm starting up development on an active fork. I'd love to hear any feedback y'all have! :)
NBextensions and doing mostly data analysis in notebooks then building actual code in a text editor. I would do this even if notebooks had perfect intellisense support.
Dividing code between models/data-pipelines and experiments. Notebooks are used for visualization and for telling the other team members a story about why you tried what.
Yeah but the whole point is "interactive coding". It doesn't feel very interactive when I have to context switch all the time :). I'd prefer something closer to what the lisp folks get to do with the repl where you can scratch out an idea and see it working without leaving your environment.
Well, I don't think so. Not everything you do is interactive. Data exploration and basic model selection are, but complex models and more complicated data pipelines/preprocessing aren't, I think.
Tensorflow is the opposite of interactive, even in a notebook.
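With the 1.x graph API, at least, nothing has a value until you run a session, which is what makes it feel non-interactive:

    import tensorflow as tf   # TF 1.x graph API

    x = tf.placeholder(tf.float32, shape=[None])
    y = tf.reduce_sum(x) * 2.0
    print(y)                  # just a graph node, no value yet

    with tf.Session() as sess:                         # values only exist here
        print(sess.run(y, feed_dict={x: [1.0, 2.0]}))  # 6.0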
Putting models (in the sense of more complicated models, not just an SVM), data pipelines, and shared visualization code in a src folder, and experimenting in the notebook, divides stuff that's interactive by nature from "real" coding. I don't context-switch that much, to be honest.
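The layout I have in mind, roughly (file names are just examples):

    project/
    ├── src/
    │   ├── models.py             # the complicated models
    │   ├── pipelines.py          # preprocessing / data pipelines
    │   └── viz.py                # shared visualization code
    └── notebooks/
        └── experiment_01.ipynb   # imports from src/, tells the story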
I don't really copy code into cells, because I only experiment there.
Also, what happens if you need to share code between notebooks?
I think notebooks should be simple and should explain the experiments and the reasoning behind them to your coworkers. Otherwise it's hard to coordinate and learn from each other's insights into the data.