The only thing that stops me from using notebooks full time is that their intellisense is horrible compared to an IDE's. I like using them for demos/presentations, but I can't imagine coding in one primarily. Especially when it comes to tracking results.
How do people cope with this? Do you supplement it with other tools? I spend a lot of my time in an IDE and then just paste some of the code into cells. That seems easier.
I do the opposite. My job is a kind of bad data engineer/scientist/ETL minion hybrid, so it's a lot of dataframes.
Work (and often debug) in Jupyter -> once the notebook has some completed thoughts, open it from PyCharm and move the code into a Python module + test module, tidying up and adding type annotations.
Sometimes I do that multiple times, so the notebook ends up importing from modules that were originally pulled out of it.
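A sketch of the end state, with invented names: a typed function pulled out of the notebook plus a matching test:

    # analysis.py -- pulled out of the notebook, tidied and annotated
    import pandas as pd

    def sessions_per_user(events: pd.DataFrame) -> pd.Series:
        """Count distinct sessions per user in a raw events frame."""
        return events.groupby("user_id")["session_id"].nunique()

    # test_analysis.py
    import pandas as pd
    from analysis import sessions_per_user

    def test_sessions_per_user() -> None:
        events = pd.DataFrame({"user_id": [1, 1, 2],
                               "session_id": ["a", "b", "a"]})
        assert sessions_per_user(events).to_dict() == {1: 2, 2: 1}

The notebook then just imports sessions_per_user from analysis instead of carrying the implementation around.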
It sucks having to use two tools but I don't think there's any one tool that can do both as well as pycharm/jupyter, short of me getting a lot better at emacs or writing a lot of custom Atom extensions (I think).
I am very hopeful that JupyterLab will get support for the Language Server Protocol sometime soon. That would make all the difference in the world for me. I'd still have to use a terminal to build and run tests, but I wouldn't be surprised if a test runner comes along fairly quickly after that.
Data frame rendering in the various notebooks (Beaker, Jupyter, Zeppelin, ...) is wonderful.
Your workflow sounds closest to what I do. If I want to visualize something I tend to compile my thoughts/imports and organize things in an editor first and put it in a notebook in parallel. It helps with version control as well.
I am a Spark data engineer and spend a lot of time in Scala / Python IDEs & browser notebooks. Databricks lets you package code as JAR / wheel files & attach the binaries to the cluster. I write all the complicated code in tested projects that are checked into GitHub & use the notebooks to invoke the functions and visualize results.
Folks that try to do all programming in notebooks typically drown in complexity and suffer.
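The notebook side stays thin; roughly like this (my_etl_lib and build_daily_summary are invented names, while spark and display are what Databricks provides inside a notebook):

    # Notebook cell: the heavy lifting lives in a wheel attached to the cluster
    from my_etl_lib import build_daily_summary   # hypothetical packaged function

    summary_df = build_daily_summary(spark, date="2018-11-01")
    display(summary_df)   # Databricks' built-in table/chart rendering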
Yeah I agree. We do something similar if we're using zeppelin or beaker. I organize it, put an uber jar in there and then run everything from there. That's a ton easier.
When you're processing a lot of data, it can be expensive to keep re-running your whole script every time you make a change. The notebook keeps the results of your earlier steps in memory when you want to change and re-run a later step.
This is a trade-off between how much code you're writing and how much data you're processing. If you're writing maybe 20 lines of code but you have enough input that it takes several minutes to run, the notebook becomes a clear win for your development process.
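The pattern is basically: pay for the expensive step once, then iterate on the cheap step freely. A minimal sketch, with file and column names invented:

    # Cell 1: the expensive part -- run once, result stays in kernel memory
    import pandas as pd
    df = pd.read_csv("events.csv.gz")          # several minutes on real data

    # Cell 2: the part you're iterating on -- edit and re-run freely
    daily = df.groupby("date")["amount"].sum()
    daily.plot()                               # inline plot in the notebook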
So does the standard terminal REPL in Python. You can achieve the same workflow with a plain old Python file plus your favorite editor's "send block of code to console" function. That way you retain your editor's functionality while working just as interactively as in a notebook.
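For example, some editors (Spyder and VS Code among them) recognize lightweight cell markers in a plain .py file, so "send block to console" works block by block (filename invented):

    # exploration.py -- a plain file, but editors that understand "# %%"
    # markers let you send each block to an attached IPython console
    # %%
    import pandas as pd
    df = pd.read_csv("data.csv")

    # %%
    df.describe()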
You can generally persist the results yourself to disk though, especially since a lot of things end up being numpy arrays. So you run one script that saves all the results, and another that loads them and runs just the part of your workflow you want. Bonus: it's persisted to disk on top of that! I know things get more complicated than that, but I'd say the compelling use case for notebooks isn't the state saving but the whole package in one place (state persistence, visualization, interactive REPL, ...).
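Something like this, with the script and file names invented:

    # step1_compute.py -- run the expensive part once and persist the result
    import numpy as np

    def expensive_step() -> np.ndarray:        # stand-in for the slow pipeline
        return np.random.rand(1_000_000, 10)

    np.save("features.npy", expensive_step())

    # step2_model.py -- iterate here without paying for step 1 again
    import numpy as np

    features = np.load("features.npy")
    print(features.mean(axis=0))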
Is your plan to always offer this as a hosted service?
Is it possible to use it as a drop-in replacement for Jupyter notebooks?
We have more data than I think would make sense to transfer out of our clusters/datacenter, and privacy issues would probably be raised, but I would love to use something like this.
I find this odd because I am the opposite - one of my primary use cases for Jupyter/ipython in general is the ease with which I can get 'live' code introspection and intellisense. It's often my prototyping sandbox for python code that I then move into my IDE once it's close to being ready.
I also notice that developing this way encourages me to create smaller, more testable functions that I can easily work with inside a single notebook cell.
It's not about writing code as much as it is about exploring the data.
If you're writing a lot of code in them, it's probably better to put that code into libraries that get imported and reused.
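And if you pair that with IPython's autoreload extension, edits to the library show up on re-run without restarting the kernel (the module and function here are made up):

    # First notebook cell
    %load_ext autoreload
    %autoreload 2

    from mylib.features import clean_events   # hypothetical shared module
    df = clean_events(raw_df)                 # picks up edits to mylib on re-run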
And I do agree that the default code environment is unbearable. Particularly the auto-insertion of closing quotation marks, which has me continually fighting the editor to get correct code into a tiny web text box.
Oh, I won't argue with you there. I just find myself rotating between the two quite a bit, because I have to do deployment as well as write code for experimentation.
What I'm specifically talking about is even that kinda hacky experiment code you end up writing. I don't try to implement whole projects in there, but even just "train this model" type code ends up being a hassle because of how bad the editors are.
My above comment was more referencing wishing I could spend more time writing experiment code in jupyter without copying and pasting all the time.
That's surprising, because I have the opposite experience! Since my first cell imports all the libraries I want to use into memory, intellisense works without fail, regardless of how big the libraries are. Compare that with my VS Code experience, where using intellisense to pull up a function's docstring takes an age for all but the built-in Python libraries.
Hey there! I'm trying to solve this right now in VS Code's built-in editor: https://github.com/pavanagrawal123/VSNotebooks . It's a fork of an extension somebody already built whose activity is dead, so I'm starting up development on an active fork. I'd love to hear any feedback y'all have! :)
NBextensions and doing mostly data analysis in notebooks then building actual code in a text editor. I would do this even if notebooks had perfect intellisense support.
Dividing code between models/data-pipelines and experiments. Notebooks are used for visualization and for telling the other team members a story about why you tried what.
Yeah but the whole point is "interactive coding". It doesn't feel very interactive when I have to context switch all the time :). I'd prefer something closer to what the lisp folks get to do with the repl where you can scratch out an idea and see it working without leaving your environment.
Well, I don't think so. Not everything you do is interactive. Data exploration and basic model selection are, but complex models and more complicated data pipelines/preprocessing aren't, I think.
Tensorflow is the opposite of interactive, even in a notebook.
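With the 1.x graph API, at least, nothing has a value until you run a session, which is what makes it feel non-interactive:

    import tensorflow as tf   # TF 1.x graph API

    x = tf.placeholder(tf.float32, shape=[None])
    y = tf.reduce_sum(x) * 2.0
    print(y)                  # just a graph node, no value yet

    with tf.Session() as sess:                         # values only exist here
        print(sess.run(y, feed_dict={x: [1.0, 2.0]}))  # 6.0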
Putting models (in the sense of more complicated models, not just an SVM), data pipelines, and shared visualization code in a src folder, and experimenting in the notebook, divides stuff that's interactive by nature from "real" coding. I don't context-switch that much, to be honest.
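The layout I have in mind, roughly (file names are just examples):

    project/
    ├── src/
    │   ├── models.py             # the complicated models
    │   ├── pipelines.py          # preprocessing / data pipelines
    │   └── viz.py                # shared visualization code
    └── notebooks/
        └── experiment_01.ipynb   # imports from src/, tells the story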
I don't really copy code into cells, because I only experiment there.
Also, what happens if you need to share code between notebooks?
I think notebooks should be simple and should explain the experiments and the reasoning behind them to your coworkers. Otherwise it's hard to coordinate and learn from each other's insights into the data.