I just wanted to say thank you. Many of the points in your study strike a nerve. Part of my responsibility at my last job was to introduce good software engineering practices. What happens? The data scientists go rogue and start running notebooks left and right. How do they productionize their work? Well, they don't. They were academics. All they know is that the models ran fine in their notebooks on their laptops. Meanwhile, we didn't have anyone devoted full time to model productionization.
Sharing data? They had enough problems sharing their notebooks.
I just happened to be reading Peter Naur's "Programming as Theory Building" recently. It strikes me that taking its theme even a little seriously helps explain why notebooks are so popular. Notebooks happen to be convenient tools for exploring a new domain (interactively). Irrespective of how much software purists might complain, conventional software engineering provides very few tools/solutions/practices for that process. The wretched state of interactive debugging (in most languages) is a simple example.
As someone who spends a substantial amount of time working in both modes (writing research code in Jupyter notebooks, and writing production code as Python modules), notebooks scratch certain itches that IDEs typically don't even come close to. (Some recent progress on add-ons in JavaScript-based editors is potentially interesting, because that might help marry the strengths of the two.)
In my experience, in the evolution of code from Jupyter notebooks to repositories of production code as part of any project, there comes a "right time" to switch from the former to the latter. And this can typically only be learned with experience.
I just refactor into a module that I import into my notebook as I go along. This lets me use the notebook for quick prototyping, but also productionize faster if need be.
That only works after the code in the module is largely "frozen". It doesn't work well if you're experimenting with ideas inside the module. OTOH, if the algorithm is largely frozen, and you're trying to experiment with its performance on a bunch of examples, the workflow of putting the algorithm in a module and using a notebook to interface with data and visualize results is quite useful.
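To make that concrete, here's a minimal sketch of what that split looks like; the module name (analysis_utils) and the toy smoothing function are just stand-ins for whatever "frozen" algorithm you actually have:

```python
# --- analysis_utils.py: hypothetical module holding the "frozen" algorithm ---
import numpy as np

def moving_average(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Stand-in for whatever algorithm is no longer changing."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# --- notebook cell: interface with the data and visualize results ---
import matplotlib.pyplot as plt
import numpy as np
from analysis_utils import moving_average

signal = np.cumsum(np.random.default_rng(0).normal(size=500))  # toy data
plt.plot(signal, alpha=0.4, label="raw")
plt.plot(moving_average(signal, window=20), label="smoothed")
plt.legend()
plt.show()
```

The notebook stays interactive (swap in different data, tweak the plot endlessly), while the module is the part you can later drop into a package with tests.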
That is basically what I meant by knowing when to transition from one mode to the other.
Here's a concrete example (maybe somebody considers this an inspiring challenge?) to illustrate how notebooks are infuriating in their primitiveness, but still better than using an editor on source files: imagine a beginner trying to write/learn a sorting algorithm, who would like to keep experimenting with their code and observing what happens on examples, possibly profiling space/time complexity along the way.
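To give a flavour of what I mean, the notebook version of that exercise might look something like this (purely illustrative; the %timeit lines are IPython magics that would be run in their own cells):

```python
# Cell 1: a first attempt at insertion sort
def insertion_sort(xs):
    xs = list(xs)                       # work on a copy
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return xs

# Cell 2: poke at it interactively
print(insertion_sort([3, 1, 2]))        # [1, 2, 3]
print(insertion_sort([]))               # []
print(insertion_sort([5, 5, 1]))        # [1, 5, 5]

# Cell 3: rough time profiling with IPython magics (run in their own cells)
import random
data = [random.random() for _ in range(2000)]
# %timeit insertion_sort(data)
# %timeit sorted(data)                  # compare against the built-in
```

The point is the loop of edit a cell, rerun it, look at the output, which an editor on source files makes far clumsier for a beginner.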
To expand on my point above, there are actually three distinct computational use cases, not just two: Interactive learning -> Sharing insights with others -> Productionizing code.
I guess the objection is that if what you are experimenting with is inside a module, you've moved the "active" code out of the notebook, and then given up the interactivity.
>to introduce good software engineering practices. What happens? The data scientists go rogue and start running notebooks left and right. How do they productionize their work? Well, they don't. They were academics.
My background is programming (instead of data analysis & modeling) so I'm sympathetic to your idealistic "software engineering" view... but I'm also sympathetic to the academics' side as explained by Yihui Xie's blog post:
He's convinced me that criticizing non-programmers for using (or over-using) computational notebooks when it should be a "proper" programming language and deployment is like criticizing financial analysts who over-use Excel, telling them to learn VB or Python and re-write their spreadsheets on top of a "proper database" like Oracle or MySQL. That's just not reality. This divide between "end user tools" and "proper programmer tools" will always exist because there is no perfect tool in existence that serves the needs of both skill sets. Therefore, the programmers will always be able to say the data scientists or financial analysts are "doing it wrong".
> He's convinced me that criticizing non-programmers for using (or over-using) computational notebooks when it should be a "proper" programming language and deployment is like criticizing financial analysts who over-use Excel, telling them to learn VB or Python and re-write their spreadsheets on top of a "proper database" like Oracle or MySQL.
I think this is very much off the mark. For sure plenty of scientists are poor programmers, but that isn't the reason they use notebooks. It is because:
They are not attempting to write something that will run everywhere, and often. They are either analyzing some data or doing rapid prototyping. For the latter, it's like criticizing someone who uses a REPL. It's just that the notebook is so much more powerful than a simple REPL that one can safely stick to it. Imagine you will do 40-50 prototypes and only one of those may end up worthy enough to make a product out of, and you don't know which one that will be. If you used a non-notebook environment, you'd give up in frustration by the time you hit the 15th one.
As you said: at the moment, there simply isn't an alternative that allows for rapid prototyping and is production ready. It's a hard problem to solve - there's a reason no one solved it for decades (well before notebooks were a thing).
Had notebooks not been invented, you would have the same people handing you MATLAB code asking you to productize it.
Claiming they are beginners/novice programmers is off the mark. Peter Norvig started using notebooks for a reason, and no one would call him a novice. I do SW for a living, but when I need to analyze data and visualize it, I'll pick a notebook over "proper" SW tools any day.
We shouldn't assume it will always exist. It exists because programming languages and tools are not as usable as they can be. That is something we can and should expect to change.
Notebooks are like training wheels. They serve multiple purposes, one of the most important being signaling ineptitude to others. Code smells are useful, and a notebook serves the same purpose.
That’s a really good comparison. Excel is often used for storing data and doing analysis because it just plain works. And anyone can use it.
Notebooks tend to be the same way. It’s a simple, GUI-ish way to do many complex analyses in a quick and dirty fashion.
And many of the arguments against using Excel are the same as those against using notebooks. Each is good at the initial data exploration stage, but both are often abused and used in production even when everyone knows it is a bad idea. But it still “works”, so it is unlikely to be replaced.
(Especially when those that are working with the data don’t always have the skill set to build out a full production workflow.)
I'm a computational biologist and Excel has been the bane of my existence for 20 years. We've "known better" for all of that time, but I still deal with people passing around Excel files of data or keeping common spreadsheets on shared drives (or now shared via Dropbox). We all "know better", but Excel is often the first thing people reach for to keep track of data, and once a system works, there is just too much inertia to change.
(For what it's worth, I feel the same way about people who try to send me RDS files with dataframes stored as R objects).
However, I think that whoever decided to name genes "OCT4" and "SEPT7" has to share some of the blame here too...
At my last job I spent 80+% of my time productionizing models and notebooks. It was an absolute nightmare. Everyone had slightly different preprocessing hacks for different stages, and things were always working fine locally, but I couldn't replicate the results in Docker containers.
Have you looked into the domain of "research data management"? Concerns such as "archival", "security" or "share & collaborate" are core to this research domain:
In academics, there's a trend to prepare a "data management plan" up front that creates awareness about these concerns. They are even a requirement in order to get funding:
So, it's a bit odd to see a study that's focussed on a single technical tool yield the same concerns... but not make that jump to a larger, existing framework on information management.
Looking at the authors, it seems you are located at Oregon State University. A quick DuckDuckGo search yields this service from your colleagues at the University Library:
Within the context of notebooks themselves, I think the study illustrates "if you have a hammer, every problem looks like a nail." Notebooks aren't the only powerful tool for working with data. I think many of the same concerns could be raised with Google Sheets or Excel with heavy VBA scripting. Like others said, this is not a new problem.
Notebooks do have a place in the bigger process of doing iterative research based on data mining techniques. They can help to formulate more accurate questions and perform quick tests without the friction of having to set up complex environments. Moving on from initial data exploration, it's up to the researcher to use a formal method and tools that do mitigate those concerns. RDM is all about providing tools and mitigating (legal) liabilities as far as "what do you do with your data?" is concerned.
In my experience, the best approach is to treat the notebook as the frontend. So widgets, graphs, and annotations are generally OK. Anything compute intensive should be relegated to the backend.
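Roughly, the split I have in mind looks like this; backend.py and run_simulation are hypothetical placeholders for whatever the expensive part really is:

```python
# --- backend.py: hypothetical module with the compute-heavy part ---
import numpy as np

def run_simulation(n_samples: int, noise: float) -> np.ndarray:
    """Placeholder for the expensive step (model fit, big query, ...)."""
    rng = np.random.default_rng(42)
    x = np.linspace(0, 10, n_samples)
    return np.sin(x) + rng.normal(scale=noise, size=n_samples)

# --- notebook cell: widgets and plots only ---
import matplotlib.pyplot as plt
from ipywidgets import interact
from backend import run_simulation

@interact(n_samples=(100, 5000, 100), noise=(0.0, 1.0, 0.1))
def show(n_samples=1000, noise=0.2):
    plt.plot(run_simulation(n_samples, noise))
    plt.show()
```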
I think adding feedback for marking cells as dependent on each other might be a good idea.
I'd also love code completion in notebooks.
I think the cleaning and code reuse problems can easily be mitigated by putting functions into libraries and using auto reload.
My normal workflow is hack something in a notebook until it runs, then refactor and put in a library I import with auto reload. I work on production ML and I use this for both software development and research.
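For what it's worth, the top of such a notebook looks roughly like this (my_lib and its functions are placeholders for whatever you're actually refactoring out):

```python
# Rough shape of the autoreload workflow. The first two lines are IPython
# magics typed into a notebook cell; my_lib and its functions are placeholders.
#
#   %load_ext autoreload
#   %autoreload 2
#
# With autoreload on, edits saved to my_lib.py are picked up on the next
# cell execution, without restarting the kernel.
from my_lib import preprocess, train_model

df = preprocess("data.csv")
model = train_model(df)
```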
> Co-author of the study here. Let me know if you have any questions or how you overcome some of the problems we identified!
It's not clear who the audience is. It sounds like most people who complain about them are software people and not researchers/scientists.
For someone like me, who once did computational research using MATLAB, and later analyzed data for my job, Jupyter is not worse, and is in most ways superior. Let's take your points one by one:
> Participants stated they often downloaded data outside of the notebook from various data sources since interfacing with them programmatically was too much hassle.
This was the norm with MATLAB, Excel and JMP as well, unless someone wrote code to autodownload (extremely rare - less than 1% of people did that). And if you are going to write code to get the data from somewhere, it's much nicer in Jupyter than in these other tools.
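For example, pulling a dataset straight into a cell is a couple of lines (the URL below is just a placeholder):

```python
# Getting data into a notebook cell directly; the URL is a placeholder.
import pandas as pd

df = pd.read_csv("https://example.org/measurements.csv")  # pandas reads HTTP(S) URLs directly
df.head()
```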
> Not only that, but notebooks often crash with large data sets (possibly due to the notebooks running in a web browser).
I honestly have not seen this, and the reason makes no sense. Your browser is not handling the data. The kernel is. I mean yes, if you try to load several GB of data in pandas, it's possible you will have problems if you run out of RAM, but this has nothing to do with notebooks.
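And when the data genuinely is larger than RAM, that's a kernel/pandas issue you'd hit in a plain script too; a common workaround is to stream the file in chunks (file and column names below are made up):

```python
# Streaming a CSV that does not fit in RAM; file and column names are placeholders.
import pandas as pd

totals = None
for chunk in pd.read_csv("huge.csv", chunksize=1_000_000):
    part = chunk.groupby("key")["value"].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)

totals.sort_values(ascending=False).head()
```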
> Once the data is loaded, it then has to be cleaned, which participants complained is a repetitive and time consuming task
This was as much a problem prior to notebooks as it is now. Notebooks did not make this any worse.
> Explore and analyze. Modeling and visualizing data are common tasks but can become frustrating. For example, we observed one participant tweak the parameters of a plot more than 20 times in less than 5 minutes.
It was even worse with MATLAB. Ditto for Excel. JMP is a bit nicer for visualization, though.
> Notebooks do not have all of the features of an IDE, like integrated documentation or sophisticated autocomplete, so participants often switch back and forth between an IDE (e.g., VS Code) and their notebook.
It may be better now, but this was a problem in MATLAB as well.
> While it is easy to share the notebook file, it is often not easy to share the data.
This is as true with MATLAB, JMP, etc. A lot of the complaints about notebooks being hard to reuse come about because notebooks at least attempt to be reproducible, and thus many more people attempt it. Prior to notebooks, I knew almost no one who tried to share MATLAB analyses, because it was such a pain to do so.
> Notebooks as products. If a large data set is used, as one might expect in production, then the notebook will lose the interactivity while it is executing. Also, notebooks encourage "quick and dirty" code that may require rewriting before it is production quality.
I suppose some people are trying to make products out of notebooks, and this is where all the recent grief I see is coming from. I do not think it was the primary goal of notebooks, though. They were meant for data analyses and prototyping, not for production use.
Much of your comment could be summarised as “it’s no worse than prior tools”. That doesn’t invalidate the authors’ points, though: just because notebooks are better than previous tools, which have the same or worse problems, doesn’t mean they don’t have problems that should be tackled or talked about. That it’s an improvement over what existed before doesn’t mean you can’t be critical of the flaws it still has, and a study like this (looking at real people) and a discussion like the one we’re having here are a necessary start to figuring out how to improve on this.
People who design and implement features in notebooks. The conclusions in the blog post and research paper are clear that addressing these identified problems could improve the user experience.
> I honestly have not seen this, and the reason makes no sense. Your browser is not handling the data. The kernel is. I mean yes, if you try to load several GB of data in pandas, it's possible you will have problems if you run out of RAM, but this has nothing to do with notebooks.
This is true nowadays with Jupyter, because it is smart about truncating output. But it used to be possible to OOM the browser by, e.g., printing in a long-running loop or displaying too long a list/table.
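For example, a loop like this used to be enough to bog down the tab, because every print appended more output to the page; overwriting the output keeps it bounded (illustrative sketch):

```python
from IPython.display import clear_output

for i in range(100_000):
    # print(i)                   # anti-pattern: dumps 100k lines of output into the page
    if i % 1000 == 0:
        clear_output(wait=True)  # overwrite the previous output instead of appending
        print(f"step {i}")
```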
Maybe because of the negative sentiment / bad taste that statements like "What's wrong with..." leave. It could say "What can be improved with ... in 2020" instead. Probably not the intention of the authors, but it comes across a bit too much as non-constructive criticism / "not recommended to use". Some observations seem not to be directly related to notebooks per se. Others feel like they could just be entries in a FAQ/best-practices section of the documentation.
Kudos for looking at real people's work and surveying it.
I wonder how much the workflow could be improved if researchers were temporarily paired with developers - who are generally better at modularising and removing friction in their work.
Personally I believe that a bit of clean-code discipline and following known best practices could solve a couple of those pain points.
It's also true that some could be improved by rethinking how notebooks work; e.g., being able to specify the inputs/outputs of a notebook so it can be used as a library; detaching runtime data from the code so it plays better with version control/publishing; maybe even more radical ideas like adding a visual/flow view that helps with linking elements; adding built-in Excel-like sheets that can be queried/manipulated could also be interesting; built-in, first-class support for a relational database (SQLite) could also be a big win.
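Some of these can already be approximated with today's tools - e.g., the SQLite idea, sketched here by pushing a small DataFrame into an in-memory SQLite database and querying it (toy data, nothing notebook-specific):

```python
# Rough approximation of the "first-class relational database" idea with existing tools.
import sqlite3
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [4.0, 22.5]})  # toy data

con = sqlite3.connect(":memory:")       # in-memory SQLite database
df.to_sql("weather", con, index=False)  # load the frame as a table

pd.read_sql("SELECT city FROM weather WHERE temp > 10", con)
```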
There are many interesting developments happening in this space and there seem to be some unexplored ideas waiting to be tested out.
Yes! Not only that, but also with all of the end-user programming research. I did some studies on LabVIEW programmers before, and I noticed a lot of the same phenomena with data scientists. They have a lot of domain knowledge and some programming experience, but usually do not use software engineering best practices or tools (e.g., unit testing, code reviews, automated refactoring). All of this is very understandable but reveals a lot of potential for tools to better support them.
See Yestercode [1] and CodeDeviant [2], two tools that I specifically designed for LabVIEW programmers to refactor and test their code without expecting them to behave like traditional software engineers.
Interesting study! I'm curious what shadowing 15 R data scientists would look like, since the R ecosystem seems to resolve some of the pain points around caching results, debugging, and scaling.
This is a very minor question (and I am not concerned about risk to participants)--when you say they signed consent "in accordance with our institutional ethics board", are you talking about Microsoft, one of the two universities, or all?