It's forced upon many of them that are in finance, banking, insurance, ...
Mainly because those shops tend to run on Microsoft Azure, which has no decent analytics offering of its own and pushes Databricks extremely hard. The CTO or whoever just mandates Databricks. On paper it checks all the boxes: MLOps, notebooks, experiment management. It just does all of those things very badly, but the exec doesn't care. They only care about the Microsoft credits.
It also keeps the compliance teams happy by avoiding Jupyter, because Microsoft salespeople scared them away from open source.
We pushed back on it very, very, very hard, and finally convinced "IT" to not turn off our big Linux server running JupyterHub. We actually ended up using Databricks (PySpark, Delta Lake, hosted MLflow) quite a bit for various purposes, and were happy to have it available.
But the thought of forcing us into it as our only computing platform was a spine-chilling nightmare. Something that only a person who has no idea what data analysts and data scientists actually do all day would decide to do.
What would you go with instead for collaborative notebooks?
I ask because normally I lean pretty strongly towards "NO, just let the DSes/analysts work how they want to", which in this case would be running Jupyter locally. However, DBr's notebooks seem genuinely useful.
Is your issue "but I don't need Spark", or "I wanna code in a Python project, not a notebook", or something else?
IMO, if DBr weren't so wedded to Spark and provided a Python-only notebook environment, they'd have a killer offering on their hands.
> What would you go with instead for collaborative notebooks?
Production workloads should be code. In source control. Like everybody else.
Notebooks inevitably degrade into confusing, messy blocks of “maybe applicable, maybe not” text, old results and plots embedded in the file because nobody stripped them before committing, and comments like “don’t run cells below here”.
They’re acceptable only as a prototyping and exploration tool. Unfortunately, a whole “generation” of data scientists and engineers has been trained to basically only use notebooks.
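On the committed-outputs point specifically, one low-friction mitigation (just a sketch, assuming you're fine adding a git filter to the repo and can install the nbstripout package) is to strip outputs automatically at commit time:

    pip install nbstripout
    nbstripout --install   # registers a git filter so cell outputs/plots are dropped before they land in the repo

It doesn't fix the "don't run cells below here" problem, but at least diffs stay readable and stale plots stop getting committed.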