It's forced upon many of them that are in finance, banking, insurance, ...
Mainly because those shops tend to run on Microsoft Azure, which has no decent analytics offering of its own and pushes Databricks extremely hard. The CTO or whoever just mandates Databricks. On paper it checks all the boxes: MLOps, notebooks, experiment management. It just does all of those things very badly, but the exec doesn't care. They only care about the Microsoft credits.
It also keeps the compliance teams happy by avoiding Jupyter, because Microsoft salespeople scared them away from open source.
We pushed back on it very, very, very hard, and finally convinced "IT" to not turn off our big Linux server running JupyterHub. We actually ended up using Databricks (PySpark, Delta Lake, hosted MLflow) quite a bit for various purposes, and were happy to have it available.
But the thought of forcing us into it as our only computing platform was a spine-chilling nightmare. Something that only a person who has no idea what data analysts and data scientists actually do all day would decide to do.
What would you go with instead for collaborative notebooks?
I ask because normally I lean pretty strongly towards "NO, just let the DSes/analysts work how they want to", which in this case would be running Jupyter locally. However, DBr's notebooks seem genuinely useful.
Is your issue "but I don't need Spark", or "I wanna code in a Python project, not a notebook", or something else?
IMO, if DBr weren't so wedded to Spark and provided a Python-only notebook environment, they'd have a killer offering on their hands.
> What would you go with instead for collaborative notebooks?
Production workloads should be code. In source control. Like everybody else.
Notebooks inevitably degrade into confusing, messy blocks of “maybe applicable, maybe not” text, old results and plots embedded in the file because nobody stripped them before committing, and comments like “don’t run cells below here”.
They’re acceptable only as a prototyping and exploration tool. Unfortunately, a whole “generation” of data scientists and engineers has been trained to basically only use notebooks.
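On the committed-outputs point specifically, one low-friction mitigation (just a sketch, assuming you're fine adding a git filter to the repo and can install the nbstripout package) is to strip outputs automatically at commit time:

    pip install nbstripout
    nbstripout --install   # registers a git filter so cell outputs/plots are dropped before they land in the repo

It doesn't fix the "don't run cells below here" problem, but at least diffs stay readable and stale plots stop getting committed.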