I couldn't agree more. I'm fluent in languages like Julia and MATLAB. I'm 90% fluent in R and prefer data.table over dplyr, but working in both is easy enough. Over the past few months I've been fully transitioning to Python. And while base Python I find to be extremely elegant, typical data science and scientific computing workflows are a headache. There aren't just one or two packages to choose from for each task, every package has its own syntax, and keeping track of Pandas Series vs DataFrames is confusing. Want fast differentiable code? Then rewrite everything from numpy into JAX, which requires its own tricks.
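To make the Series vs DataFrame point concrete, here's a minimal sketch (toy data, column names made up) of how the return type of everyday pandas operations flips depending on whether you pass a scalar label or a list of labels:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Single brackets with one label return a Series...
col_series = df["a"]
# ...while a list of labels (even a one-element list) returns a DataFrame.
col_frame = df[["a"]]

# Row selection follows the same pattern: a scalar label gives a Series,
# a list of labels gives a DataFrame.
row_series = df.loc[0]
row_frame = df.loc[[0]]

# Reductions also drop a level: a DataFrame reduces to a Series,
# and a Series reduces to a scalar.
frame_sum = df.sum()
series_sum = df["a"].sum()

print(type(col_series).__name__, type(col_frame).__name__)  # Series DataFrame
print(type(row_series).__name__, type(row_frame).__name__)  # Series DataFrame
print(type(frame_sum).__name__, series_sum)                 # Series 6
```

Downstream code then has to handle whichever type comes back, which is where a lot of the bookkeeping comes from.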
What Python desperately needs is a coordinated effort for a core data science /scientific computing stack with a unified framework.
In my opinion, if it weren't for Python's extensive use in industry and its package ecosystem, Julia would be the language of choice for nearly all data science and scientific computing uses.
> And while base Python I find to be extremely elegant, typical data science and scientific computing workflows are a headache.
That's my impression as well. Going back to the topic of the original post, pandas only partially implements the idioms of the tidyverse, so you have to mix in a lot of different forms of syntax (with lambdas to boot) to get things done.
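A small sketch of the mixing I mean (toy data, made-up column names). In dplyr this would roughly be `mutate` / `filter` / `group_by` / `summarise`, all with bare column names; in a pandas method chain each step wants a different kind of argument:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1, 2, 3, 4],
    "y": [10, 20, 30, 40],
})

result = (
    df
    # assign() needs a lambda to refer to the intermediate frame
    .assign(z=lambda d: d["x"] * d["y"])
    # query() takes its filter as a string expression instead
    .query("z > 15")
    # named aggregation uses (column, function) tuples as keyword arguments
    .groupby("group", as_index=False)
    .agg(total_z=("z", "sum"), n=("z", "count"))
)
print(result)
```

Three steps, three different calling conventions (callable, string, tuple), none of which is the bare-name style the tidyverse uses throughout.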
Julia is much nicer, but I find myself using PythonCall more often than I'd like.
SciPy was originally supposed to provide the scientific computing stack, but then many offshoots in the direction of pandas / Ibis / JAX, etc. appeared. I guess that's what you get with a community-based language. MATLAB has its warts, but MathWorks does manage to present a coherent stack on that end.
> What Python desperately needs is a coordinated effort for a core data science /scientific computing stack with a unified framework.
In fairness, if you're not touching Pandas, it's pretty good, I'd say. Everything is built around numpy and scipy. The sklearn API is a bit idiosyncratic, but it works really nicely in practice and is extensible. JAX has an API (jax.numpy) that mirrors numpy almost 1:1, probably with some catches, but still. All the trouble starts with pandas.
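A minimal sketch of what "almost 1:1, with catches" looks like in practice (the functions and data here are made up for illustration): the function body ports by swapping `np` for `jnp`, but JAX arrays are immutable, so in-place assignment has to go through the functional `.at[...]` interface.

```python
import numpy as np
import jax
import jax.numpy as jnp

def loss_np(w, x):
    # Plain NumPy: sum of squares of a linear map
    return np.sum((x @ w) ** 2)

def loss_jax(w, x):
    # Identical body, with np swapped for jnp
    return jnp.sum((x @ w) ** 2)

x = np.arange(6.0).reshape(3, 2)
w = np.array([1.0, -1.0])

w_jax = jnp.asarray(w)

# The catch: item assignment on a JAX array raises a TypeError...
try:
    w_jax[0] = 2.0
except TypeError:
    mutation_failed = True
else:
    mutation_failed = False

# ...so updates return a new array via .at[...].set(), leaving w_jax unchanged
w_updated = w_jax.at[0].set(2.0)

# jax.grad differentiates with respect to the first argument (w)
grad_fn = jax.grad(loss_jax)
g = grad_fn(w_updated, jnp.asarray(x))
print(g)
```

For numerically heavy code that never mutates arrays, the port really is mostly mechanical; it's loops with in-place updates and Python-side control flow that need rethinking.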
Pandas is pretty terrible IMO for all the reasons listed by OP and TFA - and more.