Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I've worked on a few SQL systems used for analytics and ETL.

My users fell into (for the purposes of this discussion) three categories:

1. Analysts who prefer sheets

2. Data scientists that prefer pandas

3. Engineers who prefer C++/Java/JavaScript/Python

I'm fairly sure SQL isn't the first choice for any of them, but in all three cases a modern vectorized SQL engine will be the fastest option for expressing and executing many analysis and ETL tasks, especially when the datasets don't fit on a single machine. It's also easier to provide a shared pool of compute to run SQL than arbitrary code, especially with low latency.

Even as a query engine developer, I would prefer using a SQL engine. Performing even the basic optimizations a modern engine would perform -- columnar execution, predicate pushdown, pre-aggregation for shuffles, etc -- would be at least a week of work for me. A bit less if I built up a large library to assist.



Re #2: I prefer https://pola.rs over Pandas


I've heard great things about Pola.rs performance. To get there, they have a lazy evaluation so they can see more of the computation at once, allowing them to implement optimizations similar to those in a SQL engine.


In the early days, even as I appreciated what Pandas could do, I never found its API sane. Pandas has too many special cases and foot-guns. It is a notorious case of poor design.

My opinion is hardly uncommon. If you read over https://www.reddit.com/r/datascience/comments/c3lr9n/am_i_th... you will find many in agreement. Of those who "like" Pandas, it is often only a relative comparison to something worse.

The problems of the Pandas API were not intrinsic nor unavoidable. They were poor design choices probably caused by short-term thinking or a lack of experience.

Polars is a tremendous improvement.


Hey, I agree with you.

On eager vs lazy evaluation -- pytorch defaulting to eager seemed to be part of the reason it was popular. Adding optional lazy evaluation to improve performance later seems to have worked for them.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: