
There’s nothing for Python except Pandas. I came from an FP and static typing background before I moved into ML/quant finance, and initially found Pandas incredibly difficult to reason about. The tool is designed for scientists who know nothing about how a library should be designed or how a program should be structured. There’s a lot of dynamic stuff in Pandas that, while making things easier for scientists, makes things a lot more difficult for CS people. Same thing with numpy and scipy, and other data science libraries.

My absolute favorite DataFrame library is saddle (for Scala), which I helped write at my old quant job. Very FP oriented and an absolute pleasure to use. Though maybe it’s no surprise that I like something I worked on.

An incomplete list of things that I dislike about Pandas:

Too many parameters and knobs for each function

Inconsistency between inplace and copying operations

Unintuitive function names compared to FP

Too much magic in how things work

Functions and parameters accept a wide range of types in order to make things “just work.”

Lots of non-orthogonal convenience functions that do mostly the same thing
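
To make a few of these concrete, here is a small sketch against a toy frame. The column names and values are made up for illustration; the calls themselves (sort_values, [] indexing, isna/isnull) are standard Pandas, but treat this as one reading of the complaints above rather than an exhaustive catalogue.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({"a": [3, 1, 2], "b": [10.0, None, 30.0]})

# 1. In-place vs copying: the same method either returns a new frame,
#    or mutates its receiver and returns None, depending on a keyword.
sorted_copy = df.sort_values("a")                  # new DataFrame, df untouched
result = df.copy()
returned = result.sort_values("a", inplace=True)   # mutates result, returns None

# 2. "Just works" typing: [] dispatches on the runtime type of its argument.
column = df["a"]             # str            -> Series (one column)
subframe = df[["a"]]         # list of str    -> DataFrame (column subset)
first_rows = df[0:2]         # slice          -> DataFrame (rows by position)
filtered = df[df["a"] > 1]   # boolean Series -> DataFrame (row filter)

# 3. Non-orthogonal helpers: isna and isnull are aliases of each other.
same = df.isna().equals(df.isnull())
```

Note that in (2) both the axis being selected and the return type change with the argument type, which is exactly the kind of thing a static type signature can't describe in one line.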

I’m not familiar with how “normal” Python is written, but I suspect a lot of the problems come from the abuse of dynamic typing. Dynamic typing allows you to just add more and more levels of crap without actually changing your data/type model. I think there’s a lot of value in “correct” APIs, vs convenient ones.
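
A rough sketch of what I mean by “correct” vs convenient, in plain Python with no Pandas involved. Everything here is hypothetical: Table is a toy stand-in for a frame, and select_convenient, column, columns and columns_where are names I made up for the example.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy "frame": column name -> list of values.
Table = Dict[str, List[float]]

def select_convenient(table, key):
    """Convenient style: one entry point, behavior depends on type(key)."""
    if isinstance(key, str):
        return table[key]                                   # a single column (list)
    if callable(key):
        return {k: v for k, v in table.items() if key(k)}   # columns matching a predicate
    return {k: table[k] for k in key}                       # any iterable of names

# "Correct" style: one function per input type, each with a fixed return type.
def column(table: Table, name: str) -> List[float]:
    return table[name]

def columns(table: Table, names: Sequence[str]) -> Table:
    return {n: table[n] for n in names}

def columns_where(table: Table, pred: Callable[[str], bool]) -> Table:
    return {k: v for k, v in table.items() if pred(k)}
```

The convenient version is shorter to call, but every new accepted type is another branch bolted on without the signature ever changing. The split version is more to remember, yet each function can be read, typed, and tested on its own.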

That being said, Pandas is extremely powerful, and usually very succinct. Maybe not as nice as kdb+/q (nothing really compares for time-series data), but still pretty good.


