
Just curious: in what way is data.table superior to pandas? I'm really interested in it! From my personal experience, pandas is just sometimes a bit slow.


I'm more a dplyr man myself, but data.table is much faster than pandas, most noticeably IMO when reading large files. It's also extremely succinct if you're into that sort of thing (though I find it a bit obfuscated). pandas is a lot of things, but "fast" and "concise" are not two of them.


Got it. Regarding speed, you have something like Vaex on the Python side (though I'm not sure how fast it really is). For me, most of my issues with pandas came from using its MultiIndex.


> For me, most of my issues with pandas came from using its MultiIndex.

Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.
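
For anyone in the same boat, here is a minimal sketch of two common ways to keep pandas from handing you a MultiIndex after a groupby. The data and the column names (`city`, `year`, `sales`) are made up for illustration; only `groupby(..., as_index=False)` and `reset_index()` are actual pandas calls.

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", "Bergen"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 12, 7, 9],
})

# Default behaviour: grouping by two keys puts both into a MultiIndex.
indexed = df.groupby(["city", "year"])["sales"].sum()

# Option 1: as_index=False keeps the group keys as ordinary columns.
flat = df.groupby(["city", "year"], as_index=False)["sales"].sum()

# Option 2: flatten an existing MultiIndex after the fact.
flat_again = indexed.reset_index()
```

Either way you end up with a plain DataFrame with `city`, `year`, and `sales` columns, which is usually easier to filter and join than the indexed version.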

> Regarding speed, you have something like Vaex on the Python side

I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.


Pandas is the PHP of data science. Pretty badly designed, but immensely popular because it got there first and had no real competition (in Python) for years.


I just love how much more terse and fast it is; someone else linked a benchmark below. There's definitely a learning curve, though.

If you already think pandas is slow, I think you'll be surprised how much more strongly you feel after using data.table!


data.table is faster to write and faster to run.

https://h2oai.github.io/db-benchmark/



