Just curious. In which way is data.table superior to pandas? Really interested a...

mrtranscendence · on Sept 14, 2021

I'm more a dplyr man myself, but data.table is much faster than pandas, most noticeably IMO when reading large files. It's also extremely succinct if you're into that sort of thing (though I find it a bit obfuscated). pandas is a lot of things, but "fast" and "concise" are not two of them.

MichaelRazum · on Sept 14, 2021

Got it. Regarding speed, you have something like Vaex on the Python side (but I'm not sure how fast it really is). For me, most of my issues with pandas came from using its MultiIndex.

mrtranscendence · on Sept 14, 2021

> For me, most of my issues with pandas came from using its MultiIndex.

Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.
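For anyone who hasn't run into it, here is a minimal sketch of the kind of MultiIndex friction being described. The column names and data are made up for illustration; this is just one common case (a two-key groupby), not the only way indices get in the way.

```python
import pandas as pd

# Hypothetical toy data, purely for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "sales": [10, 20, 30, 40],
})

# Grouping by two keys moves both keys into a MultiIndex on the result.
indexed = df.groupby(["region", "product"])["sales"].sum()

# as_index=False keeps the keys as ordinary columns instead.
flat = df.groupby(["region", "product"], as_index=False)["sales"].sum()

# Selecting from the MultiIndex result needs a tuple passed to .loc ...
east_a_indexed = indexed.loc[("east", "a")]

# ... while the flat frame uses plain boolean filtering on columns.
mask = (flat["region"] == "east") & (flat["product"] == "a")
east_a_flat = flat.loc[mask, "sales"].iloc[0]
```

Both routes give the same number; the difference is only in whether the grouping keys live in the index or in regular columns. Calling `.reset_index()` on the indexed result is another way to get back to the flat form.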

> Regarding speed, you have something like Vaex on the Python side

I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.

civilized · on Sept 14, 2021

Pandas is the PHP of data science. Pretty badly designed, but immensely popular because it got there first and had no real competition (in Python) for years.

bllguo · on Sept 14, 2021

I just love how much more terse and fast it is. Someone else linked a benchmark below. There's definitely a learning curve, though.

If you already think pandas is slow I think you'll be surprised how much more strongly you feel after using data.table!

nojito · on Sept 14, 2021

data.table is faster to write and faster to run.

https://h2oai.github.io/db-benchmark/