Do different kinds of indexes work better for columnar storage? Or is it the same principles for both?


Different principles of indexing, at least based on my experience with ClickHouse.

* Column-based stores have really fast scans due to compression and vectorization, so you'll generally read straight down the column. The way to speed that up is to have "skip indexes" that let you skip whole blocks, i.e., not even bother to read/decompress them.

* Commonly used indexes need to be very sparse, so they fit in memory even when tables run to hundreds of billions of rows.

* Finally, highly compressed columns can be used as indexes to filter data rapidly. ClickHouse calls this PREWHERE processing.
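
To make the skip-index idea concrete, here is a toy sketch in Python. It is not how ClickHouse implements it; it only illustrates the principle of keeping a tiny min/max summary per block so a scan can pass over blocks that cannot contain a match. The names (`Block`, `build_blocks`, `scan_equal`) and the block size of 4 are made up for the example; real systems use blocks of thousands of rows.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


@dataclass
class Block:
    values: List[int]  # stands in for a compressed block of column data
    min_value: int     # the sparse "skip index" entry: one min/max per block
    max_value: int


def build_blocks(column: List[int], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_equal(blocks: List[Block], target: int) -> Tuple[int, int]:
    """Count rows equal to target; also report how many blocks were read."""
    matches = 0
    blocks_read = 0
    for block in blocks:
        # Skip the block if the target falls outside its min/max range:
        # in a real store this avoids reading and decompressing it at all.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches += sum(1 for v in block.values if v == target)
    return matches, blocks_read


column = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
blocks = build_blocks(column)
matches, blocks_read = scan_equal(blocks, 11)
print(matches, blocks_read)
```

Note that the summary is one pair of numbers per block rather than one entry per row, which is why this kind of index stays small enough for memory. It also only helps when the data is at least roughly ordered on that column; if every block spans the full value range, nothing gets skipped.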

Edit: clarify skip indexes




