I'd also recommend Druid, MemSQL, SnappyData, MapD, and other column-oriented databases. Any of them can partition on a time column and offers full SQL, plus the extremely fast aggregations and high compression that come with columnar storage.
Hi, you've posted this notion that partitioning a column store by time would yield the same result as TimescaleDB a few times, so we thought we'd jump in and clear things up. We fully agree that column stores have their place, particularly if you have a massive number of metrics and all you care about is roll-ups along single-column axes. But there are major differences between TimescaleDB and column stores; namely, TimescaleDB supports a lot of features that column stores in general do not:
- Secondary indexes.
- Transactional semantics.
- Operates on data sets larger than available memory (doesn't have to be all in-memory, unlike MemSQL and some others); time-series data is voluminous.
- A whole suite of time-based optimizations that improve query plans when working with time-based indexes and data.
- Constraints, including foreign keys.
- Triggers.
- Joins with relational data.
- Full SQL, allowing complex queries and window functions (see the sketch after this list).
- Compatibility with data tools that speak SQL, which gets you the richest ecosystem of tools in the data world.
- The full gamut of Postgres datatypes, including JSON/B and GIS location data.
- 20+ years of reliability: tested backups, live streaming replication, etc.
- Geospatial support through best-in-class PostGIS.
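To make a few of those points concrete, here's a minimal sketch of the kind of query this feature set enables. The schema (a `measurements` hypertable joined to a `devices` metadata table) is hypothetical, but `create_hypertable` and `time_bucket` are the actual TimescaleDB functions:

```sql
-- Hypothetical schema: a relational metadata table plus a time-series table.
CREATE TABLE devices (
  id   SERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE measurements (
  time        TIMESTAMPTZ NOT NULL,
  device_id   INTEGER REFERENCES devices (id),  -- foreign-key constraint
  payload     JSONB,                            -- semi-structured data
  temperature DOUBLE PRECISION
);

-- TimescaleDB partitions the table on the time column behind the scenes.
SELECT create_hypertable('measurements', 'time');

-- Secondary index on a non-time column.
CREATE INDEX ON measurements (device_id, time DESC);

-- Full SQL: join time-series data to relational metadata, then use a
-- window function to compare each 15-minute average to the previous one.
SELECT d.name, t.bucket, t.avg_temp,
       t.avg_temp - lag(t.avg_temp) OVER (PARTITION BY d.name
                                          ORDER BY t.bucket) AS delta
FROM (
  SELECT device_id,
         time_bucket('15 minutes', time) AS bucket,
         avg(temperature) AS avg_temp
  FROM measurements
  WHERE time > now() - interval '1 day'
  GROUP BY device_id, bucket
) t
JOIN devices d ON d.id = t.device_id
ORDER BY d.name, t.bucket;
```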
Point of clarification: MemSQL does not need to be all in-memory. It also has an on-disk columnstore that uses memory only for indexes and column-segment metadata.
It depends on the queries, but columnstores would generally yield faster results. We're not new to this; we've used ClickHouse, MemSQL, SQL Server, and Druid extensively.
Columnstores just store data by column; they have no inherent limitations because of it. They all support SQL and SQL-compatible tools (although Druid's SQL support, built on Apache Calcite, is still experimental). They all store columnstore tables on disk (MemSQL uses rowstores in memory, SQL Server can optionally run columnstores in-memory using its Hekaton engine, and they all use in-memory buffers for rapid ingest). They can all do geospatial queries and support JSON columns, and some can handle nested/repeated structures. Indexes are available but unnecessary when you can prune partitions based on what's contained in each segment, especially when using a primary sort key (like a timestamp column in your case); see the sketch below. SnappyData has a unique statistical engine for trading query precision for much faster results (think HLL++-style algorithms applied to the entire dataset). MemSQL will do OLTP access with full transactions across both rowstore and columnstore data.
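To illustrate the segment-pruning point, here's a rough sketch using MemSQL-style columnstore DDL (the schema is made up). With the timestamp as the clustered sort key, the engine keeps min/max metadata per column segment and skips any segment whose range can't match a time predicate, which is why no secondary index is needed for time-range scans:

```sql
-- MemSQL-style columnstore with the timestamp as the clustered sort key
-- (hypothetical schema).
CREATE TABLE events (
  ts     DATETIME NOT NULL,
  metric VARCHAR(64) NOT NULL,
  value  DOUBLE,
  KEY (ts) USING CLUSTERED COLUMNSTORE
);

-- Segments are sorted on ts, so this range scan touches only segments
-- whose [min, max] ts overlaps the predicate; everything else is pruned
-- from segment metadata alone, with no index lookup.
SELECT metric, AVG(value), COUNT(*)
FROM events
WHERE ts >= '2017-01-01' AND ts < '2017-01-02'
GROUP BY metric;
```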
Congrats on the VC funding. I'm always happy to see new projects, and building on Postgres does give you a solid base with triggers and foreign keys (which come with their own scaling issues), and extending the time-based functions will be useful. However, my issue is the marketing spin where you claim to be better than everything else. Columnstores are very fast and efficient, and time as a dimension is not a new challenge. That's before considering the BigQuery/Snowflake hyperscale options or specialized databases like kdb+, which have served the financial industry for decades.
Approaching the field with a single-node automatic-partitioning extension (as of today) for a rowstore RDBMS and claiming to beat the rest on features they already have just strikes me as insincere. It would be better to acknowledge the competition and focus on what you're good at instead.
I couldn't agree more with manigandham. Column-store data warehouses have nearly all the features that cevian mentioned, and a column-store with a time column as the partition key will run analytical queries much faster than a row-store, even a row-store like TimescaleDB that's specialized for time-series data.
“my issue is the marketing spin where you claim to be better than everything else”
I'm sorry that was your impression; it's certainly not our intent to mislead, although I'm not sure where you think we claimed this. Indeed, the quoted article even says that "Timescale is not trying to take on Kx Systems directly in this core market," and that such organizations have different needs for different use cases.
Technology choices are all about trade-offs, and databases are no different.
I think what he might be saying is that you clearly are trying to be a direct competitor to those you say you're not competing with. Claiming not to compete with the proprietary TSDB offerings, just so you can stack the comparison deck in your favor by comparing yourself to the less-than-acceptable FOSS offerings, is a little disingenuous.
This would definitely clear things up in my mind.
Why would I use Timescale over KDB or IQ or Vertica? Is it just a price thing, i.e. you're mostly cheaper (both in licensing and in finding talent)? If cost were a minor issue, why choose Timescale? What advantage does it have over those other TSDBs? The bullet list that has been repeated a couple of times doesn't seem to contain anything unique to Timescale compared to the other big columnar databases.
Maybe you have a good story about how you can do scalar operations better than the others? Do you have a particular workload mix you're trying to target?