7 Databases in 7 Weeks for 2025

archiewood · 2024-12-05T18:07:52 1733422072

DuckDB is really having a moment

The ecosystem is very active, and they have recently opened up "community extensions" to bring your own functions, data types and connections. A barrier at the moment is that extensions are written in C++, though this limitation should be removed soon.

I've been building a lot on top of DuckDB, two of the projects I'm working on are linked in the article:

- Evidence (https://evidence.dev): Build data apps with SQL + Markdown

- DuckDB GSheets (https://duckdb-gsheets.com): Read/Write Google Sheets via DuckDB

thehours · 2024-12-05T20:30:51 1733430651

I'm using DuckDB for the first time for this year's Advent of Code and it's been a delightful experience so far. I was looking for something simple to set up and had more advanced functionality than what SQLite supports.

spratzt · 2024-12-05T21:10:20 1733433020

Extraordinary!

I can’t think of any of the advent of code questions this year where a database would have been of any use.

Do tell us more.

thehours · 2024-12-05T22:25:58 1733437558

It's mostly a challenge I set for myself to see how far I could get using SQL! I've done the past couple years in Python and just wanted to change it up.

So far parsing the inputs has not been nearly as much of a hassle as I thought. I really like how DuckDB supports directly selecting from files (e.g. `select * from 'file.csv'`). I thought I was going to be spending more time creating schemas, running `copy`, etc.

Overall pretty nice quality of life experience so far, even if it is only to mess around with a few puzzles.

cuanim · 2024-12-06T03:28:06 1733455686

You could try participating in Advent of SQL[0]! It's a bit rough around the edges but fun nonetheless. [0] - https://adventofsql.com/

dilawar · 2024-12-05T19:16:03 1733426163

Duckdb is really great. DX is awesome.

I cloned the CVE repository and ingested it into duckdb without worrying about table structures. Build a fastapi wrapper to query CVEs like SQL. A little bit of caching and it was pretty fast. Was done in a few hours.

Shame that their Rust crate wasn't as performant as the python module.

drewbitt · 2024-12-06T02:48:40 1733453320

I really enjoy using DuckDB for data visualization. It’s so easy to work with that it’s even come in handy during a few Twitter arguments—where I might have given up before, I was able to back myself up with data in just a few minutes. That's not even evidence.dev, just built-ins. https://duckdb.org/docs/sql/functions/char.html#barx-min-max.... We also use it at work for working with Parquet. Huge fan

hipjiveguy · 2024-12-08T21:55:12 1733694912

this sound interesting - do you have any more samples on how you win twitter arguments with this?

tmpz22 · 2024-12-05T20:29:51 1733430591

Are the testimonials on your website real or fake? Apple, Flexport, BMW, etc are using evidence.dev?

ironchef · 2024-12-05T23:43:02 1733442182

Lots of folks out there use evidence.dev. It's a simple way to get some BI up and running without needing to deal with licensing / corporate IT, etc.

Remember all it takes is 1 employee to put that claim up there (although I do like evidence.dev).

zvr · 2024-12-07T10:39:36 1733567976

To be precise, it takes 1 employee to say "used in X". It takes corporate decision to say "used by X". And it takes a written agreement to be able to use the trademarked logo of X on your page. (I know, because I have collected more than 60 such agreements to show logos on a page).

Scubabear68 · 2024-12-05T20:08:28 1733429308

I got into DuckDb thanks to HN comments like these in general, it really shines for doing "desktop" analysis that goes beyond the usual Excel/Numbers shenanigans.

tlavoie · 2024-12-07T06:33:44 1733553224

The Developer Voices podcast episode with a couple authors of a DuckDB book was quite good as well: https://www.youtube.com/watch?v=_nA3uDx1rlg

LeftHandPath · 2024-12-06T02:48:34 1733453314

Wow. Evidence is the sort of thing I've been dreaming of for a long time. Awesome work!

Jtsummers · 2024-12-05T17:27:39 1733419659

https://pragprog.com/titles/pwrdata/seven-databases-in-seven... - A book by the same name. Instead of giving you a brief blurb on each database, the authors attempt to give you more context and exercises with them. Last updated in 2018 it covers PostgreSQL, HBase, MongoDB, CouchDB, Neo4J, DynamoDB, and Redis. The first edition covered Riak instead of DynamDB.

nindalf · 2024-12-05T18:00:15 1733421615

The only one in common in both lists is Postgres. Insane longevity.

alex7o · 2024-12-05T22:47:47 1733438867

I like to joke and say that it is the Postgres Cinematic universe

cpursley · 2024-12-05T22:27:39 1733437659

Yep, obligatory Postgres Is Enough listicle:

https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

cpursley · 2024-12-05T22:27:50 1733437670

(most of the time, at least)

worksonmymach · 2024-12-05T22:57:07 1733439427

SQL Server and Oracle too but not being open source is a big minus.

dacryn · 2024-12-06T05:47:17 1733464037

I genuinely wonder why people use SQL Server, is it just because of the active directory integration? Just because microsoft says so?

I mean, its a lot more expensive than Postgres for very very little benefit. Like what exactly is the value proposition? Is it saving developer time? faster? I guess none of those

timc3 · 2024-12-06T21:08:10 1733519290

20 years ago, I was a massive Postgres fan and I still am, but I bought and used SQL server for production systems - it was better (particularly in replication), and more importantly had better support as in being on the phone with the lead dev type support.

Today I have no idea about SQL server but if the support is as good today as it was back then it can be worth it.

worksonmymach · 2024-12-06T06:22:24 1733466144

Back in the day (2000s) if you were a .NET shop it made sense as the providers for anything else were not as good. And probably you are more likely to get those skills over Postgres. And Postgres was a lot more early.

These days a I am not so sure what the case would be other than legacy or very occasionally some feature only SQL Server has. AD would be a big one.

steveBK123 · 2024-12-05T18:56:17 1733424977

Yeah and only 6 years ago, which is nothing in the span life databases historically. It's not like DBs are a new tech.

Making a decision on DB is somewhat foundational for a lot of system architecture, in that you expect to be able to use it for years. It is not like some UI framework you toss out every 18 months.

So the benefits of the new hyped thing may be outweighed by the unknowns & risk of flaming out.

This is the kind of hype cycle that gives me pause when a new hot thing like DuckDB which actually ticks a TON of boxes for me, but has attracted some of the usual suspects in my work network that I consider to be contra signals.

fuzztester · 2024-12-06T03:00:26 1733454026

iirc, there was a similarly named book, maybe by pragprog or by o'reilly, titled something like 7 languages in 7 weeks. I forget the name of the author.

Jtsummers · 2024-12-06T07:31:47 1733470307

Yes. They turned it into a series of books after that one. They are published by pragprog, which is what I linked to.

https://pragprog.com/categories/seven-in-seven/

breadwinner · 2024-12-06T03:46:52 1733456812

ClickHouse is awesome, but there's a newer OLAP database in town: Apache Pinot, and it is significantly better: https://pinot.apache.org/

Here's why it is better:

1. User-facing analytics vs. business analytics. Pinot was designed for user-facing analytics (meaning analytics result is used by end-user (for example, "what is the expected delivery time for this restaurant?"). The demands are much higher, including latency, freshness, concurrency and uptime.

2. Better architecture. To scale out ClickHouse uses sharding. Which means if you want to add a node you have to bring down the database, re-partition the database and reload the data, then bring it back up. Expect downtime of 1 or 2 days at least. Pinot on the other hand uses segments, which is smaller (but self-contained) pieces of data, and there are lots of segments on each node. When you add a node, Pinot just moves around segments, no downtime needed. Furthermore, for high availability ClickHouse uses replicas. Each shard needs 1 or 2 replicas for HA. Pinot does not have shards vs replica nodes. Instead each segment is replicated to 2 to 3 nodes. This is better for hardware utilization.

3. Pre-aggregation. OLAP cubes became popular in the 1990s. They pre-aggregate data to make queries significantly faster, but the downside is high storage cost. ClickHouse doesn't have the equivalent of OLAP cubes at all. Pinot has something better than OLAP cubes: Star trees. Like cubes, star trees pre-aggregate data along multiple dimensions, but don't need as much storage.

hodgesrm · 2024-12-06T05:24:18 1733462658

> 2. Better architecture. To scale out ClickHouse uses sharding. Which means if you want to add a node you have to bring down the database, re-partition the database and reload the data, then bring it back up.

This is not true in practice. In a properly designed implementation you can add new shards without downtime. ClickHouse compression is so efficient that most installations don't need to shard in the first place--everything fits on a single server with a few TB of storage. (We've put up to 50Tb on single servers.)

breadwinner · 2024-12-06T05:29:47 1733462987

If you want to scale out you have to add new nodes, and that means re-partitioning the same data (not new shards) across the new nodes. For example if you have 3 servers, each has 33% of the data, but if you add 1 more server, then each server has 25% of the data. If one server meets your scale requirements, then sure ClickHouse is a good choice.

hodgesrm · 2024-12-08T13:57:46 1733666266

There are multiple ways to address shard rebalancing:

1. If you are loading time series data you can often just wait for the shards to rebalance automatically as new data arrives and old data drops out.

2. If you are scaling to add new tenants, just put them on a separate shard and let them fill naturally.

3. It's not hard to move table parts, if you want to redistribute data across shards. ClickHouse comes with commands to do this and it's fairly easy to script.

4. ClickHouse can scale vertically for a long way. It works fine on VMs with 256GB of RAM and 50 TB of storage.

Just to be clear, it would be nice if open source ClickHouse could dispense with shards. In practice, though, it's not the biggest issue in most of systems I've seen (which is now in the hundreds), and there are several ways to design around it. Changing table partition keys is a harder issue to solve. Allocating adequate RAM for queries is another problem that's difficult to address.

memset · 2024-12-06T04:36:24 1733459784

I’d like to learn more! Clickhouse benchmarks don’t show Pinot favorably by comparison.

Also, do you have thoughts on Starrocks?

shadow28 · 2024-12-06T13:59:18 1733493558

> Clickhouse benchmarks don’t show Pinot favorably by comparison

Looks like they don't configure any indexes for Pinot in their benchmarks, which is one of Pinot's main selling points on the performance front - https://github.com/ClickHouse/ClickBench/issues/37.

PeterCorless · 2024-12-06T17:13:15 1733505195

Confirmed. Also, ClickBench is working from a batch-loaded data set, which is kind of antithetical to a real-world, real-time analytical database workload.

[Disclosure: I work at StarTree, and we're powered by Apache Pinot.]

We are currently considering / evaluating different methodologies to benchmark more realistic situations for real-time analytics. Potential consideration for your own benchmarking / POCs, or for a future industry benchmark spec:

1. Some sort of "freshness" (data latency) measurement: time for streaming ingestion / indexing / data ready for query. Is it consistent, or are there pauses in ingestion?

2. Some sort of "ingestion scaling" measurement: how many objects per second can you get to before you choke IO? What happens to ingested objects at different payload sizes? (This interacts with "freshness" above; i.e., you might [or might not] be able to throttle ingestion to improve freshness.)

3. Query concurrency/throughput: does your query capacity scale linearly or non-linearly? What happens at 10 QPS? 100 QPS? 1000 QPS? 10000 QPS? 100000 QPS? (Or when does it top out?)

4. Data volume: Are you querying against 1TB? 10TB? 100TB? 1 PB? (More?) This interacts with query concurrency. Because driving 100 QPS against 1 PB is a totally different case than driving 100000 QPS against 1 TB.

5. Data storage type: Are you running against local NVMe, EBS, or S3 buckets? (Maybe even HDD?) Is it all uniform storage, or is it in a tiered storage topology? If tiered, what's the % mix of the different storage types? This is just an increasing reality all vendors need to deal with. Customers want to optimize their spend per use case.

6. Query complexity: Before talking simple "latencies," you have to understand what sort of queries you're running. These aren't simple atomic row CRUD operations like a Redis or a ScyllaDB. How are you doing aggregates? Are you running queries against denormalized data in a single table, or are you doing single JOINs or multiple table complex JOINs?

7. Indexing: As pointed out by shadow28, indexes are vital for best performance. Which type of index was used? (Apache Pinot supports about a dozen different types).

And my personal favorite to throw into the mix:

8. Cluster resilience: Great. All of the above worked on a fully-provisioned stable cluster. Now knock out a node. Do it. See what happens. How long before the cluster rebalances and quiesces? What happens to your QPS and latencies during the rebalance, and then after quiescence? Measure that. Now knock out a 2nd node. Maybe a third. How many nodes can you bring down before performance goes non-linear, or the cluster is rendered as utterly unreliable.

This latter I call the "Torpedo test;" and I've been preaching about it for years[1]. How many "torpedos" can you cluster take before it sinks under the waves. It's not specific to real-time OLAP. You can use this kind of methodology to test resilience of any distributed system. And you should probably do this before you hit production.

[1] https://www.slideshare.net/slideshow/what-we-learned-about-a...

maxloh · 2024-12-06T07:54:35 1733471675

Apache Pinot is written in Java, which is a garbage-collected language and also a rare choice for mainstream databases.

Any idea if that would affect its performance?

The last time I checked there are several databases written in Go, which is also garbage-collected, but never saw one in Java except Apache Derby.

gunnarmorling · 2024-12-06T11:23:11 1733484191

Concurrent garbage collectors of recent JVM versions (ZGC, Shenandoah) can give you sub-millisecond pause times, i.e. GC pauses are not really an issue any more for typical analytics query workloads. The price to pay is reduced throughput, but a scale-out architecture like Pinot makes it easy to make up for that by adding another node, if needed.

vram22 · 2024-12-06T10:51:50 1733482310

I know of at least three databases written in Java:

HSQLDB:

https://en.m.wikipedia.org/wiki/HSQLDB

H2:

https://en.m.wikipedia.org/wiki/H2_(database)

PointBase:

https://en.m.wikipedia.org/wiki/PointBase

I have briefly used either one of the first two, I think it was HSQLDB, with Java, years ago.

PointBase was created by Bruce Scott, a co-founder of Oracle, and a few others, including an architect of Sybase. This was in the early days of Java.

All three still seem to exist, according to the links above.

throwaway2k04 · 2024-12-06T11:12:07 1733483527

Neo4J

mble_ · 2024-12-06T09:33:59 1733477639

Apache Cassandra would probably be the most notable one (outside of Kafka etc).

negus · 2024-12-06T09:50:42 1733478642

Yes. And this is the reason ScyllaDB exists

evnix · 2024-12-06T08:32:43 1733473963

Whenever I see a DB written Java, I run, very fast and very far away.

I've never had good experience with these implementations. If you say, the DB isn't performing well, the only answer you get is, well you forgot to tune these 1000 knobs and good luck if your data doesn't fit in RAM.

For some reason I feel your DB in a text file, would fare just as good as these given the same memory and resources that these Apache/java DB servers demand.

LunaSea · 2024-12-06T09:04:39 1733475879

Yes, I'm having ElasticSearch shard failures and Out Of Memory error fever every time I have to think about Java / Scala databases.

mble_ · 2024-12-06T12:16:16 1733487376

Pinot is something I haven't had any personal experience with, so that's why it wasn't on the list - same with StarRocks, or Druid.

Something for me to look into next year, clearly.

mble_ · 2024-12-05T18:26:18 1733423178

Author here.

Thanks for sharing! My choices are pretty coloured by personal experience, and I didn't want to re-tread anything from the book (Redis/Valkey, Neo4j etc) other than Postgres - mostly due to Postgres changing _a lot_ over the years.

I had considered an OSS Dynamo-like (Cassandra, ScyllaDB, kinda), or a Calvin-like (FaunaDB), but went with FoundationDB instead because to me, that was much more interesting.

After a decade of running DBaaS at massive scale, I'm also pretty biased towards easy-to-run.

leeoniya · 2024-12-05T19:49:07 1733428147

was hoping to see https://github.com/GreptimeTeam/greptimedb on the list

maybe 2026, or some bonus content for 2025 :)

https://www.greptime.com/blogs/2024-09-09-report-summary

biggestlou · 2024-12-05T22:47:31 1733438851

As a co-author of the book of the same name, I’m disappointed that you didn’t see fit to provide any kind of attribution

mble_ · 2024-12-06T09:31:42 1733477502

It was not intentional. I've corrected this oversight, and attribution is now provided - my apologies.

timbotron · 2024-12-05T20:59:01 1733432341

I'm curious why you said you don't find MongoDB interesting?

mble_ · 2024-12-05T21:27:42 1733434062

I lived through the MongoDB hype cycle.

For document databases, I'm more interested in things like PoloDB and SurrealDB.

timbotron · 2024-12-06T15:22:31 1733498551

I agree mongo is overhyped and attracts a lot of web newbies who only know javascript and don't want to think through schemas, although one interesting newer feature of mongo is time series collections -- unfortunately they are a bit buggy but they're getting better seem like a legitimate non-relational use case.

aravindputrevu · 2024-12-06T00:28:48 1733444928

Ex-Surrealer here. Thanks for listing us. Never thought someone on HN would cite us esp when it comes to MongoDB.

pradeepchhetri · 2024-12-06T03:01:06 1733454066

> If I had to only pick two databases to deal with, I’d be quite happy with just Postgres and ClickHouse - the former for OLTP, the latter for OLAP.

As the author mentioned, I completely agree with this statement. In fact, many companies like Cloudflare are built with exactly this approach and it has scaled them pretty well without the need of any third database.

> Another reason I suggest checking out ClickHouse is that it is a joy to operate - deployment, scaling, backups and so on are well documented - even down to setting the right CPU governor is covered.

Another point mentioned by author which is worth highlighting is the ease of deployment. Most distributed databases aren't so easy to run at scale, ClickHouse is much much easier and it has become even more easier with efficient storage-compute separation.

saisrirampur · 2024-12-06T03:47:49 1733456869

Sai from ClickHouse here. Have been living and breathing past year helping customers integrating Postgres and ClickHouse together. Totally agreed with this statement - there are numerous production grade workloads solving most of their data problems using these 2 purpose-built Open Source databases.

My team at ClickHouse has been working hard to make the integration even seamless. We work on PeerDB, an open source tool enabling seamless Postgres replication to ClickHouse https://github.com/PeerDB-io/peerdb/ This integration is now also natively available in the Cloud through ClickPipes. The private preview was released just last week https://clickhouse.com/cloud/clickpipes/postgres-cdc-connect...

yas_hmaheshwari · 2024-12-06T04:00:02 1733457602

Out of curiosity: Why not mysql? I am also surprised that no one has even mentioned mysql in any of the comments so far -- so looks like the verdict is very clear on that one

PS: I am also a fan of Postgres, and we are using that for our startup. But I don't know the answer if someone asks me, why not Mysql. Hence asking

pradeepchhetri · 2024-12-06T04:30:52 1733459452

To my knowledge, both Postgres and MySQL has their own strengths and weaknesses. Example: mvcc implementation, data replication, connection pooling and difficulty of upgrades were the major weaknesses of Postgres which are much improved over time. Similarly mysql query optimizer is consider lesser developed than that of Postgres's.

Overall I think Postgres adoption and integrations and thus community is much more wider than MySQL which gives it major advantage over MySQL. Also looking at the number of database-as-a-service companies of Postgres vs those of MySQL we can immediately acknowledges that Postgres is much widely adopted.

throwawaythekey · 2024-12-06T05:02:14 1733461334

A few other things I would add:

- MySQL performs a bit better when reading by primary key

- Postgres performs a bit better when doing random inserts/updates.

- MySQL you don't need to worry about vacuums

- The MySQL query optimizer is nice because you can give it hints when it misbehaves. This can be a godsend during certain production incidents.

- Last I checked MySQL still has a nicer scaling story than postgres, but I'm not sure what the latest here is.

- Connection pooling is still heavily in MySQLs favor i.e. you don't need the PG bouncer for lots for scenarios.

yas_hmaheshwari · 2024-12-06T11:21:37 1733484097

There was an article from Uber on why they shifted from Postgres to Mysql: https://www.uber.com/en-IN/blog/postgres-to-mysql-migration/

I don't know how much of that article points are still valid.

The other part in favor of mysql (in my opinion) are that there are lots of companies that use mysql in production - so the access patterns, and its quirks are very well defined Companies like Square, YouTube, Meta, Pinterest, now Uber all use mysql. From blind, Stripe was also thinking of moving all its fleet from Mongo to mysql

Perception wise, it looks like companies needing internet scale data are using mysql

codazoda · 2024-12-06T04:25:22 1733459122

I think this might come down to… Oracle.

Obviously there are alternatives like MariaDB but Postgres is a quality long standing open source solution.

rho4 · 2024-12-05T17:59:38 1733421578

The article mentions the TigerBeetle Style Guide: https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TI...

I agree so much with the paragraphs about "Dependencies" and "Tooling".

atonse · 2024-12-05T17:23:32 1733419412

I didn't realize this [1] was a thing. I've been informally referring to our Postgres/Elixir stack as "boring, but in the best way possible, it just works with no drama whatsoever" for years.

1: https://boringtechnology.club

maximus93 · 2024-12-05T18:20:04 1733422804

DuckDB really seems to be having its moment—projects like Evidence and DuckDB GSheets are super cool examples of its potential. And yeah, Postgres’s longevity is insane, it just keeps adapting.

On the AI front, vector databases like Pinecone and pgvector are exciting, but I’d love to see something even more integrated with AI workflows. The possibilities are huge. Curious to hear what others think!

vjerancrnjak · 2024-12-05T18:49:45 1733424585

I like Clickhouse more.

Unrelated, not sure if it is just me, but ever since LLMs became popular, I've been seeing an enormous amounts of special utf8 characters no one used regularly, like this em dash you used.

How is this simple to type? If you're on a phone keyboard, you have to switch to special characters, and then have to long-hold the dash and then slide to em dash.

arglebarnacle · 2024-12-05T19:07:52 1733425672

On a full keyboard it’s not too bad—just hold alt and tap 0151 on the numpad. Honestly I wish it was harder to type for stylistic reasons—it would help cure me of my em-dash addiction

vjerancrnjak · 2024-12-05T19:15:30 1733426130

Haha, you haven't used an em dash at all. You even used -- in some of your comments.

LLMs are everywhere.

airstrike · 2024-12-06T00:40:15 1733445615

- — are built into Mac OS with Opt+- and Opt+Shift+-. I use them all the time

there are also bindings for bullets • and probably other things I'm forgetting (or that may be conflicting with other bindings I have setup)

conductr · 2024-12-05T19:17:37 1733426257

I noticed that certain browsers started auto converting a double hyphen to an emdash as I type, no LLM needed, I think that’s just a timing coincidence

cfiggers · 2024-12-05T20:12:29 1733429549

Hi, not the person you asked, but I have an answer to the question.

I have an AutoHotkey script that auto-replaces "--" with "—" (and auto-disables it in contexts where it's likely intended, like SQL editors and my terminal).

I also auto-replace "`-" with "–" so I can conveniently do number ranges with the (objectively correct) n-dash to indicate a range.

elcritch · 2024-12-05T19:27:53 1733426873

On macs you can do `alt` and `-` to get a –. Even on a phone a proper emdash can add effect over a regular dash.

dhosek · 2024-12-05T22:51:17 1733439077

Mac people call it option, not alt. alt-minus gives – and alt-shift-minus gives —. Certainly, it’s much easier than the windows enter a numeric code thing which seems insane.

elcritch · 2024-12-06T09:17:03 1733476623

I've been a Mac guy for a long time. Guess calling it alt instead of option is just a hard habit to break. ;)

alwa · 2024-12-06T00:22:43 1733444563

My iPhone autocorrects two consecutive hyphens to an em dash. I’m fairly sure it’s not something I configured manually, so I assume it is or was a default. Possibly a component of the “smart punctuation” setting for the keyboard.

tejaskumar_ · 2024-12-06T10:10:41 1733479841

> I’d love to see something even more integrated with AI workflows

Do you mean a database still? Or something like Langflow or Dify? Curious what "something even more integrated" would look like as just a DB.

aravindputrevu · 2024-12-06T00:29:55 1733444995

Oh I think there are better vector stores than pinecone. For ex Marqo or extreme case Elasticsearch

tejaskumar_ · 2024-12-06T10:09:33 1733479773

What makes them better? Have we tried Astra DB or Milvus? Curious where that stands in relation to the others.

boundlessdreamz · 2024-12-06T05:23:04 1733462584

This comment was most likely generated using AI It is reusing phrases from previous comments -

https://news.ycombinator.com/item?id=42330710 and https://news.ycombinator.com/item?id=42330639

dangoodmanUT · 2024-12-05T18:34:14 1733423654

FDB is wildly underrated. Best DB atm

merqurio · 2024-12-05T19:09:25 1733425765

FondationDB is such an incredible resource, but it’s so hard to access for the general public. It’s no wonder so few people know about it. I’ve always thought that if there were easier ways to interact with it, or some dev-friendly layers built on top (or public) it would become way more popular. it has shown how well it scales with Apple and snowflake, but is not an obvious choice for non-internet scale applications.

If you don't know about FDB; there's an amazing video about how they test it that really got me into learning more:

https://youtu.be/4fFDFbi3toc?si=9g3hZogBq3Ou25xZ

jeffinhat · 2024-12-05T22:14:45 1733436885

They are developing this simulation concept as a platform: https://antithesis.com/

mble_ · 2024-12-05T19:12:15 1733425935

Yeah, this is the bit for me. We have almost no good OSS layers for folks to "plug and play".

Its a bit of a vicious circle - because there is low exposure, no one is building those layers. Because no one is building the layers, there is no exposure.

jstimps · 2024-12-05T20:20:50 1733430050

If you're interested in a layer compatible with Elixir's Ecto, please take a look at the EctoFoundationDB adapter:

https://github.com/foundationdb-beam/ecto_foundationdb

https://hexdocs.pm/ecto_foundationdb

amazingamazing · 2024-12-05T19:33:29 1733427209

Surprised no one has built a dynamodb compatible api yet. Wouldn’t be that difficult. Dynamodb actually has similar constraints already

wmfiv · 2024-12-05T21:14:35 1733433275

Maybe there aren't too many itches left to scratch?

Between Dynamodb, Cassandra, and Scylla seems like that problem set is somewhat a solved problem? I know those products continue to move forward, but they all work really well at this point and solve the fundamental problem to a good degree.

wb14123 · 2024-12-06T04:16:07 1733458567

Ever since CockroachDB changed their license, I'm searching for alternatives. PostgreSQL is an obvious choice but is there a good HA solution? What people usually do for HA with PostgreSQL or do they just not care about it? I tested Patroni, which is the most popular one in my knowledge, but found some HA issues that makes me hesitate to use: https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...

mble_ · 2024-12-06T09:45:19 1733478319

> What people usually do for HA with PostgreSQL or do they just not care about it?

Patroni for most cases. At Heroku we have our own control plane to manage HA and fencing which works very reliably. I also like the approach the Cloud Native PG folks have taken with implementing it in the k8s API via the instance manager[1].

Other options like Stolon or repmgr are popular too. Patroni is, despite the Jepsen testing, used well without issues in the majority of circumstances. I wouldn't over think it.

[1]: https://cloudnative-pg.io/documentation/1.24/instance_manage...

pjmlp · 2024-12-05T17:45:06 1733420706

For me the pinnacle of boring RDMS are Oracle and SQL Server.

motorest · 2024-12-05T17:52:15 1733421135

Their hefty pricetag bumps them solidly into exciting territory.

pjmlp · 2024-12-05T18:34:38 1733423678

Definitely exciting Fortune 500 projects.

fipar · 2024-12-05T18:03:12 1733421792

Don't forget DB/2!

And, from the same vendor, IBM still ships IMS for those who find Codd's ideas too exciting.

neverartful · 2024-12-05T21:03:50 1733432630

IBM also still sells Informix.

hobs · 2024-12-05T17:51:33 1733421093

There's definitely been fixes in the last half decade to make SQL Server a little more boring, but there's plenty of defaults that were simple foot guns.

Full recovery mode defaults combined with log growth settings, query parallelism costs, maintenance and backup solutions being offhanded to Ola, etc - many of those things are fixed but things like parameter sniffing are still absolutely causing pages going off for someone today.

sdesol · 2024-12-05T22:03:19 1733436199

For those not familiar with DuckDB, it's an amazing database, but it is not a replacement for SQLite, if you are looking for a lightweight server side DB. I'm in love with the DuckDB client and use it to query SQLite databases, but due the fact that it only supports one concurrent write connection, it is not suitable as a server side DB.

petesergeant · 2024-12-05T17:22:58 1733419378

Original 7 were: Redis, Neo4J, CouchDB, MongoDB, HBase, Postgres, and DynamoDB

turnsout · 2024-12-05T18:30:34 1733423434

Another reason to choose Postgres… if you fell asleep and woke up 20 years later like Rip Van Winkle, Postgres would still probably be a 100% valid choice.

steveBK123 · 2024-12-05T19:17:30 1733426250

Precisely - DBs are the LAST thing I want to have to deal with swapping every 3 years.

ako · 2024-12-05T17:54:47 1733421287

In this age of AI, I’m missing a database with a focus on AI, e.g, something like a vector database, or maybe something even better…

mble_ · 2024-12-05T18:22:46 1733422966

Author here. I deliberately didn't include one because I'm waiting for the dust to settle a bit.

There is a lot of activity in the space, from things like TurboPuffer (proprietary), Postgres extensions like VectorChord and pgvector, to established players like Elastic getting in on the action. Not to mention things like Pinecone.

ako · 2024-12-05T18:53:23 1733424803

Fair point, but would be interesting to see what characteristics make for a best in class ai database, rag, vectors? Or do you foresee ai to replace databases? “ChatGPT, can you remember this data for me?”

mble_ · 2024-12-05T19:11:02 1733425862

Definitely doesn't replace databases, at least LLMs as they currently are. We're going to be stuck with relational algebra for a long time to come, but the interfaces to interact with store might change over time from SQL to more natural language.

> what characteristics make for a best in class ai database

As I said before, I think the space is moving too fast on what is "best in class" - see all the vector indexing work that has happened in the last ~6 months or so. Big, centralised vector stores for some applications will have very different profiles to vector stores on the edge/on device, for example.

As mentioned, I'm a big fan of boring technology, so I'm going to happily wait we have a boring "winner".

SahAssar · 2024-12-05T18:08:46 1733422126

pgvector? postgresml?

nindalf · 2024-12-05T18:01:22 1733421682

Pgvecto.rs?

joeevans1000 · 2024-12-06T07:47:35 1733471255

No bitemporal db (i.e. xtdb)?

mble_ · 2024-12-06T09:46:33 1733478393

I love Datalog, but its such a niche technology. If I had included it, I would have probably swapped out TigerBeetle for it.

refset · 2024-12-06T13:25:44 1733491544

The upcoming XTDB v2 is a SQL-first engine. We also built an experimental Clojure/Datalog-like 'XTQL' language to go along with it, to provide some continuity for v1 users, but the primary API is now SQL over the Postgres wire protocol, where we implemented a variation on SQL:2011 - see https://docs.xtdb.com/quickstart/sql-overview.html

mble_ · 2024-12-06T15:43:59 1733499839

Oh, very cool. I'll have to add this to my list to check out next year.

deadbabe · 2024-12-05T17:36:34 1733420194

On the first day of Christmas my true love sent to me, a brand new Postgres query

znpy · 2024-12-05T19:14:39 1733426079

> CockroachDB

didn't it like go closes-source a while ago?

mble_ · 2024-12-05T19:38:53 1733427533

Still very much "open-source": https://github.com/cockroachdb/cockroach

But relicensed to the "CockroachDB Software License" as a form of BSL to prevent reselling.

maxmcd · 2024-12-05T20:43:20 1733431400

I think "source available" at this point: https://news.itsfoss.com/cockcroachdb-no-open-source/

kussenverboten · 2024-12-05T23:18:31 1733440711

Shouldn't Qdrant be on that list in 2025?

hobofan · 2024-12-06T05:26:43 1733462803

See https://news.ycombinator.com/item?id=42331258

I feel like the article is written mainly for Database power-users (= people that look beyond the horizon of the default DB choice of Postgres/MySQL), as well as for people getting onto database engineering.

I would say for both of those audiences, Qdrant (or any of the dedicated vector databases) isn't too interesting. For the most part they are very normal (and usually less-featured) databases with a heavy focus on a special kind of index, which most DBs on the list can also handle either natively or via extensions.

AzzieElbab · 2024-12-05T23:09:36 1733440176

no one uses or writes new graph databases. sad

pphysch · 2024-12-06T07:13:48 1733469228

Pure graph DBs, like Neo4j, are toys. Just use a proper relational database or kv store for your vertex/edge data, and then load it into memory in your desired graph format and do your graph analysis on it in your favorite PL.

amazingamazing · 2024-12-05T18:22:27 1733422947

Why duck db vs chdb?

mmsimanga · 2024-12-05T19:13:46 1733426026

I work in a big none tech corp and I cannot install WSL so I can try out Clickhouse.

mble_ · 2024-12-05T18:30:56 1733423456

I didn't want ClickHouse to take all the glory. /s

The actual reason is that DuckDB's API and integration into other places (e.g. Evidence) and its use of extensions (like the aforementioned gsheets one) gives it priority for me.

Additionally, its being used in a bunch more places like pg_duckdb that make it more "worth it".

littlekey · 2024-12-05T20:50:12 1733431812

I'm just gonna say it: unless I had a specific reason to use it, I would cross CockroachDB off my list purely based on the name. I don't want to be thinking of cockroaches every time I use my database. Names do have meaning, and I have to wonder why they went with that one.

greghn · 2024-12-05T20:58:32 1733432312

> Names do have meaning, and I have to wonder why they went with that one.

"CockroachDB gets its name from cockroaches, as they are known for being disaster-resistant.[5]"

https://en.wikipedia.org/wiki/CockroachDB

dankwizard · 2024-12-05T23:03:06 1733439786

I go further than judging a book by its cover, I judge it by its name!