
cfors · 2026-01-23T16:38:36 1769186316

The underlying C library interacts directly with the Postgres query parser (and therefore with the Postgres source). So unless you rewrite Postgres in Rust, you wouldn't be able to do that.

vineyardmike · 2026-01-23T18:33:13 1769193193

Well then why didn’t they just get the LLM to rewrite all of Postgres too /s

I agree that LLMs will make clients/interfaces in every language combination much more common, but I wonder what impact it’ll have on these big software projects if more people stop learning C.

cfors · 2026-01-15T13:50:47 1768485047

https://grugbrain.dev/

grug very elated find big brain developer Bob Nystrom redeem the big brain tribe and write excellent book on recursive descent: Crafting Interpreters

book available online free, but grug highly recommend all interested grugs purchase book on general principle, provide much big brain advice and grug love book very much except visitor pattern (trap!)

Grug says bad.

In all seriousness, the rough argument is that it's a "big brain" way of thinking. It sounds great on paper, but it often isn't the easiest machinery to manage when there are simpler options (e.g. just add a method).
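To make the trade-off concrete, here is a minimal sketch (hypothetical class names, a toy expression tree loosely in the style of Crafting Interpreters) that evaluates the same tree twice: once through a visitor, once by "just adding a method" to each node.

```python
from dataclasses import dataclass
from typing import Union


# --- Visitor pattern: nodes only know how to dispatch to a visitor ---
@dataclass
class Num:
    value: float

    def accept(self, visitor):
        return visitor.visit_num(self)


@dataclass
class Add:
    left: "Expr"
    right: "Expr"

    def accept(self, visitor):
        return visitor.visit_add(self)


Expr = Union[Num, Add]


class Evaluator:
    """One operation lives in one visitor class, outside the nodes."""

    def visit_num(self, node: Num) -> float:
        return node.value

    def visit_add(self, node: Add) -> float:
        return node.left.accept(self) + node.right.accept(self)


# --- "Just add a method": each node evaluates itself directly ---
@dataclass
class PlainNum:
    value: float

    def evaluate(self) -> float:
        return self.value


@dataclass
class PlainAdd:
    left: "PlainNum | PlainAdd"
    right: "PlainNum | PlainAdd"

    def evaluate(self) -> float:
        return self.left.evaluate() + self.right.evaluate()


tree = Add(Num(1), Add(Num(2), Num(3)))
print(tree.accept(Evaluator()))  # 6

plain_tree = PlainAdd(PlainNum(1), PlainAdd(PlainNum(2), PlainNum(3)))
print(plain_tree.evaluate())  # 6
```

The visitor pays off when you keep adding new operations (printer, resolver, type checker) over a fixed set of node types; otherwise the double dispatch is extra machinery for the same result.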

not-a-juggler · 2026-01-15T14:40:25 1768488025

https://news.ycombinator.com/item?id=44304648

Grug doesn't elaborate much, but here's the author's take in slightly more detail.

cfors · 2026-01-05T16:13:46 1767629626

https://duckdb.org/docs/stable/core_extensions/vss

It's not bad if you need something quick. I haven't had much need for ANN in DuckDB, since I use it more for analytical/exploratory work, but it's definitely there if you need it.

cfors · 2025-11-11T20:25:15 1762892715

Just curious: what's the state of the art for filtered vector search? I took a quick look at the SPFresh paper and didn't see it specifically address filtering.

cfors · 2025-09-12T14:27:53 1757687273

In any API service, it's better to handle this via dependency injection IMO.

Instantiate your logger with all of its metadata once, then pass that logger down, so that anybody who uses it is guaranteed to have the right metadata... the time to add logging is not when you are debugging.
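A minimal sketch of that idea, assuming Python's standard `logging` module: the request metadata is bound once into a `LoggerAdapter`, and the adapter is injected into the code that does the work. `ContextLogger`, `OrderService`, and `handle_request` are hypothetical names for illustration.

```python
import logging


class ContextLogger(logging.LoggerAdapter):
    """Prefixes every message with the metadata bound at construction time."""

    def process(self, msg, kwargs):
        context = " ".join(f"{key}={value}" for key, value in self.extra.items())
        return f"[{context}] {msg}", kwargs


class OrderService:
    """Hypothetical service: it receives a logger, it never builds metadata itself."""

    def __init__(self, log: logging.LoggerAdapter):
        self.log = log

    def place_order(self, sku: str) -> str:
        self.log.info("placing order for %s", sku)
        return sku


def handle_request(base: logging.Logger, request_id: str, user_id: str, sku: str) -> str:
    # Bind metadata once, at the edge of the request...
    log = ContextLogger(base, {"request_id": request_id, "user_id": user_id})
    # ...then inject the same logger into everything downstream.
    return OrderService(log).place_order(sku)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    handle_request(logging.getLogger("orders"), "r-1", "u-42", "sku-9")
```

Every line logged by `OrderService` carries `request_id` and `user_id` without the service knowing they exist, so the context is already there when you need it.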

cfors · 2025-08-08T15:21:06 1754666466

I don't disagree that rock solid is a good choice, but there is a ton of innovation necessary for data stores.

Especially in the context of embedding search, which this article is also trying to do. We need databases that can efficiently store and query high-dimensional embeddings, and that handle the nuances of real-world applications such as filtered ANN. There is a ton of innovation in this space, and it's crucial to powering the next-generation architectures of just about every company out there. At this point, data stores are becoming a bottleneck for serving embedding search, and I cannot overstate how important advancements here are for enabling these solutions. This is why there is an explosion of vector databases right now.

This article is a great example of where the actual data-providers are not providing the solutions companies need right now, and there is so much room for improvement in this space.

whakim · 2025-08-09T06:10:25 1754719825

I do not think data stores are a bottleneck for serving embedding search. I think the raft of new-fangled vector db services (or pgvector or whatever) can be a bottleneck because they are mostly optimized around the long tail of pretty small data. Real internet-scale search systems like ES or Vespa won’t struggle with serving embedding search assuming you have the necessary scale and time/money to invest in them.

cfors · 2025-08-10T13:19:16 1754831956

Sure they can handle the basic case of ANN. But ANN still doesn’t have good stories for lots of real-world problems.

* filterable ANN, which decomposes into pre-filtering or post-filtering.

* dynamic updates and versioning are still very difficult

* slow building of graph indexes

* adding other signals into the search, such as query time boosting for recent docs.
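A toy sketch of why the pre/post-filtering split matters, under simplifying assumptions: `top_k` is an exact stand-in for an ANN index (real engines use approximate structures such as HNSW), and `Doc`, `post_filter`, and `pre_filter` are hypothetical names. Post-filtering searches first and can come back short; pre-filtering here is brute force over the matching docs.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    id: int
    vec: tuple[float, ...]
    lang: str


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(docs, query, k):
    # Stand-in for an ANN index: exact scoring over whatever docs it is given.
    return sorted(docs, key=lambda d: cosine(d.vec, query), reverse=True)[:k]


def post_filter(docs, query, k, pred):
    # Search the whole index first, then drop non-matching hits.
    # With a selective filter this can return fewer than k results.
    return [d for d in top_k(docs, query, k) if pred(d)]


def pre_filter(docs, query, k, pred):
    # Restrict to matching docs first, then search only those (brute force here).
    return top_k([d for d in docs if pred(d)], query, k)


def is_french(doc: Doc) -> bool:
    return doc.lang == "fr"


docs = [
    Doc(1, (1.0, 0.0), "en"),
    Doc(2, (0.9, 0.1), "en"),
    Doc(3, (0.8, 0.2), "en"),
    Doc(4, (0.0, 1.0), "fr"),
]
query = (1.0, 0.0)

print([d.id for d in post_filter(docs, query, 2, is_french)])  # []
print([d.id for d in pre_filter(docs, query, 2, is_french)])   # [4]
```

The trade-off: post-filtering keeps the fast index path but loses recall under restrictive filters, while brute-force pre-filtering is exact but only cheap when few documents match.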

I don’t disagree these systems can work but innovation is still necessary. We are not in a “data stores are solved” world.

whakim · 2025-08-11T06:18:16 1754893096

* Filterable ANN certainly decomposes into pre- and post-filtering, and there is definitely a lot of interesting innovation occurring around filterable ANN. But large-scale search systems currently do a pretty good job with pre-filtering, falling back to brute force search in the case of restrictive filters.

* You'd have to be a bit more exact re: dynamic updates/versioning for me to understand the challenges you're facing.

* Building graph indices can be slow, but in my experience (billions of embeddings) it is possible to build HNSW indices in tens of minutes.

* How is this any different to combining traditional keyword search with, say, recency boosting?
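A minimal sketch of query-time recency boosting, assuming exponential decay with a configurable half-life (one common choice, not the only one). The `relevance` field is deliberately agnostic: it could be a keyword score or a vector similarity, which is the point of the question above. `Hit`, `recency_boost`, and `rerank` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    relevance: float  # keyword score or vector similarity; the boost doesn't care
    age_days: float


def recency_boost(age_days: float, half_life_days: float) -> float:
    # Exponential decay: a doc that is half_life_days old counts half as much.
    return 0.5 ** (age_days / half_life_days)


def rerank(hits, half_life_days: float = 30.0):
    # Multiply base relevance by the decay factor and sort by the boosted score.
    return sorted(
        hits,
        key=lambda h: h.relevance * recency_boost(h.age_days, half_life_days),
        reverse=True,
    )


hits = [Hit("stale", 0.95, 90.0), Hit("fresh", 0.80, 1.0)]
print([h.doc_id for h in rerank(hits)])  # ['fresh', 'stale']
```

The open question in the thread is less the formula than doing this inside the ANN traversal rather than as a rerank over a fixed top-k, since a rerank can only reorder candidates the index already returned.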

cfors · 2025-08-11T12:24:26 1754915066

You might be missing my argument here - I stated that there are workable solutions to this, as you have pointed out.

But ANN search is still a sledgehammer, and building out hybrid solutions that bridge the gap between it and traditional data stores still has room for innovation.

whakim · 2025-08-12T01:10:57 1754961057

Fair enough - agreed there's lots of interesting innovations here - but my point is that semantic search and its associated issues don't really differ that much from other types of search problems at scale, and I therefore don't think that the current crop of vector database products add a lot of value from a technical perspective (perhaps they do from an ease-of-use perspective; or they work great at small scale, etc. etc.)

mdaniel · 2025-08-09T16:58:41 1754758721

> Real internet-scale search systems like ES

Oh, then you must have the secret sauce that allows scaling ES vector search beyond 10,000 results without requiring infinite RAM. I know their forums would welcome it, because that question comes up a lot.

Or I guess that's why you included the qualifier about money to invest.

whakim · 2025-08-09T23:05:49 1754780749

Would you mind putting aside the snark? I have a couple questions. How large is the corpus? I am also curious about the use-case for top-k ANN, k > 10000?

farsa · 2025-08-10T00:45:56 1754786756

Not the person you asked, but at work (we are a CRM platform) we allow our clients to arbitrarily query their user base to find matching users for marketing campaigns (email, SMS, WhatsApp). These campaigns can sometimes target a few hundred thousand people. We are on a really ancient version of ES, but it sucks at this job in terms of throughput. Some experimenting with BigQuery indicates it is much better at mass exporting.

whakim · 2025-08-10T01:23:43 1754789023

Fair; my question was mostly in the context of ANN, since that was the discussion point - I have to assume ES (as a search engine) would not necessarily be the right tool for data warehousing types of workloads.

cfors · 2025-06-10T10:08:03 1749550083

The README is 100% just autogenerated by Claude. It looks like every README generated from these tools.

Can’t speak for the code since I haven’t peeked into it.

owebmaster · 2025-06-10T14:41:44 1749566504

The README is a great overview of the MCP features. This is terribly derisive.

jamesblonde · 2025-06-10T10:44:01 1749552241

I thought it was a bit verbose, with lots of repetition. I didn't realize how extensive Claude/Cursor-generated READMEs were.

cfors · on Oct 21, 2024

While not strictly for RDBMS, I think this book is pretty close!

https://www.databass.dev/

cfors · on July 25, 2024

Just wanted to say thank you for this article - I've read and shared this a few times over the years!

cfors · on July 4, 2024

When Breath Becomes Air by Paul Kalanithi.

A fascinating memoir by a philosopher turned brain surgeon, facing a terminal cancer diagnosis. A person who spent his entire life pondering mortality, now faced with his own.

I reread it once a year, at minimum. A deeply moving book.

etrautmann · on July 4, 2024

I was a labmate of Paul when I was starting my PhD. He was an incredible human, and such a fantastic writer.

TheAlchemist · on July 4, 2024

It's a fantastic memoir indeed, very moving.

Love this quote from the book: "You can't ever reach perfection, but you can believe in an asymptote toward which you are ceaselessly striving".

zanmat0 · on July 4, 2024

Isn't that just a more verbose way of saying "You can reach for perfection."?

lamp_book · on July 4, 2024

Everything good’s already been said. All that’s left is just a wordier retelling.

Horffupolde · on July 4, 2024

Like Dr. Wilson from Dr. House.