Hacker News | craigkerstiens's comments

pg_query was created by the pganalyze team for their own purposes, initially (I believe) for features like index recommendation tooling, but it was planned as open source from the start. It is indeed a very high quality project, with the underlying C implementation having wrappers for a number of languages[1].

[1] https://github.com/pganalyze/libpg_query/blob/15-latest/READ...


Citus works really well *if* you have your schema well defined and slightly denormalized (meaning you have the shard key materialized on every table), and you ensure you're always joining on that as part of querying. For a lot of existing applications that were not designed with this in mind, it can be several months of database and application code changes to get things into shape to work with Citus.

If you're designing from scratch and make it work with Citus (specifically for a multi-tenant/SaaS sharded app), then it can make scaling seem a bit magical.
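As a rough sketch of what "shard key materialized on every table" means in practice (table and column names here are illustrative; `create_distributed_table` is Citus's function for declaring the distribution column):

```sql
-- Every table carries tenant_id, even where it's technically redundant,
-- so that related rows for a tenant land on the same shard.
CREATE TABLE orders (
    tenant_id bigint NOT NULL,
    order_id  bigint NOT NULL,
    total     numeric,
    PRIMARY KEY (tenant_id, order_id)
);
CREATE TABLE line_items (
    tenant_id bigint NOT NULL,
    order_id  bigint NOT NULL,
    item_id   bigint NOT NULL,
    PRIMARY KEY (tenant_id, order_id, item_id)
);

-- Distribute both tables on the same shard key so they are co-located.
SELECT create_distributed_table('orders', 'tenant_id');
SELECT create_distributed_table('line_items', 'tenant_id');

-- Always include the shard key in joins and filters so the query
-- can be routed to a single shard instead of fanning out:
SELECT o.order_id, count(*)
FROM orders o
JOIN line_items li
  ON li.tenant_id = o.tenant_id AND li.order_id = o.order_id
WHERE o.tenant_id = 42
GROUP BY o.order_id;
```

The retrofit pain described above is exactly this: adding `tenant_id` to tables and every join where it's missing.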


Probably as useful is the overview of what pgdog is and the docs. From their docs[1]: "PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code."

[1] https://docs.pgdog.dev/


I appreciate the idea of this a lot but am a bit skeptical.

The good thing is this would at least shine a better light on all those that "claim" to be Postgres but really have little to nothing to do with it. Overwhelmingly, people support the wire protocol despite being a completely separate database, because Postgres is already so universal and it'd be a huge investment to recreate that ecosystem of language drivers and everything else around it.

The reality is that even for the "wire protocol" there are varying levels of support, depending on what you're trying to do.

Then when you get down to functionality, it could be "we support the Postgres data types"... well, except this one or that one. That's fine and good until a user is surprised two years into building an application.

Even the notion of "we support all Postgres extensions" breaks down: all Postgres extensions don't work together; some take hooks and change queries that other extensions want to modify for themselves.

Having worked with Postgres and managed Postgres for a very long time: Postgres is Postgres. There are extensions that modify Postgres, there are forked versions of Postgres, and things that are "Postgres compatible" simply aren't Postgres.


Indeed; I thought this might add a lot more context with the examples and possibly be relevant.


It feels very disingenuous to say "Postgres compatible" and have this as a missing feature set. I'm sure they'd quickly argue it's wire compatibility, but even then it's a slippery slope and wire compatible is left open to however the person wants to interpret it.

There is no 'standard' or 'spec' for what makes something Postgres wire compatible.

This feels like a strong overreach on the marketing front to leverage the love people have for Postgres to help boost what they've built. That is not to say there isn't hard and quality engineering in here, but slapping Postgres compatible on it feels lazy at best.


> I'm sure they'd quickly argue it's wire compatibility, but even then it's a slippery slope and wire compatible is left open to however the person wants to interpret it.

I actually think that they'd argue they intend to close the feature gap for full Postgres semantics over time. Indeed their marketing was a bit wishful, but on Bluesky, Marc Brooker (one of the developers on the project) said they reused the parser, planner, and optimizer from Postgres: https://bsky.app/profile/marcbrooker.bsky.social/post/3lcghj...

That means they actually have a very good shot at approaching reasonably full Postgres compatibility (at a SQL semantics level, not just at the wire protocol level) over time.


I've been maintaining a starter pack for Postgres people - https://go.bsky.app/Acp7hmk


This alone wouldn't be a full replacement. We do have a full product that does that, with customers seeing great performance in production. Crunchy Bridge for Analytics does something similar by embedding DuckDB inside Postgres, though for users that's largely an implementation detail. We support Iceberg as well and have a lot more coming, basically to allow for seamless analytics on Postgres: building on what Postgres is good at, with Iceberg for storage and DuckDB for vectorized execution.

That isn't fully open source at this time, but it has been production grade for some time. This was one piece that makes getting there easier for folks and felt like a good standalone bit to open source and share with the broader community. We can also see where this by itself makes sense for certain use cases: as you sort of point out, if you had time-series partitioned data, leveraged pg_partman for new partitions, and used pg_cron (which this same set of people authored), you could automatically archive old partitions to Parquet but still have them around for analysis if needed.
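A hedged sketch of that archival pattern (the partition, schedule, and S3 bucket names are illustrative; it assumes pg_cron for scheduling and an extension such as pg_parquet that teaches COPY the Parquet format):

```sql
-- Nightly job at 03:00: copy an old partition out to Parquet,
-- then detach and drop it locally. Names are hypothetical.
SELECT cron.schedule(
  'archive-old-partition',
  '0 3 * * *',
  $$
    COPY metrics_2024_01
      TO 's3://my-archive-bucket/metrics_2024_01.parquet'
      WITH (FORMAT 'parquet');
    ALTER TABLE metrics DETACH PARTITION metrics_2024_01;
    DROP TABLE metrics_2024_01;
  $$
);
```

In a real setup you'd compute the oldest partition dynamically (e.g. from pg_partman's metadata) rather than hard-coding it as above.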


Very much agreed with this general idea, and I believe a lot of this was inspired by the team we hired at Crunchy Data to build it, as they were socializing it for a while. Looking forward to pg_duckdb advancing in time; for now it still seems pretty early and has some maturing to do. As others have said, it needs to be a bit more stable and production grade. But the opportunity is very much there.

We recently submitted our benchmark (for Crunchy Bridge for Analytics, which at the broadest level is based on the same idea) to ClickBench by ClickHouse (https://benchmark.clickhouse.com/), which puts us at #6 overall among managed service providers and gives a real, viable option for Postgres as an analytics database (at least per ClickBench). Also of note, there are a number of other Postgres variations, such as ParadeDB, that are definitely not 1000x slower than ClickHouse or DuckDB.


Hey Craig, for the public record: pg_duckdb was not inspired by the team at Crunchy Data. Our early MVP version, "pg_quack", was made public (Apache 2.0) on February 2nd. About two months later, Crunchy's analytics product shipped on April 30th. If you were working on it around a similar time, it was a coincidence. Let's call it great minds think alike.


Craig fan here, agree it's zeitgeist and I'm loving the PG ecosystem


I just did a project for a YC startup and we reverted to Postgres from DuckDB+SQLite over concerns that enterprises might not see the local-file combo as mature/professional.

Really excited about the idea of being able to have everything under the postgres umbrella even with sacrifices.

From the engineering side I have nothing but good things to say about duckdb.

I opened up the database to the frontend (it's an internal reporting tool, not unlike Grafana, and I filtered queries through an allowlist) and it was a pure delight to have the metrics queries right next to the graph. Very rapid iterations.


As Craig said, Crunchy has a very mature, enterprise-grade offering for analytics in Postgres and is very much leading the charge here. ParadeDB is built in a similar way, also ranking high on ClickBench, and is available in the open source as well.

I'm hopeful the pg_duckdb project will mature enough to be a stable foundation for ParadeDB and others, but that appears to be a matter of MotherDuck and how much they're willing to push this forward.


ParadeDB chose GPL, so I could see pg_duckdb accelerating past them. But then you never know; each of them can change the license at any time.


ParadeDB itself is AGPL, yes. Our core offering is pg_search, which offers Elasticsearch inside Postgres. What we build will be AGPL, and if pg_duckdb moves forward we will be happy to rebuild our analytics offering on top of it.


Hey Phil, the blog post says pg_duckdb is being taken forward by DuckDB Labs, Hydra, MotherDuck, Neon, and Microsoft Azure. We're fully invested in developing pg_duckdb and I'm happy to work collaboratively. Do you have something valuable to add to pg_duckdb?


There is a lot missing from it, as you know. We'd be happy to be part of the project if we get commit access/even partnership :)


Are you guys planning to open source your work at Crunchy?


I love when friends do this. It's hard to keep up with people and what they're up to. Publishing and letting people subscribe to me is a great way to share things. A few examples of some friends who are doing this:

Justin Searls (fairly well known in the Ruby and Rails community) has mostly quit various social channels, though he still publishes one-direction on some of them. He started a podcast that wasn't meant to have guests or some specific topic; it's just him updating you on things: what he's working on, what he's learning, random stories, etc. - https://justin.searls.co/casts/

Brandur, who I've worked with at a couple of places (Heroku previously, and now Crunchy Data), writes great technical pieces that often end up here, and he also has more of a personal newsletter. While there are technical pieces in there at times, he'll also talk about personal experiences; my favorite covers some of his unique experiences hiking the Pacific Trail (https://brandur.org/nanoglyphs/039-trails).


This gives me heart. I like writing about technical things, but I also like writing about personal things, concerts I went to, whatever. I'm a whole person, and I never liked the pressure (mostly from social media) to build your "brand" around one genre or style of writing. For me, my site is a personal one where I post about things I'm interested in. Ham radio, machine learning, my travels, pay phones, whatever. Maybe less useful for a reader or audience building but...I just like to write and share things.


For many it was supplanted by social media: IG, TG, even TikTok (shudder) channels. They monetize the same motivation.

