
I agree a data diff would be challenging, especially at scale, but one tool I think is lacking is schema diffs. Sure, you can see the sequence of migrations that were applied, but if all you have is a series of SQL files that add, remove, or update column definitions, then after one or more diffs you may not actually know or remember what's IN the table you're trying to understand. And if you don't have prod access to run SHOW CREATE TABLE (or equivalent), you're left with tracing the diff operations and reconstructing the table schema yourself. Have you seen a tool that can do that?
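
To make the "reconstruct the schema yourself" step concrete, here is a minimal sketch that replays migrations in order against a scratch in-memory database and then reads back the resulting columns. It assumes the migrations are valid SQLite (not MySQL), and the `schema_after` helper and the sample migrations are hypothetical, made up purely for illustration. For MySQL-specific DDL you would need a scratch MySQL instance instead.

```python
import sqlite3


def schema_after(migrations):
    """Apply migration scripts in order to a throwaway in-memory SQLite
    database and return {table_name: [(column_name, declared_type), ...]}."""
    conn = sqlite3.connect(":memory:")
    try:
        for script in migrations:
            conn.executescript(script)
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [(col[1], col[2]) for col in info]
        return schema
    finally:
        conn.close()


# Hypothetical migrations, listed in the order they would be applied.
migrations = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);",
    "CREATE TABLE scratch (x INTEGER);",
    "ALTER TABLE users ADD COLUMN email TEXT;",
    "DROP TABLE scratch;",
]

schema = schema_after(migrations)
for table, columns in schema.items():
    print(table, columns)
```

The point is only that the end state falls out of replaying the files, so nobody has to trace the ALTER statements by hand.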


I'm the author of a schema management tool, Skeema [1], designed to solve this problem for MySQL and MariaDB.

There are a number of other existing tools that can compare/diff schemas on 2 live databases, but Skeema is designed to also actually manage your database structure through a declarative repo of CREATE statements. It works at any scale (natively supports sharding and external OSC tools) and is trusted by several large users, including GitHub [2] and Twilio SendGrid [3].

[1] https://www.skeema.io

[2] https://fosdem.org/2020/schedule/event/mysql_github_schema/

[3] https://sendgrid.com/blog/schema-management-with-skeema/


jOOQ is an SQL library which, in addition to an internal SQL DSL for Java, includes other goodies like an SQL parser and, very recently (still under active development), a schema diff tool that is also available as a CLI: https://www.jooq.org/diff/

One thing which sets jOOQ apart from most other tools out there is the fact that it supports many different SQL dialects. Thus the schema diff tool can for instance also parse DDL in one dialect and render the diff in another SQL dialect. For certain applications this could be of interest.

Disclaimer: I am an active committer on the jOOQ project.


I run migrations locally and on dev on sanitised snapshots of live data, and have easy access to those, so I just use the db to view the schema if required. Regular snapshots of the data are useful too.

If migrations are kept small there isn't usually much confusion over what changed (see migration), or what exists (see db).


For sure you'd have to add rolling hashes, but that's probably more natural in a tree based data store.

I've added these to https://sirix.io, which speeds up diffing considerably, especially with deep trees. Alternatively, the changes themselves could be indexed :-)
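
A rough sketch of why per-node hashes help with deep trees: if every node stores a digest covering its whole subtree, a diff can skip any pair of subtrees whose digests match without visiting their descendants. This is a generic Merkle-style illustration, not how Sirix actually implements it; `Node`, `seal`, and `diff` are hypothetical names, and the hashes here are computed once up front rather than updated incrementally on each change.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    digest: str = ""


def seal(node):
    """Compute and store a digest covering the node's label and all descendants."""
    h = hashlib.sha256(node.label.encode("utf-8"))
    for child in node.children:
        h.update(b"\x00")  # separator between the label and each child digest
        h.update(seal(child).encode("ascii"))
    node.digest = h.hexdigest()
    return node.digest


def diff(old, new, path="/"):
    """Return paths of changed subtrees, descending only where digests differ."""
    if old.digest == new.digest:
        return []  # identical subtree: skip it without visiting descendants
    if old.label != new.label or len(old.children) != len(new.children):
        return [path]  # report the whole subtree as changed
    changes = []
    for i, (a, b) in enumerate(zip(old.children, new.children)):
        changes += diff(a, b, f"{path}{i}/")
    return changes


old = Node("root", [Node("a", [Node("x"), Node("y")]), Node("b")])
new = Node("root", [Node("a", [Node("x"), Node("z")]), Node("b")])
seal(old)
seal(new)
print(diff(old, new))
```

In this example only the path down to the changed leaf is walked; the unchanged "b" subtree is dismissed by a single digest comparison.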



