> Ledgers are quite easy to build. Git is a ledger. It doesn't use a block chain but it does link commits via their hashes; which makes it an audit proof ledger. Unless you can break the hashes, you can't modify its history.
Easy as an armchair theory, impossible in practice without a complete rewrite of git from scratch. You are also disregarding the immutability of the data and the append-only property of stored datasets in ledger DBs: the entire temporal history of changes to a document (or a table) must be available in a ledger database. Changes are linked by virtue of a Merkle tree, and the content of each node (i.e. revision) can be cryptographically proven to be untampered with. The QLDB documentation provides a visualisation of the cryptographic verification process and of its sequence of steps: https://docs.aws.amazon.com/qldb/latest/developerguide/verif...
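For readers who have not seen a Merkle proof before, here is a minimal sketch of the general idea in Python. It is not QLDB's actual algorithm or API: the function names, the use of SHA-256, and the "duplicate the last node on odd levels" padding rule are all assumptions made for illustration.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    # Hash adjacent pairs to build the parent level.
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed padding rule for odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling sits on the right."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_on_right in proof:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root


revisions = [f"revision {i}".encode() for i in range(5)]
root = merkle_root(revisions)
proof = merkle_proof(revisions, 3)
print(verify(revisions[3], proof, root))          # genuine revision
print(verify(b"revision 3 (edited)", proof, root))  # altered revision
```

The verifier only needs the revision, the short proof, and a trusted root digest; it does not need the rest of the history.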
Git is absolutely the wrong tool for the job as hashes in git are used as identifiers to identify a specific change, but not to cryptographically checksum the document/the change to the data. Git commits can also be merged, rewritten, squashed or deleted – something that is absolutely unacceptable in the compliance space.
So no, it is not easy to build a ledger database out of git if there are strict compliance requirements. With a ledger database, I can pull revision M out of the N revisions of my documents, run revision M through the proof interface and use it in a court hearing as evidence or turn it over to external auditors. And no external auditing body will ever certify a git based solution.
> Git is absolutely the wrong tool for the job as hashes in git are used as identifiers to identify a specific change, but not to cryptographically checksum the document/the change to the data. Git commits can also be merged, rewritten, squashed or deleted – something that is absolutely unacceptable in the compliance space.
Read up on git internals, it's a very elegant and simple design and you seem to assume a few things about it that are flat out wrong. A git repository is in fact a Merkle tree. Every commit is chained to the previous commit via a hash that includes the commit content and the hash of the previous commit. So, git history is immutable. You can't modify commits without breaking the chain of hashes. It's by definition append only.
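To make the chaining concrete, here is a small Python sketch of how git derives object ids in a SHA-1 repository: the id is the SHA-1 of a `<type> <size>\0` header followed by the object body, and a commit body names its tree and its parent commit. The helper names and the fixed author line are hypothetical, chosen only so the example is reproducible.

```python
import hashlib
from typing import Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    # git hashes "<type> <size in bytes>" + NUL + body
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: Optional[str], message: str) -> bytes:
    # Hypothetical, fixed identity and timestamp so the ids are stable.
    who = "A U Thor <author@example.com> 0 +0000"
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()


EMPTY_TREE = git_object_id("tree", b"")

first = git_object_id("commit", commit_body(EMPTY_TREE, None, "first"))
second = git_object_id("commit", commit_body(EMPTY_TREE, first, "second"))

# Amend the first commit: its id changes, and so does every descendant's id,
# because the parent id is part of the hashed body.
first_amended = git_object_id("commit", commit_body(EMPTY_TREE, None, "first (amended)"))
second_rebased = git_object_id("commit", commit_body(EMPTY_TREE, first_amended, "second"))

print(first != first_amended, second != second_rebased)
```

The second commit's content is identical in both histories; only its parent line differs, and that alone is enough to give it a different id.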
You can of course modify your local copy of a git repository and rewrite history (i.e. change all the hashes). But that makes it incompatible with upstream history. People tend to use this to clean up their local change history before they send it upstream. With the exception of force push (i.e. just overwrite the remote repository), there is no way to merge a rewritten history. In order to merge changes, there MUST be a shared commit with the same history. It's not optional.
> Read up on git internals, it's a very elegant and simple design and you seem to assume a few things about it that are flat out wrong.
I am well acquainted with git internals and object types, which include blobs, trees, commits, and tags. But I do acknowledge that I was wrong about git not hashing the object's contents – it does.
> A git repository is in fact a Merkle tree. Every commit is chained to the previous commit via a hash that includes the commit content and the hash of the previous commit.
Commits in git are grouped into commit trees that form a single atomic commit, and git allows manipulations of the commit trees, which is fundamentally unacceptable for an immutable, verifiable ledger. That is why I am asserting that git is not an acceptable technology for a ledger.
> You can of course modify your local copy of a git repository and rewrite history (i.e. change all the hashes). But that makes it incompatible with upstream history.
I have personally squashed commits in my local project copy and force pushed such a squashed commit into an upstream repository, thereby completely wiping out the commit history in the upstream repository. The previous commit history was completely gone every single time. It is an occasionally useful feature (e.g. for fixing mistakes of young developers), but rewriting the commit history is absolutely unacceptable in the compliance world. Just the fact that there is direct, unrestricted write access to the history, or to the storage where the history is recorded, makes such a solution untenable as a ledger database.
I do concede that there are some conceptual similarities between git and ledger databases at the architectural level; I do, however, vociferously dispute the suitability of git as a foundation for building a ledger database without a substantial rewrite of the git foundation.
You were arguing it's not a proper merkle tree and it is. You are not a little bit wrong there but flat out barking up the wrong tree. It's a merkle tree with full immutability and append only behavior. This is not an accidental/conceptual similarity. It's the single most important design decision that Git is based on.
Force push only allows you to wipe out repositories that you control. The whole point of git is that you do not need write access at all to share commits. That's why it's called a pull request and not a push demand. You invite others to look at and maybe merge your changes. There is no way to force the other side to do that and force push rewriting their commit history is not going to help your case for obvious reasons.
I am not arguing the case of git having or not having Merkle trees (git has a «weak» implementation of Merkle trees, for in a ledger database every block's cryptographic signature derives from the contents of the preceding block – a property that git does not possess, as it does not support the concept of blocks and instead operates on commit trees that are allowed to be moved around). I am arguing two points, specifically:
Point no. 1:
> Ledgers are quite easy to build.
Cryptographically provable, tamper-proof ledgers that will sustain a 3rd party compliance certification are not easy to build, they are expensive to build, test and certify (very expensive!), and are very expensive to operate, especially in large scale environments.
Point no. 2:
> Git is a ledger.
git is not a proper ledger due to the lack of two fundamental properties of a ledger, as originally stated:
1. an immutable, tamper proof, append-only transaction log;
git does not satisfy this requirement by virtue of allowing one to tamper with the commit history. Write access to commits is the major offender here: it exists, and it opens the door to abuse of the commit history. An append-only transaction log also offers a linear, entirely immutable history (a very important quality from the compliance perspective!), whereas git (by virtue of allowing branches) allows for a non-linear, mutable, temporal history that breaks the linear progression of revisions of the valuable documents/assets being stored in it.
2. cryptographically verifiable datasets;
git offers only a limited facility for verification because it allows one to modify the commit trees and, by extension, to rewrite the commit history. A git commit history can't be trusted from the compliance perspective.
What I am sensing is a misconstruction of the purpose of git. git is designed to track changes but not to provide the evidence of changes having not been tampered with. Both purposes have their own places and their own use cases, and the git design is sound and solid, yet it serves an entirely different purpose altogether, and – no – it can't be repurposed as a ledger. That is the case I am arguing.
Git provides all of those things, as long as everyone agrees on what the current HEAD commit id (hash) is. You could publish that hash in a printed newspaper or with a government regulator. After publishing it is impossible to rewrite commit history without evidence of tampering.
Rewriting history and amending commits forces the recalculation of all commit hashes that follow it, and you end up with a completely different final hash.
AWS QLDB does the same thing with Amazon holding the final hash.
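The "publish the HEAD hash" argument can be sketched with a plain hash chain. This is a toy model in Python, not git's object format or QLDB's digest API; `chain_head` and the sample entries are made up for illustration, and SHA-256 is an arbitrary choice.

```python
import hashlib
from typing import List


def chain_head(entries: List[str]) -> str:
    """Fold the entries into one digest; each link hashes the previous link plus the entry."""
    head = ""
    for entry in entries:
        head = hashlib.sha256((head + "\n" + entry).encode()).hexdigest()
    return head


history = ["open account", "deposit 100", "withdraw 30"]
published_head = chain_head(history)  # the value handed to the newspaper or regulator

tampered = ["open account", "deposit 1000", "withdraw 30"]  # one entry rewritten
truncated = history[:2]                                      # latest entry dropped

for name, candidate in [("original", history), ("tampered", tampered), ("truncated", truncated)]:
    print(name, chain_head(candidate) == published_head)
```

Note what the check does and does not give you: a mismatch shows that the history in hand is not the one that was published, but the published digest by itself cannot restore entries that are no longer stored anywhere.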
Publishing a commit ID in a printed newspaper prevents neither data loss nor tampering with the data.
Destructive operations in a ledger database are simply not available at the API/interface contract level.
A delete operation in a ledger database is non-destructive and does not delete the data but rather creates a new «deleted» revision in the history table. The deleted revision is always available, and it can be «undeleted» as there is nothing special about the deleted revision: it is merely another document revision with the N+1 revision number. The document can be deleted (revision N+1), undeleted (revision N+2), updated multiple times (revision N + M1, N + M2 …), deleted again (revision N + M1 + M2 + P), yet the entire history of all past and current revision(s) is always available, and it can be retrieved/inspected; even after the table is dropped, it can still be undropped without incurring a data loss – all of that is the append-only ledger guarantee.
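The revision model described above can be sketched in a few lines of Python. This is only an in-memory illustration of the interface contract (no destructive method is exposed); the `Document` and `Revision` names are hypothetical and this is not how QLDB or any particular ledger database is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    number: int
    data: Optional[dict]  # None marks a «deleted» revision


class Document:
    """Append-only revision history: every operation adds a revision, none removes one."""

    def __init__(self) -> None:
        self._revisions: List[Revision] = []

    def _append(self, data: Optional[dict]) -> Revision:
        rev = Revision(len(self._revisions), data)
        self._revisions.append(rev)
        return rev

    def update(self, data: dict) -> Revision:
        return self._append(dict(data))  # copy so later caller edits don't leak in

    def delete(self) -> Revision:
        return self._append(None)

    def undelete(self) -> Revision:
        # Re-publish the most recent non-deleted content as a new revision.
        for rev in reversed(self._revisions):
            if rev.data is not None:
                return self._append(rev.data)
        raise ValueError("nothing to undelete")

    def current(self) -> Optional[dict]:
        return self._revisions[-1].data if self._revisions else None

    def history(self) -> Tuple[Revision, ...]:
        return tuple(self._revisions)


doc = Document()
doc.update({"balance": 100})  # revision 0
doc.delete()                  # revision 1: a «deleted» marker, nothing is erased
doc.undelete()                # revision 2
doc.update({"balance": 70})   # revision 3
print([(r.number, r.data) for r in doc.history()])
```

In plain Python nothing stops someone from reaching into `_revisions`; a real ledger service enforces the same contract at the API and storage level.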
The only destructive operation that is practically possible is tearing down the entire ledger at the infrastructure level, which is irreversible and will result in a full data loss. However it can be mitigated (to a very large extent) through segregating and locking down access to the infrastructure.
git does not provide the immutability guarantee as it allows destructive operations, therefore it does not qualify as a proper ledger. A commit history can be rewritten or deleted, resulting in data loss. Published commit IDs are of no use if the data has been tampered with later and valuable revisions have been deleted in perpetuity.
> every block's cryptographic signature derives from the contents of the preceding block
Eh no. Every git commit's hash derives from the previous commit's hash. That's a bona fide cryptographic hash. And you can verify it. And git in fact does that when you merge changes. By recalculating the hashes. And it rejects your changes if things don't line up.
> git is not a proper ledger due to .... an immutable, tamper proof, append-only transaction log;
Also no. It's called a commit log, which is exactly this. It's tamper proof because you can verify the chain of hashes (and git does this). It's append only because of those hashes. If you break the chain, git will reject your merge attempts. Any repository fork you don't own is read-only and won't be able to accept your force pushes with modified hashes, because you won't be able to push anything at all. You can only request others to pull, which triggers the aforementioned checks. So you have no way of tricking others into pulling your broken chain of commits. You can only do that to your own repositories. And you have to opt in to it via force push. This is not a weakness but a simple feature. Your data, your choice.
> ... cryptographically verifiable datasets
Also no. Git uses sha1 hashes, which are cryptographic hashes that you can verify. Better hash algorithms are available and sha1 is not super safe at this point. But good luck creating commits that look legit and preserve the sha1 hash. As mentioned before, they include the hash of the previous commit.
Your central, and only, argument is that you can modify your own copies of a repository via force push or poking around on the filesystem. Or whatever. Yes you can; just like with your local copy of whatever blockchain! But don't do that. Because you won't be able to get others to accept your changes/transactions because they'll trivially detect that you tampered with the merkle tree and reject your changes. That's why it's called force push, you opt into this. And others will opt out of it.
You can do the same things with ledger databases; it's just not as convenient as typing --force.
If you have permission to replace the database with your new one (or to spin up a new instance of the DB and redirect the app to use that, depending on how you are set up), you can rewrite history there too; it's just not as convenient (by design).
I used Centera WORM disks at my previous jobs, for added guarantees. (Not by choice; stay away if you have a choice.)
And once the GDPR came out and we had to remove some content, that's what we did: made a copy with the bad data removed, nuked the Centera clean (created a new partition and disabled the old one), and restored the now-clean data onto it. As far as our applications were concerned, nothing happened.
You are not incorrect. Ledger databases, as a technology, do not eliminate the possibility of spinning up a shadow ledger and rebuilding the change history in a new ledger that is purged of inconvenient records.
Managed ledger database services in the cloud (e.g. QLDB in AWS) offer partial mitigations, such as platform generated timestamps in the delete/change history table (hard to forge), delete operations that do not actually delete the data but move it into a separate history table, the separation of access to the ledger (app vs account administrator), disallowing direct access to the storage, setting up anomaly detection alarms in the monitoring layer, etc.
But none of that precludes the possibility of a ledger being nuked or rebuilt (and later nuked).
> Easy as an armchair theory, impossible in practice without a complete rewrite of git from scratch.
> Git is absolutely the wrong tool for the job
Our company uses ledger.cli on top of git as a bookkeeping ledger [https://www.ledger-cli.org]. Works brilliantly, so much easier to use the text wrangling tools we know and love instead of some brain dead GUI of an accounting program.