Hacker News new | past | comments | ask | show | jobs | submit login

> You really believe that git stores -- in full -- every version of a tracked file?

Yes, it does.

> Every commit that deletes the whitespace from an otherwise empty line in a 30KB file is another 30KB of hard drive space

Yes, it is.

"It's worth repeating that git stores every revision of an object separately in the database, addressed by the SHA checksum of its contents. There is no obvious connection between two versions of a file; that connection is made by following the commit objects and looking at what objects were contained in the relevant trees. Git might thus be expected to consume a fair amount of disk space; unlike many source code management systems, it stores whole files, rather than the differences between revisions. It is, however, quite fast, and disk space is considered to be cheap." -- https://lwn.net/Articles/131657/

One of the insights of the git design was that, nowadays, disk space is cheap. The first releases of git always stored each object separately in its own file in the object database. Git still does so nowadays, but once the number of files gets over a certain threshold, newer releases of git run an "automatic GC" which combines these "loose objects" into a "pack file"; and within that "pack file", it uses a binary diff (a xdelta) between similar objects to reduce the total size. But that's just a physical storage optimization; in the logical model, whenever you ask for an object, you always get its full contents, not a delta against some other object.




> "automatic GC" which combines these "loose objects" into a "pack file"; and within that "pack file", it uses a binary diff (a xdelta) between similar objects to reduce the total size

Isn't it the case then that git doesn't store in full every version of a tracked file?


Deduplication and compression do not imply diffs.


git does perform delta compression. From what I can gather, git's storage engine uses conventional compression (zlib), deduplication (exactly identical files need not be stored twice) and delta compression (between similar files).

The question was does git's implementation store - in full - every version of a tracked file? The answer is that it doesn't. git has a sophisticated storage engine precisely to avoid the inefficiencies of the naive approach.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: