Hacker News
Version control of really huge files (alumnit.ca)
35 points by soundsop on Oct 4, 2009 | 8 comments



How about just storing the file in Venti? A Git API could probably be written that uses it underneath: http://en.wikipedia.org/wiki/Venti
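For readers unfamiliar with Venti: it is Plan 9's archival block store, where a block is addressed by the hash of its contents, so identical blocks are stored exactly once. A minimal toy sketch of that idea (this is an illustration of content addressing, not Venti's actual on-disk format or protocol):

```python
import hashlib

class BlockStore:
    """Toy content-addressed store in the spirit of Venti:
    a block is written once and retrieved by the SHA-1 of its
    contents, so duplicate data is deduplicated for free."""

    def __init__(self):
        self._blocks = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        # Writing the same block again is a no-op.
        self._blocks.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blocks[digest]

store = BlockStore()
a = store.put(b"chunk of a huge file")
b = store.put(b"chunk of a huge file")  # duplicate block
assert a == b                           # same content, same address
assert len(store._blocks) == 1          # stored only once
```

A Git-like layer on top would then only need to store trees of block addresses, which is roughly what the comment is proposing.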



Maybe I'm missing something, but is this really a common use case? Wanting to store 100 MB DB dumps in version control? I'm not sure a VCS is the best tool for pure backup needs.


Well, the uncommon use cases are mostly the ones that are harder to solve.

I don't work much with databases, but I work a lot with multimedia, and large files that you want under some kind of version control are common there. It's trickier because the files are binary rather than text, and because you usually don't want every change to become a new version: you only want to save very specific versions, and otherwise a change should just replace the last one. I haven't yet found a way to do that with Mercurial, which I use otherwise. So for now I mix approaches: really large files are kept outside hg and versioned by hand, which means I have to distribute them separately from the hg export when copying the project around. For smaller files I just create a new version on each change, even though I don't like it and would mostly prefer to replace the old version. Unfortunately, that is not the problem solved in this article.

What I would need is a version control system that lets me tag files so that, on commit, they simply replace the previous revision (maybe still noting each time they were updated) unless I specifically tag them as a "new version".
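The replace-unless-tagged policy described above isn't a feature of hg or git; as a thought experiment, here is a toy model of how such a history would behave (all names here are hypothetical):

```python
import hashlib

class MediaHistory:
    """Toy model of a replace-by-default policy for one tracked
    file: an ordinary commit overwrites the latest revision, and
    only an explicit new_version=True commit appends to history."""

    def __init__(self):
        self.revisions = []  # list of (digest, data) tuples

    def commit(self, data: bytes, new_version: bool = False):
        entry = (hashlib.sha1(data).hexdigest(), data)
        if new_version or not self.revisions:
            self.revisions.append(entry)   # keep as a new revision
        else:
            self.revisions[-1] = entry     # replace the last one

h = MediaHistory()
h.commit(b"draft 1")
h.commit(b"draft 2")                  # silently replaces draft 1
h.commit(b"final cut", new_version=True)
assert len(h.revisions) == 2          # only draft 2 and final cut survive
```

The appeal for large binaries is that intermediate drafts never accumulate in the store; only explicitly blessed versions take up space.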


I would really like this type of feature in git or hg. One of my concerns about the state of version control is the subpar handling of binary files. Git works well for its intended use case of strictly text projects such as the Linux kernel, but not as well for complex multimedia projects where changes to binary files are common. I guess I could repack often, but it would be nice for a version control system to handle these files better. I know it's a pipe dream right now, but one can hope.

Edit: I should say I was referring to open-source DVCSs, not commercial tools like Perforce.
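For what it's worth, the frequent repack mentioned above can be tuned a little: letting git search a wider window of candidate objects when computing deltas sometimes shrinks packs with binary content, at the cost of a much slower repack. A sketch (the window/depth values are arbitrary examples, not recommendations):

```shell
# Repack everything into one pack (-a), drop loose objects (-d),
# recompute deltas from scratch (-f), and widen the delta search.
git repack -a -d -f --window=250 --depth=250
```

This only mitigates the problem; it doesn't give git any real understanding of binary formats.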


Well, it's not common, but I have this problem. The particular sort of data I work with involves processing gigabytes daily, and I want to store reasonably sized sample sets as fixtures for use in my unit tests.

We're currently using Perforce company-wide, but I use git and git-p4. I keep trying to convince people that Perforce has a shitty workflow, but its large-file support is why it still gets a pass.


I'm not sure I understood his article, but I think he wanted the ability to monitor or back out of changes, not just backup. I don't really see why he wants to do it the way he describes, though. Isn't that capability an inherent part of database management systems?


Anyone know how this compares to Wuala's scheme?





