Dulwich: Pure Python implementation of git (samba.org)
151 points by DanielRibeiro on Aug 1, 2011 | 33 comments



GitHub's hg-git plugin (http://hg-git.github.com/) is built with this, and so manages to avoid a dependency on the git binaries.


Incidentally, the second most frequent contributor to hg-git after schacon is Augie Fackler, one of the Google engineers who helped with Google Code's git implementation (and also a frequent hg hacker).


A few months ago, I was at a talk at Google Chicago with two of the original creators of Subversion (Ben Collins-Sussman and Brian Fitzpatrick). After profuse apologies, they said everyone in the room should switch from git to hg, and that people who still used git just didn't realize how much it sucks. Then I realized why Google released hg support before git support :P


What didn't they like about Git?


At the time they evaluated hg and git, hg had better http protocol support.


And that's all?

Security, speed, all of that doesn't outweigh http protocol support?


The http protocol was definitely the dealbreaker. That said, if you think hg is less secure than git you've bought into some FUD (the models are so similar it's almost silly, and the security quality is identical). The speed differences between hg and git aren't perceivable for the bulk of projects, even at sizes larger than most corporate repositories I've heard of. There are some operations that are faster for git (notably history rewriting), and others that are faster for hg (notably blame and per-file log). The two systems are very similar and just make some slightly different tradeoffs.


As far as I can tell, all changesets in Git are hashed with SHA-1. The hash is also the ID for the changeset. You cannot change a changeset without modifying its SHA-1 hash. This design makes Git secure from tampering.

The IDs for Hg changesets are 48-bit numbers, like fb43b575b296. I do not think that this size is safe enough.
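A minimal sketch of the content-addressing idea, using only Python's standard library. It hashes a git blob object (the header `blob <size>\0` followed by the content); commits are hashed the same way with a `commit` header over text that includes the tree and parent IDs, which is why changing history changes every later ID. The helper name `git_blob_id` is made up for illustration and is not part of Dulwich or git.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with this content.

    Hypothetical helper for illustration only.
    """
    # Git hashes "<type> <size>\0" followed by the raw content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


original = git_blob_id(b"")
tampered = git_blob_id(b"x")

print(original)              # ID of the empty blob
print(original != tampered)  # any change to the content changes the ID
```

The ID is derived entirely from the content, so nothing can be altered without producing a different ID.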


Mercurial prints the first 12 characters of the hex-encoded sha1 by default, but everything is recorded using the full sha1, and can be referenced as such. You can view the full sha1 in a number of ways, the easiest would be "hg log --limit 1 --debug".


Mercurial uses full-length SHA1 sums internally, same as git. It just prints the first 12 hex characters (48 bits) for user convenience, unless you happen to have two objects that share that substring.
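To illustrate the point about short IDs: a 12-character hex prefix such as fb43b575b296 carries 48 bits, but it is only a display abbreviation of a 40-character (160-bit) SHA-1. A rough sketch of how a tool might resolve such a prefix against the full IDs it stores is below. The function `resolve_prefix` and the sample IDs are invented for this example and are not Mercurial's or git's actual code.

```python
import hashlib


def resolve_prefix(prefix: str, full_ids: list[str]) -> str:
    """Map an abbreviated hex ID to the single full ID it identifies.

    Hypothetical helper for illustration only.
    """
    matches = [full_id for full_id in full_ids if full_id.startswith(prefix)]
    if not matches:
        raise KeyError(f"unknown revision: {prefix}")
    if len(matches) > 1:
        # The short form is only usable while it stays unique.
        raise ValueError(f"ambiguous prefix: {prefix}")
    return matches[0]


# Stand-in full IDs: SHA-1 digests of some arbitrary bytes.
full_ids = [hashlib.sha1(s).hexdigest() for s in (b"a", b"b", b"c")]

short = full_ids[0][:12]  # what gets printed by default
print(len(short) * 4)     # bits in the short form
print(len(full_ids[0]) * 4)  # bits in the stored ID
print(resolve_prefix(short, full_ids) == full_ids[0])
```

The integrity guarantee comes from the full stored hash; the prefix is just a convenient handle for humans.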


I'm also the actual maintainer of hg-git at this point, FWIW.


if hg-git uses dulwich, and it's all pure python, how come i see it running git-index-pack amongst other git (what i assume are shell) commands?


Huh?

Are you pushing to a local repository? When pushing to a local repo, dulwich calls a local git binary. I don't see any calls to git-index-pack in either hg-git or dulwich in a cursory grep, which matches my memory.


i'm pushing to github. reason i ask is that it segfaults -- see https://github.com/schacon/hg-git/issues/216


Something like that is not exactly core functionality. It could be a situation where hg-git runs purely in Python for everything that it 'needs' to do, and will optionally use the native commands for other git functionality.


This is the library that Google used for Google Code's git support.


Dulwich is a real town in the South of London, with a relatively large school called Dulwich College, where I studied. A bit spooked to see a library named after it...


"Dulwich is the place where Mr. and Mrs. Git live in one of the Monty Python sketches." (http://pypi.python.org/pypi/dulwich)


Not only that, there's a bus that goes to Dulwich Library, which I was obligated to snap a picture of when I was visiting London.


Does anyone know how the performance of this on Windows (maybe with PyPy?) compares to MSys Git or Cygwin?

If it's pretty good, there should be some mileage in making a reasonable git client for Windows based on this.


That's exactly what I was thinking of when seeing this. It could prove fruitful for getting a "Tortoise" interface for git that's both easy to maintain and doesn't rely on clunky bits like msysgit.


How about an IronPython-based implementation? Then it could be 'native' code.


msysgit was fast enough for me in a 250+kloc project.

it was certainly faster than dog-slow g++.


What I would like to see is an Hg implementation in Go.


Why, exactly? It sounds like an oddly interesting idea to me, too, but I'm not sure why and I'd like to hear your reasons.


The recently announced fork of the Plan 9 operating system, 9front, keeps its source in a Mercurial repository, and includes a Go compiler in the base operating system (which makes sense, since the creators of Plan 9 are also pretty much the creators of Go). I guess a Mercurial port to Go would make life a lot easier there.


Avoiding the overhead of Python's startup time alone would be nice. hg is impressively fast for being written in Python, but it could be much faster and more memory-efficient in a compiled language where you have control over memory layout.

And that is without going into the potential of perhaps taking advantage of concurrency and parallelization for some things.


I'm actually considering doing this now that you've said it, if there is enough interest, that is. There are very few complete "non-python" hg implementations.


I started tinkering with an implementation of revlog on the plane back from OSCON. If I get anywhere useful I'll post the sources somewhere.


Would be interested in that. Can you post back here if you do stick them somewhere?


I'll try my best to remember!


I don't know why this makes me so happy.


Python is like China: making a shameless copy for themselves to use.



