Unorthodocs: Abandon Your DVCS and Return to Sanity

aggieben · on Feb 28, 2015

I take some of his points, but this seems kinda...a couple years late here. It seems to me that Git/Mercurial just became the best centralized systems; being distributed fixed some critical faults with being centralized, and didn't really introduce that much more complexity, IMO. In most use-cases, you're talking about one extra porcelain command in comparison to a server-connected client like SVN or TFS, assuming you push every single changeset.

I also think being distributed encourages small chunks of work, where being centralized encourages the batcave approach, and that, I think, is a much bigger differentiator than most people realize. It impacts everything. Suddenly, merging doesn't seem as bad, and then suddenly being able to easily and cheaply create branches is more important. Centralized systems never really got there because the client/server nature works against them. I'm glad to see some big players giving them attention again, because one of the author's best points is that being centralized has certain advantages. It's just that the existing centralized systems suck.

I also think the GitHub workflow comparison to the email list + patch days is a bit contrived because the email list procedure was never that simple (well, for getting ignored maybe it was, but the point was to contribute, not be ignored). I, too, remember those days.

Steps 1&2 under GitHub aren't really harder than step 1 under CVS (or what have you). He just neglected to mention all the steps involving fiddling with CVS connection parameters, or finding the right URL for a SVN repository (trunk? All the branches? Do I care about tags? Good luck if they aren't using the standard layout).

Github #3 also belongs in the first list. Goodness...have you ever read the OpenBSD mailing list? It's about as friendly as EFnet on PMS.

I could go on, but the point is that the number of steps for CVS+mailing list isn't really different than with GitHub.

rlpb · on Feb 28, 2015

The reason git works so well is because it models real life. When you check out a tree to work on it, the moment you change a file in your editor you have forked the tree. Whether you save the changes or not, it's a fork, even if a temporary one. More than that, it's a distributed fork. What your undo buffer in your editor does, or your local filesystem does, or your VCS does, is merely an implementation detail.

It works well to be using a tool where the changes you make in your editor are unified with the more formal changes you'll be pushing back upstream. It makes the whole process smoother, and means that all editing involves an identical workflow.

Even with Subversion, for example: if you make a local change in a file concurrent to something being changed in the repository, then what you have is a diverging branch. Subversion will try to auto-merge when your commit fails and you update your working tree (IIRC). Just because Subversion neither calls your working tree a branch nor your update a merge doesn't stop them being so.

My point is that if you're changing code that is also being worked on elsewhere, you are using a DVCS whether you like it or not. Even if you aren't even using a specific DVCS tool. You have a DVCS the moment two people work on the same code base concurrently. So you might as well use a tool that integrates with the workflow that you already have.

DominikD · on Feb 28, 2015

This is not real life, this is a workflow, one of many. It happens to mirror mine though and, perhaps surprisingly, I can easily achieve the same workflow with centralized VCS by using branches. It would be a major pain in CVS but works just fine for me (and others) in Perforce. So it can be done.

What's interesting is what "it" means" and that you're proving one of the points author is making: your workflow doesn't rely on the fact that your VCS is decentralized. It relies on the fact that branching/forking is easy. Painful branching is quite often a matter of legacy since CVS and SVN were architected in different times when our needs were mostly different and technology was very different.

strictnein · on Feb 27, 2015

This may be strange to the author, but many of us experience Git completely outside of the realm of Github. Many of his complaints are fairly irrelevant then.

wallyhs · on Feb 27, 2015

The complaints as I read them are:

* Large repository size because of blobs, large histories, or many files

* Difficulty of using git

* Pull requests aren't easier than patch bombs

Only the last one has to do with Github.

tghw · on Feb 27, 2015

Maybe not GitHub, but likely you still use Git in a centralized model. Yes, it is technically distributed, and I know that people use it in a truly distributed manner, but if you're cutting builds or working with a sufficiently large team, it becomes very difficult not to consider at least one repo the "gold master".

pm24601 · on Feb 28, 2015

For me:

1. I do development offline all the freaking time. Basically I love the entire repo stored on my disk for the same reason I have a laptop instead of a desktop - so I can code anywhere.

2. News flash there are plenty of places that do not get good internet connectivity. Or my phone maybe dead or dying. Or I just stopped caring about having the tethering option on my plan. (and didn't care enough to bypass the restriction )

3. I don't use a lot of large images/video/etc in my development. Never have really, but then again I do a lot of server-side dev - very little Android coding.

4. Why do I care about a repo that is 1-2 gigabytes? I have a 4 terabyte disk.

Overall, comment: problem for facebook and google - for most companies this is not a problem. And if your company is having this problem, it has the money and people to solve the problem ... like facebook and google are doing.

philtar · on Feb 28, 2015

The question is not when do you do development offline? The question is when have you ever needed the ENTIRE history of the repository offline?

Have you? Maybe. Do you constantly need to? Very very unlikely for the vast majority of people

pm24601 · on March 1, 2015

Yes I do need the entire repo:

1) when a developer "accidentally" deleted the master repo.

2) when a developer accidentally damaged the master repo with a force push and it was easier to drop and restore the repo from my disk version.

3) when I am figuring out the overall contribution that each developer made.

4) when i am running a git bisect to figure out when a bug was introduced.

5) ... other cases.

As another person said, its not that I always need the full repo its just that I need it on occasion and I don't know when I will need the full repo. And those times when I need the full repo, I may be flying overseas or at my mom's house ( and she doesn't have internet ).

By the simple expedience of having the full repo with me, I don't have to worry about needing an older version of a file ... or anything.

twic · on March 1, 2015

I use blame quite a bit, along with the pickaxe in Git and revsets in Mercurial. Those involve querying over the history of the repository - potentially quite a large fraction of it. I'm glad to be able to do that locally, limited by the speed of my SSD rather than min(the latency of my broadband connection, my tiny slice of GitHub's disk bandwidth).

falcolas · on Feb 28, 2015

> Do you constantly need to?

"Constantly" doesn't matter. If you ever need access to the entire history while offline and it's not there, you can't work. Disks and bandwidth are both relatively cheap, compared against the value of having access to the data.

liveoneggs · on Feb 28, 2015

a 2GB git repo (without any binaries) probably has 4-5 million objects and will take a few GB of memory to clone and will otherwise be slow for status and other commands.

pm24601 · on March 1, 2015

Sure. So what? Most repos I am dealing with are less than 200m. The problem of 1-2 gigabytes with millions of objects is simply not a big problem for most startups companies. Larger companies have the resources to solve this "problem".

liveoneggs · on March 2, 2015

you asked.. > 4. Why do I care about a repo that is 1-2 gigabytes? I have a 4 terabyte disk.

pm24601 · on March 1, 2015

also if this was the case I would make a second git repo a filtered version of the complete git repo. Sure it is a "hack" but it is an engineering solution to an engineering problem. Not all engineering solutions need to be idealized perfection.

_wiv7 · on Feb 27, 2015

I find it fantastically useful to have the entire revision history on local disk, it makes it practical to actually do ridiculous searches through all of the history. Try doing that against your svn server.

gecko · on Feb 27, 2015

You're simply defining weaknesses of Subversion, not centralized versus distributed in either direction. E.g., we made blame for Git repos in Kiln ridiculously fast by caching memoized states for each file. You could also do that locally; we happened to do it server-side because it made more sense. There's no reason that an outright centralized system couldn't do that (and indeed, some do, though neither CVS nor Subversion).

pm24601 · on March 1, 2015

can you expand on this? How did you do that? do you have a script?

DominikD · on Feb 28, 2015

At the same time you're not grabbing a local copy of The Internetz to search them and your searches are sufficiently fast. This is not a problem of centralized (from the point of view of the user of course) VCS but quality of a particular implementation.

kstenerud · on Feb 27, 2015

You don't need your own copy of the universe in order to search like that. You only need the capability built into the source-of-all-truth.

to3m · on Feb 28, 2015

One point that isn't addressed is the difficulty of actually working with large numbers of unmergeable binary files, even assuming you've got the disk space to store them. Unless you've got some kind of centralized lock/unlock (check in/check out, etc.) functionality, serializing access to files is going to prove difficult.

I've seen it suggested that you should have some better means of coordination than what's in effect a per-file mutex. It's true you need to have a rough idea at a higher level what's going on (no point deciding ten people will all work on the same thing, when that means they'll all have to edit the same file!), but day to day, working at the file level, you still need the mutex to ensure people don't step on one another's work. It's a simple mechanism, and it scales about as well this sort of thing can.

robaato · on March 2, 2015

There is value in communicating status of work via a central SCM repository. For binary files and locking this is particularly so.

The approach of Perforce where file status is tracked on the server provides this. There can be costs associated so it's not just a panacea for all SCM ills, but it's certainly worth considering.

One reason Perforce is so widely used in game development - it scales easily to Terabyte sized repositories straight forwardly.

mbleigh · on March 1, 2015

Agree with most of the points in terms of most developers aren't using the D enough to make DVCS worth it. However, have to disagree about open source development before and after GitHub.

Ironically, it's the centralized user accounts on GitHub that made it really outstanding for open source. Now I don't have 100 different systems each with their own logins and conventions, I just have GitHub and all the myriad projects thereon.

Pull requests are better than patches because they are more explorable (quickly view diffs right from the browser), discussable (make comments on specific lines of code, mention other users), and programmable (webhooks to run tests against pull requests instead of manually pulling and running tests). Those are pretty big advantages.

chaz72 · on Feb 27, 2015

I have been wondering whether to go back to Subversion myself. The distributed option really doesn't apply to me, my minimal branching needs are met by Subversion, and oh my god, git's submodules get confusing.

gecko · on Feb 27, 2015

(Author here)

To be clear, in real life, I do not actually like Subversion. I use Mercurial pretty exclusively for my own stuff, and would indeed use some of the centralized Mercurial extensions I linked in the article (e.g., remotefilelog, narrowhg, etc.) to scale upward if I had really big stuff flying around. The article is more about pointing out that going to a DVCS involves trade-offs, acknowledging that we have a lot of tooling designed to fake out those trade-offs, and discourage thinking of DVCS as a strict upgrade rather than an engineering decision with implications and costs and benefits.

chaz72 · on Feb 28, 2015

Oh, no doubt Subversion has its downsides and Mercurial is really good at what it does. I was just musing through my own tradeoffs. It is a very narrow point but one that I hope is in the spirit of the article.

Partial repo checkouts, SVN externals are the two areas where I feel I lost something when "upgrading" to Git. So yes, I quite agree, there are definitely tradeoffs.

liveoneggs · on Feb 28, 2015

why are you using submodules if your needs are simple?

chaz72 · on Feb 28, 2015

Dependencies from other repos. Is there a better way that's the equivalent of a Subversion external?

Mathiasdm · on March 2, 2015

One alternative to consider is Mercurial (which command-line wise is more similar to Subversion) with subrepos (which have better usability): http://www.selenic.com/hg/help/subrepos

chaz72 · on March 3, 2015

Thanks, I will look into this.

liveoneggs · on March 1, 2015

I would just import them directly like a vendor branch.

http://marc.info/?l=git&m=119495815518898&w=2

It works like this: (I just did this test)

#inside of your repo

git remote add vendor $vendor.git

git fetch vendor #grab their code

git merge vendor/master #imports their code into your repo

git mv vendorlib lib #move it where you want

git fetch vendor #grab the next version

git merge vendor/master #git remembers your mv from before

*update

of course since this is git there is yet-another-command called git read-tree to do a lot of this.

It's all described here: http://git-scm.com/book/en/v1/Git-Tools-Subtree-Merging