Ship Small Diffs (skyliner.io)
172 points by precipice on Feb 9, 2017 | 72 comments



Ehh... nice in principle (and I do small deploys all the time for work), but too many artificially-small changes can easily cause you to "miss the forest for the trees". Each change is small and LGTM-able, but they can add up to a misbehaving system unless you have full context (which, because they're small, does not exist in the diffs).

If it's conceptually a single unit, keep it a single unit. Pushing dead code in small pieces that waits for a small master switch doesn't give you additional safety (though it can give you easier merges).

>Reviews for large diffs are closed with a single “lgtm,” or miss big-picture problems for the weeds.

That means you have bad reviews. If it can't be figured out, how was it written? Take the time and do it right. Yes, it's a large commitment - is it larger than the value being created? If no, why does it exist?


The problem is that review complexity is quadratic in the size of the diff, because it's quadratic in the number of interacting parts.

Each thing can interact with each other thing (in theory), so if you have double the diff, you might need 4 times the amount of work. And the increased workload increases the risk of a mistake.
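
A quick back-of-the-envelope illustration of that, with made-up numbers:

  package main

  import "fmt"

  // pairs counts the potential interactions among n changed things (n choose 2).
  // Doubling n roughly quadruples the number of pairs a reviewer has to rule out.
  func pairs(n int) int { return n * (n - 1) / 2 }

  func main() {
      fmt.Println(pairs(50))  // 1225
      fmt.Println(pairs(100)) // 4950, roughly 4x the work
  }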

But splitting the diff in two doesn't change that, right? In reality, not everything interacts with everything else. But the reviewer doesn't know that, and needs to confirm the non-interaction (hence a review).

If you, as an implementer, know the divisions, then shipping the pieces that are independent will do two things. First, it makes the review easier, letting it serve its main purpose: catching errors in implementation. Secondly, it will let _you_ confirm you know the divisions. It could be that your conception is wrong! Your small diff might not work because of an interaction you didn't realize.

Personally, I think that by the time you're sending diffs, the general architecture should have been decided. We tend to have at least one person (not the implementor) who knows how a feature is going to be implemented, so the architecture can be confirmed. You want someone to be looking at the forest! I think that doing this during code review is a bit harder though.

Taking the time to do it right is hard, and chopping things up makes it easier to do "it" (implementation review) right. Architecture review should probably be happening in a different space.

On a social level, you are asking for your code to be reviewed. Please have empathy for the reviewer.


>Secondly, it will let _you_ confirm you know the divisions.

Absolutely. Which is why I draw the line at "conceptual units". If there are rational sub-divisions that don't require context, go ahead, divide! If you're baking in cleanup + feature + bugfix + convert tabs to spaces, you're mixing things in the same way that you would usually avoid when coding. Same thing with too-large commits - we avoid 10,000 line files and functions, do the same with commits.

I caution only against artificially small diffs. Small is a fine target, but there's a lot of dogma around it everywhere I've been, which is why I think it's worth calling out. Breaking things up too much can remove context, which can be dangerous.

---

If you're introducing a large thing in 1 vs N units (say a full feature in one go, which is unusable until complete), the line is of course fuzzier. In some ways, sub-units may be more reviewable - that's a solid benefit, though they're not likely a dozen or two lines of code (if they are, and it's a large feature, you probably now have dozens of reviews). But you aren't getting any additional safety, because it's all either on or off - which is half of the post.

For the other half of the post, if you're not getting the same reviewer(s) for all the pieces, unless you've had a detailed architectural review[1], has anybody but you read and understood the whole system? When it's turned on and it breaks in a subtly-interacting way and you're on vacation, does someone else know where to look? Doing it all in one go, as it will be when turned on, forces an architectural review of the system implemented, not just the plans. Say it takes a day or two to thoroughly review - is that bad?

---

Always tradeoffs. For the situation described in the blog post, where large reviews get insufficient attention, I don't think small diffs will solve the problem. The problem is the lack of attention given to reviews - fix that. It's critical regardless of diff sizes.

[1]: No place I've worked for has truly done this, which I would claim is the (vast) majority, but I wholly believe some places do. I would probably even like it! But it's perceived as slowing things down, so it hasn't happened for me yet.


Great notes. Thanks for writing this!

Architecture review distinct from deploys is key, as is a separate operability review. And it's also important to mentally decouple the feature release to end users from the code releases.


Review of 10 lines: 10 issues. Review of 1000 lines: LGTM. As the article says, even the best, strongest teams and cultures will fall into that trap.

Avoiding review fatigue has to be done some other way than merging smaller diffs, for example splitting changesets among reviewers so that person A reviews the backend code and person B reviews the frontend code etc. A meaningful feature of 1000 lines can still be reviewed in terms of 20 50-line changes, and doesn't have to be reviewed in one sitting or by one person.

You are perfectly right that pushing dead code and half baked features (that won't be used, so aren't actually in production!) is useless and adds no safety. Adding a new feature is risky, and adding it as 99% dead code and then finally adding the 1% code with the button that actually enables the feature in the UI doesn't help - 100% of the new code will hit production with that final change.


I read such articles more like a general guideline than a strict requirement. Of course it is bad to split up a commit that is really one unit into several parts. But when in doubt, you should try to commit in smaller batches. And by all means try not to commit unrelated changes in one commit.


Um, no. Keeping diff sizes smaller is nice, but asking for "a few dozen lines" most of the time is too doctrinaire. In many codebases, especially those that are older and larger, even a fairly straightforward enhancement can require a few dozen lines of new code plus even more modifying callers or hooking things together in other ways. Oops, already over the limit. Breaking up patches can even make them less coherent as context for each one is lost, and can slow things down if all the pieces have to be pushed separately through a slow CI pipeline.

Keeping patch sizes to a minimum is good, but it's a minimum of what's feasible based on other concerns. If "few dozen lines" works for you that's great, but don't overgeneralize from that.


I think the point was to keep your changes minimal, not to break the patches up arbitrarily. So if a new feature consists of X and Y, you try to release first X, then Y, not both at the same time, even if they are related.


> Breaking up patches can even make them less coherent as context for each one is lost

A potential solution: Non-fast-forward merge commits. Best of both worlds: Small diffs and large diffs. (I'll also note that I haven't found this to be a problem in practice.)

> and can slow things down if all the pieces have to be pushed separately through a slow CI pipeline.

And can speed things up if the CI pipeline can pinpoint the exact change that broke the build. Fat commits just obscure the underlying problem - although sometimes that's the best you can hope for.


I like the merging pattern in principle for that reason. In practice I've had too many tooling issues (git bisect being a PITA to use due to infinite decisions, history-browsing difficulty, people getting confused about which commit to revert and how).

None of which are intrinsic. And maybe they've been fixed since my last attempt? But noticeably painful.


I think this is what I really took from that. I feel like very small diffs that lack overall context just add cognitive load. If the context isn't properly captured, the tendency to green light a few changes would be higher when these changes could have significant impact on other parts of the system.

I do feel like this becomes almost an artform. Small but concise PRs aren't that easy in some situations and smaller PRs that address large refactoring in pieces seem to win over one large PR that is difficult to parse mentally. What that looks like varies by project or organization and I feel like one size definitely doesn't fit all.


Agreed - arbitrarily splitting into two commits just because otherwise you'd break the magic 'few dozen lines' limit is just making git management harder: rebasing? reverting? checking-out? Are you sure you included all the commits that make up the single logical 'commit' you're looking to rebase on/revert/checkout?


This strikes a chord with me, but I'm coming at it from a different angle.

Does anyone else encounter very large diffs as standard operating procedure in golang codebases? Here I'm considering "very large" to mean 500+ lines changed - however github computes lines changed.

I mean, I'm looking at this from the perspective of someone who contributes primarily to Ruby codebases. It's understandable that there would be a difference when comparing ruby/go, but the magnitude of the difference is what I'm getting at.

The difference I'm observing in the average size of a PR in a plain old Rails codebase compared to the average size of a PR in a golang codebase is dramatic - roughly an order of magnitude. In some cases it's even more extreme.

Is that anyone else's observation, or am I just doing it wrong?


This definitely mirrors my experience on our Go codebase. Getting nontrivial features out the door generally requires hundreds of lines of code at a minimum, and sometimes thousands. We've taken to branching off of branches to keep code review moving quickly, but that creates new problems like cascading rebases all the way back down.

If there's some secret to architecture or tooling (e.g. feature flags) that people use to keep PR size down in Go codebases, would love to hear it.

Haven't worked in an enterprise Java setting before but would be curious to compare diff size there.


I work in an enterprise Java setting but I don't believe that this is a language specific issue. Our diff sizes vary by 3 orders of magnitude, so it would be meaningless to give an "average" line count. Each one is just large enough for a complete user story or defect fix, no more and no less. User stories have to be defined and broken down based on business value delivered to the customer regardless of how large or small the change is from an engineering standpoint.


I'm in the same boat, but I feel like the problem is tying the commits to the user stories. Ideally you'd be able to make multiple commits to implement a single user story where it can't be done with a small amount of code.


Sure there are multiple commits on a branch, but when we merge back to the trunk the whole user story has to go in one shot. We can't put a partial user story on the trunk because then it wouldn't be in a fit state to release.


Well, we can't commit half a visible feature, for sure, but my sense is that the parts that are too complicated to review in a single sitting often either involve a preparatory refactoring or represent a code base that has excessive duplication.

There are things that just can't be split apart, I admit. I had to do a truly heinous refactoring of some code a while back to migrate from floats to doubles that couldn't be accomplished via a global find and replace (unsafe casts and other crap). That was the largest commit I've ever landed, and the code wouldn't compile for half the time I worked in that branch. Come to think of it, that refactoring was harder than it had to be because of duplication and badly written code, but it would've been massive no matter what.


The expressiveness of a language directly impacts the amount of code you need to write for a feature.

Go is a concise language but it's not expressive - you'll write 4 lines to call a func and check an error.

This means that PRs are inevitably going to be larger than in a language like Ruby or other languages with metaprogramming.
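
To make that concrete, here's a generic sketch (not from any particular codebase) of what a couple of fallible calls cost in Go:

  package main

  import (
      "fmt"
      "io"
      "os"
  )

  // readConfig shows the usual pattern: every call that can fail costs roughly
  // four lines (call, check, wrap, return), so even simple features add up.
  func readConfig(path string) ([]byte, error) {
      f, err := os.Open(path)
      if err != nil {
          return nil, fmt.Errorf("open %s: %v", path, err)
      }
      defer f.Close()

      data, err := io.ReadAll(f)
      if err != nil {
          return nil, fmt.Errorf("read %s: %v", path, err)
      }
      return data, nil
  }

  func main() {
      if _, err := readConfig("app.conf"); err != nil {
          fmt.Fprintln(os.Stderr, err)
      }
  }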


I think you have those backwards. Concise is expressing an idea in fewer words; expressiveness is the breadth of ideas that can be communicated. Go is not concise and does not try to be.


Just to play devil's advocate (in case this is what they were thinking), go's language definition is fairly concise, by design. Few keywords, little magic, leading to lower expressiveness per LoC.


When talking about lines of code written, "concise" is roughly "idea/length" whereas "expressiveness" is the "sum of ideas". So yes, "concise" could roughly be seen as "expressiveness / LoC", but it's still the wrong word to use, as "expressiveness" specifically refers to the breadth of ideas that can be represented and communicated. And either way, while the language spec is concise, the language itself is most definitely not.


+1 totally, thanks for the heads up!


Lots of factors here. I find it more a matter of project style/age than language.

Adding whole new chunks of code, refactoring the public interface of widely used systems, etc. tend to result in large diffs - lots of 'churn'. Extending existing code, bugfixes, more localized refactoring and doc fixes - more 'stable' codebases - tend to result in smaller diffs.

Also some difference between e.g. git and perforce - branching is easier, so I can make smaller commits without worrying about breaking the dev branch...


Anecdotally, I agree it seems that Go codebases have considerable churn compared to other languages.

That being said, the churn is typically much easier to review and understand, and thus have confidence in, especially compared to Ruby or Python.


Paraphrasing "as simple as possible but no simpler than that": ship as small diffs as you can but no smaller than that which makes sense.

Individual commits don't necessarily make sense in the whole. Submit diffs which make up a logical, conceptual block of new changes. Don't mix formatting changes, whitespace cleanups, or rearranging changes into diffs with real meat. Ship new prerequisite tools and new changes to utility functions or libraries first, then the meat: i.e. split general codebase upgrades and distinct features into different shipments.

Make it so that "git log" will produce a list of sensible steps if someone else reads the listing. Remember, 90% of programming is writing good code and the other 90% of programming is communication to others.


Generally, I like the middle ground of reasonably small branch sizes and small, well-organized, well-written commits.

Walking through a half dozen or so commits in a PR is still manageable.

Too many small things deployed over time can be a pain to roll back too.

On the flip side, huge change-the-world deploys are bad too.

It's a balancing act of managing the initial feature size, getting it out, layering on more in subsequent releases.

Where tiny deploys work really well is when deploying refactorings, little housekeeping chores like upgrading libs. If, while working on a feature or bug, I need to add a missing test, refactor, or upgrade a lib, I extract those commits out and deploy them ahead of time when possible. That way the resulting feature branch is very focused on the feature at hand.


I think I still prefer the logical diffs to be at a PR level. I know at least personally that I commit a lot of experimental code that I back out of for the eventual implementation.

Doing git fu to reverse that is not worth the time imho. I'd rather invest time in factoring (and testing) tiny deliverables.

At least then, you still have the diff as a travelogue.


This is nonsense. The number of lines of change of code has absolutely nothing to do with the amount of risk.

A single line change can easily break an entire system. Applying this size fallacy, however, is just as dangerous. Certain classes of changes should be grouped together so they can be reviewed with the context required.

I've had the unfortunate task of working in environments where I've been forced to artificially break up my PR's to suit some arbitrary rule, no doubt because an article like this got read at some point.

What ends up happening is that instead of having everything related to a specific changeset or feature in a single PR/commit, you now have it strewn across multiple PRs/commits, nonsensically, and with artificial borders most of the time. This makes review more difficult and creates additional work to keep the partial implementation working at each slice.

Process should not replace thinking. If a feature warrants it, who cares if the PR is 2,000 lines or 12. Not every task is small and not every task is large. As long as the scope is well defined, many times it does make sense to do tasks in whole rather than as a sum of parts.


> Certain classes of changes should be grouped together so they can be reviewed with the context required.

Agree with this. I've worked in teams before where the order of tasks to be done was chosen by someone not on the development team and I found it seriously inefficient. When you're reviewing code, you have to think "what could this break?". When you group related changes together the surface area of things that can break is lower so you can more efficiently check for issues.


> You don’t need elaborate Git release rituals. Ceremony such as tagging releases gets to feel like a waste of time once you are releasing many times per day.

What happens when you ship bugged code and need to roll back?


The general advice is to roll forward, i.e. remediate the broken thing and deploy. Ideally, the broken deploy was ramping up a config flag and the fix is to turn it back off. (But that's another post.)
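
A minimal sketch of what a percentage-based config flag can look like (the names and the hashing scheme here are made up for illustration):

  package main

  import (
      "fmt"
      "hash/fnv"
  )

  // enabled reports whether a feature is on for a given user, based on a rollout
  // percentage read from config. "Rolling back" here means setting the percentage
  // back to 0: the new code stays deployed but dormant.
  func enabled(feature, userID string, rolloutPercent uint32) bool {
      h := fnv.New32a()
      h.Write([]byte(feature + ":" + userID))
      return h.Sum32()%100 < rolloutPercent
  }

  func main() {
      fmt.Println(enabled("new-checkout", "user-42", 25)) // 25% ramp
      fmt.Println(enabled("new-checkout", "user-42", 0))  // flag turned off
  }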

Rolling back mechanically is sometimes safe but not always safe, because you have data.


In a large system, every change that comes with stored data needs to have a plan for rolling back - usually it's as simple as "ignore but save" for a new column in the DB, or "write to both, read from old" for a backend migration.
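
For the "write to both, read from old" pattern, a minimal Go sketch (the Store interface and all names here are hypothetical, just to show the shape of it):

  package main

  import "fmt"

  // Store stands in for any storage backend.
  type Store interface {
      Put(key, value string)
      Get(key string) (string, bool)
  }

  type memStore map[string]string

  func (m memStore) Put(k, v string) { m[k] = v }
  func (m memStore) Get(k string) (string, bool) {
      v, ok := m[k]
      return v, ok
  }

  // migratingStore writes to both backends but still reads from the old one,
  // so the new backend can be dropped at any point without losing data.
  type migratingStore struct {
      old, new Store
  }

  func (s migratingStore) Put(k, v string) {
      s.old.Put(k, v)
      s.new.Put(k, v)
  }

  func (s migratingStore) Get(k string) (string, bool) {
      return s.old.Get(k) // the old backend stays authoritative until the switch
  }

  func main() {
      s := migratingStore{old: memStore{}, new: memStore{}}
      s.Put("user:1", "alice")
      fmt.Println(s.Get("user:1"))
  }

Once the new backend has proven itself, reads flip over and the old one can be retired.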


`git revert` is perfectly suited for this, since it makes a new "inverse" commit of the changes.

Your production deploys continue monotonically into the future as usual, and you now have version-controlled documentation of the rollback, instead of needing to maintain a separate mapping of production <-> code state.


Having an actual rollback mechanism that brings you back to a known working version is usually way faster than pushing a new version, since a new version needs to go through CI/build etc. again.

At my company rolling back is a matter of a minute or two, while pushing a revert is more like 5 to 10 minutes. Your mileage can definitely vary though.

However it requires all the shipped changes to be rollback compatible, which isn't that hard with a bit of experience.


Doesn't this assume you only ever deploy one commit at a time?


Cherry picks and merges are single commits... and fit well for a lot of dev->staging->release branch workflows.


It's always felt like a waste of time to me, even with SVN. You always know what the released version is and can branch from there if necessary.


In larger organizations it is very easy to not have any idea what version is currently deployed. I worked for a company that cut SaaS releases every two weeks, but for various reasons the deployed version might be 1-4 months behind current repo HEAD in engineering. It got even more cumbersome deploying different versions for US vs EU and GOV environments.

It's definitely not ideal, but I'd bet it happens more often than we'd like to admit as professionals.

Another thought - at that same company there were, I think, 5 or 6 different maintained enterprise mobile apps. Jumping between repos and dealing with the various app stores/approval processes, it was easy to get lost regarding what was in production.


In everything I've worked on where we cared about the version, it has been baked into the binary in an about dialog or info page somewhere, so the only way to not be able to tell which version you were using was if you didn't have access to the app.

This of course relies on having the automated builds set the version numbers properly, it breaks down if it's a manual step, which I'm all too familiar with.


Just use the commit ID instead of tagging it explicitly.


Doesn't that assume that you only ever have one commit per release? Otherwise, you have to keep a track of "release commits".


It's better to have an activity log of your deployments; that way, by querying what commit is currently in production, you can compare it with the log and know what commit range was actually deployed.

There are many tools to do that, e.g.

  - Shopify's Shipit: https://github.com/shopify/shipit-engine (author here)
  - Zendesk's Samson: https://github.com/zendesk/samson
  - Netflix's Spinnaker: http://techblog.netflix.com/2015/11/global-continuous-delivery-with.html
  - Yahoo's Screwdriver: https://yahooeng.tumblr.com/post/155765242061/open-sourcing-screwdriver-yahoos-continuous
  - Instagram's Sauron: https://engineering.instagram.com/continuous-deployment-at-instagram-1e18548f01d1#.tv6zycskx
And the list goes on.
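
As a toy illustration of the idea (hypothetical SHAs, nothing tool-specific):

  package main

  import "fmt"

  // deployLog is an append-only record of deploys, newest last. Comparing the
  // last two entries gives the commit range that the latest deploy shipped.
  var deployLog = []string{"a1b2c3d", "e4f5a6b", "c7d8e9f"}

  func main() {
      prev, curr := deployLog[len(deployLog)-2], deployLog[len(deployLog)-1]
      fmt.Printf("git log %s..%s\n", prev, curr) // what actually went out
  }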


We're pushing fairly large changes once a week from multiple team members. What has really helped is having comprehensive unit, integration, and end-to-end tests. We run the first two on every check-in to the main branch. Then we run all three after moving from QA to Staging, and finally do smoke tests in Staging. The only problem with this approach is of course spending an ungodly amount of time writing tests and infrastructure/tooling to support automated tests, especially end-to-end. But this hasn't failed once yet.


>> Building a web application is a young and poorly-understood activity.

Sorry, no. People have been shipping, and understanding how to ship, web applications for... what, 10, 15, 20 years now?


Struggling with "how long have web applications existed" doesn't help your case that they are a solved problem.


Not sure what your point is.

There are like 10 million people around who have shipped a web app.


Ok, I'll give you a serious answer. For an individual shipping a website, anything at all works. I'd guess that half of all websites that have existed are just a single person editing files directly on a webserver without source control, and that's fine.

It's another matter altogether to ship code with tens of thousands of requests per second and a lot of teammates trying to change it simultaneously.


>> ship code with tens of thousands of requests per second

Define "request"

If the "request" is to deliver some static file, that has been solved many years ago with CDNs and server caches and so on.

If the "request" is to do something other than static files, it has nothing to do with Web. It could be request to your bank's COBOL app or a request to their Java app or to their Node.js app. Nothing to do with Web.


Compared to the history of agriculture, 20 years is nothing.


What about person-years?


I find it unimaginable that more people are coding than are farming full-time (which is at least hundreds of millions). Given that agriculture is also older by >2 orders of magnitude, I think it's pretty safe to say it has more person-years behind it.


A lot of time has gone into farming in the last ten or twelve thousand years (source: nautil.us, ted.com)


And how many different tech stacks have we been through in that time? I build web apps using very different tools than I used 4 years ago. I read that sentence as comparing web dev on a timeframe to, say, civil engineering.


Yeah I can see how that is problematic.

What I do is -- look at what very big players are doing (Salesforce, Facebook, etc) and follow that.

Also, look at what all random minor players are doing, and absolutely ignore that.


I have worked at both large companies and small startups, and 'doing what the very big players are doing' is not always the best thing to do. There are just fundamentally different challenges depending on your size.


Continuous Deployment is scary at first but once you have good test coverage it is safer than big releases.

If a developer is creating one bug per day, is it better to release it every day, or is it better to wait a month and release 30 bugs all at once?

Continuous improvements are also highly motivational. The difference is between saying "We can fix that at the next release" and losing the motivation to fix it later and saying "Just shipped the fix" and feeling great.


Release early, release often?

https://en.wikipedia.org/wiki/Release_early,_release_often

What's old is new again.


Birth defects are forever. If the design has a problem, "small diffs" won't fix it. That's the trouble with the "continuous hacking and integration" approach.


I think that's a bit too fatalist. I'd hate to have an org fall into the trap of "oh, well I guess since we didn't plan for changes since the beginning, we're stuck with 21 point stories and appropriately long 4 week sprints". I want to hope that the outcome of this post is "hey, I wonder if I can ship smaller parts of this to make the whole less risky"


This is something I've had a lot of problems with. I understand the benefit of small diffs, but I can't seem to find a way to really apply this for any actually meaningful changes. In my experience, I would spend some time coming up with a system of multiple interdependent abstractions, and then getting a single logical commit for a proof of concept would require hundreds of lines. Splitting the abstractions into separate commits would be confusing, so I don't think that's good. Maybe I should stub functions a lot more aggressively and truly aim for minimal. The other problem is that when you are replacing an abstraction in an already large code base, the raw number of changes required could easily be in the hundreds or even thousands of lines.


I have had the same experience. I work on a number of large C++ codebases at work and often I just can't find a way to break most things up into small diffs.


> PR’s of GitHub branches are great. But git diff | gist -optdiff also works reasonably if we are talking about a dozen lines of code.

So, he just sends a gist link to his teammates? Why not create a small pull request and then ask them to look over it?


>I regret to inform you that your code has to be deployed

haha


A better goal might be to "design interfaces and implementations in such a way as to tend to reduce the sizes of diffs for likely future changes".


Continuous deployment is unprofessional.


Continuous deployment requires the highest level of professionalism and engineering process discipline.


It's unprofessional. Anyone who is deploying things into a production environment without extensive careful testing and quality assurance is either selling something so dreadfully irrelevant that nobody cares if it's full of bugs and issues, or is putting others' lives, jobs, or well-being in danger.


Wrong. Extensive automated careful testing and quality control can be built into continuous deployment pipelines. (Quality assurance for defect prevention has nothing to do with continuous deployment one way or the other. Many people in the software industry confuse QA with QC.)


Again, no, you are wrong. You cannot do continuous deployment and quality assurance. The entire point of continuous deployment (or delivery or whatever they call it now) is that you are deploying constantly from master.


Again, no, you are wrong. You can do continuous deployment and quality control. All of the tests necessary to qualify a particular build for release can be automated and included in the delivery pipeline. If you disagree then please provide a specific example of a test which can't be automated that way.

Quality assurance (defect prevention) activities are performed before code ever gets to the master branch.


Context? Evidence?



