I've seen "x% time for tech debt" rules at several companies, and sadly they didn't work too well.
Since the problem was the culture of continually pushing half-baked features in the first place, the rule was quickly corrupted: people would design a good system, throw anything that wasn't required for a POC into the tech debt backlog, and deliver a barely functioning version.
"This is a technical debt task" was used to keep everything that wasn't new features from taking time out of the other 90% of the sprint.
Basically, if you assign a block of time to quality, you risk people taking that as an excuse to not focus on quality outside that block.
I think the key observation here is that if the team doesn't care about quality, you can't create it out of thin air.
If the team does care about quality (as described in the OP), then something like a tech debt budget/carveout can be a good management/scheduling strategy to buy the team breathing room from the rest of the org.
I have used this strategy successfully in the past. For some reason it’s often easier to “spend 10% of time on tech debt” than “spend 10% of time polishing your code to avoid tech debt in the first place”. I don’t even think the latter is the correct way to build software, as you seldom know ahead of time what will justify continuous polish and refinement.
The advantage of discretionary tech debt fix time is it lets you gradually refine rough edges as they become pain points, with a low-friction bottom-up process (ie the developers that see the rough edge are empowered to fix it, rather than requiring PM/scheduling overhead for everything).
I've also seen this sprint-task-fake-completed later cleanup misattributed to "refactoring", which I've started calling "fudge-factoring".
IMHO, at the point you have engineers misrepresenting that something is done, even if you're not in an application domain where engineers go to jail for that, you need to take a step back and fix your engineering/product culture.
That might include nuking your current methodology from orbit, just to reinforce that the organization needs to learn a new way of thinking. (Not just saying "don't do that" about one symptom, while whatever cultural weakness caused the fudging is still in play.)
I've watched a really elite group of engineers fall apart slowly because one leader decided to start committing systematic fraud and coaching his delegates to commit fraud.
It's well done, because it's almost impossible to prove.
Those of us who still care about system health and helping users just wait two years after they launch something, ask leadership for permission to delete it, and keep trying to build abstractions that reduce the harm misconduct has on our parts of the system.
The problem is they keep getting promoted because no one notices that almost nothing they've built is still in production.
A few of us have tried raising the issue to leadership but no one in leadership wants to know. It's a bit spooky. My current theory is that no one has the power to deal with it anymore, and at higher levels, launching garbage features that get deleted is actually desirable.
Theory: senior leadership keeps saying "drop what you're doing and build X". This group does that, but in a way that's half-baked enough that we can kill it later. Middle management benefits because senior leadership is happy, and deleting it is fine because leadership forgot about it.
If organizational goals are centered on building X with half-baked features and then killing it after the perf review cycle, it is rational for everyone to focus on that. Stop wasting time on fixing tech debt. There is always a main product that pays all salaries and bonuses. As the organization gets larger, fiefdoms get built by hiring more people. Just understand what's going on and use that "knowledge" to advance careers and make more money.
It's a signal of politics, too: at companies where the people who view that kind of hoop jumping as an important signal got their way, it means they won that political battle.
I don't get what you mean. Couldn't you equally claim that in companies that avoid leetcode interviews the anti-leetcode people won that political battle. I don't see why one option is more political than the other.
I'm sure the topic of leetcode has been beaten to death on a forum such as this, but serious question: with ChatGPT, why WOULDN'T I cheat, assuming the leetcode is the unmonitored/unproctored variety (most are)? Like, seriously. We know the leetcode skill isn't applicable to the actual duties and responsibilities of the job itself, so why not cheat, especially when ChatGPT makes it so easy?
I think it’s less an engineering culture issue than a management one. Do you reward people based on number of features, meeting rapid deadlines, quality, or something else like customer satisfaction?
People respond to an organization's actual priorities, not whatever platitudes people toss out. If you want quality, you need to sacrifice something: even if long-term results improve, in the short term quality takes time.
Exactly. I recall one place where an exec pushed for something to be completed in a too-short timeframe. It launched on the schedule he announced to his higher-ups, so he was promoted and moved elsewhere. Woe be to the exec who inherited the project, though, as the code was basically a mound of turds wrapped in Christmas paper. And the team, of course, was fractured and burnt out. The cherry on top was watching the ladder-climber throw a lunch to thank the team; their reward for hundreds of hours of grueling overtime was lukewarm Chinese food dished out of large aluminum bins in a conference room. Plus $50 gift cards!
This was a long time ago, but I still think about it as a moment when I learned what POSIWID really meant.
Yep. Or in other words, you (usually) can’t solve cultural problems with technical solutions. If your team culture doesn’t value long term thinking, it’ll take more than a schedule change to introduce that mentality.
In the story in this post, one important thing that happened was that the team (with management) came together to acknowledge the tech debt problem, acknowledge they want to solve it, and talk through potential solutions. This moves the problem from something individuals care about to something the team as a whole acknowledges it cares about (a shift from individual knowledge to common knowledge). Once everyone agrees this is a problem, individual engineers will know they're acting against the will of the tribe by writing lazy pull requests. And being reprimanded by their coworkers for doing so will have much more weight.
> Or in other words, you (usually) can’t solve cultural problems with technical solutions.
Yes, though technical solutions go a long way towards making cultural solutions work and stick.
Eg running your tests automatically when someone makes a Pull Request is much more robust, even in a cultural sense, than asking developers to manually run the tests.
Technical solutions can help make the culturally favoured approach also be the path-of-least resistance.
Of course, you still have to write the tests. (Though even there, you can make your technical tools check for eg test coverage. That can be gamed, but that's more effort than when you can just ignore it; and even gamed coverage is better than no coverage.)
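As a sketch of that kind of technical gate: the decision a CI job makes after the test suite is a simple ratio-and-threshold check. The report format below is made up for illustration (module name to covered/total line counts); real tools like coverage.py produce richer data, but the gating logic is the same.

```python
# Sketch of a coverage gate a CI job might run after the test suite.
# The report format is hypothetical: module -> (covered lines, total lines).

def coverage_gate(report: dict, threshold: float) -> bool:
    """Return True if overall line coverage meets the threshold."""
    covered = sum(c for c, _ in report.values())
    total = sum(t for _, t in report.values())
    if total == 0:
        return True  # nothing to cover; don't block an empty change
    return covered / total >= threshold

report = {
    "app/models.py": (90, 100),  # 90 of 100 lines exercised
    "app/views.py": (40, 100),   # 40 of 100 lines exercised
}
print(coverage_gate(report, 0.60))  # overall 130/200 = 65% -> True
print(coverage_gate(report, 0.70))  # 65% < 70% -> False
```

The point is that the gate is cheap to automate; whether the covered lines are checked by meaningful assertions is exactly the part it cannot see.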
True; but you're going to get much better testing if your team collectively cares about it than if it feels externally mandated by that one annoying guy who set up your commit rules.
Technical solutions are the best solutions when they work. But being technologists biases us towards looking for technological ways to solve problems. It's not always the best approach. A few hours chatting over lunch is usually a much more effective way to change how your team works.
> True; but you're going to get much better testing if your team collectively cares about it than if it feels externally mandated by that one annoying guy who set up your commit rules.
Some of the worst tests I've ever seen were in places where managers measured test coverage. Reams and reams of absolutely useless tests that may not have checked anything that mattered, but that did bump the coverage metric up.
> Reams and reams of absolutely useless tests that may not have checked anything that mattered, but that did bump the coverage metric up.
I would go farther than that, and call those tests worse than useless. Tests that are written to appease some kind of automated coverage metric god tend to be tests that test internal implementations rather than API contracts. So you end up with a codebase where a single line change to production code requires something like 150 lines of test code changes, all because the internal representation of some data was changed in some kind of trivial way.
It's possible to avoid this failure mode, but the teams that are smart enough to avoid this failure mode are also generally smart enough to avoid having unreasonable coverage goals.
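To make the failure mode concrete, here's a hypothetical pair of tests in Python (the class and names are invented for illustration): one pinned to an internal representation, which pads coverage but breaks on any trivial refactor, and one pinned to the public contract.

```python
# Hypothetical example contrasting a test pinned to internals
# with one pinned to the API contract.

class Cart:
    def __init__(self):
        self._items = {}  # internal representation: name -> quantity

    def add(self, name: str, qty: int = 1) -> None:
        self._items[name] = self._items.get(name, 0) + qty

    def total_items(self) -> int:
        return sum(self._items.values())

def test_internals():
    # Brittle: fails if _items becomes a list of (name, qty) pairs,
    # even though observable behavior is unchanged.
    cart = Cart()
    cart.add("apple", 2)
    assert cart._items == {"apple": 2}

def test_contract():
    # Robust: survives any internal representation change.
    cart = Cart()
    cart.add("apple", 2)
    cart.add("apple")
    assert cart.total_items() == 3

test_internals()
test_contract()
```

Both tests bump the coverage metric identically; only the second one keeps its value after the representation changes.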
I have a knee-jerk reaction when most people say "refactoring". In many cases they're talking about a different, arbitrary way to deduplicate an implementation, usually in a mindless/mechanical way. If you can't name the 'factor' that you're separating and that factor doesn't have business or conceptual cohesiveness, then you're just pushing bits around for someone else to want to do it differently. So now when someone says it, I always ask what do you propose to factor and why?
Loaded language notwithstanding, one of the premier books is all about "mindless/mechanical" changes to structure without changing behavior. It is about step-by-step, safe micro-changes.
This might be the biggest difference between refactoring and rewriting.
But you are right it is pushing bits around with no behavioral change. That’s the point.
The open question is if this newly refactored software is more maintainable, more understandable, easier to change in the dimensions it will mutate in the future. Often the answer is “no” and you are right to be frustrated.
The execution (how) can be 'mechanical', but the goals and choices (why) should be thoughtful. If it's simply deduplication, there is no thoughtful 'why' involved, although in the worst cases abstract machinery is created just for that purpose.
You're right to call out my misconstruing 'no behavioral change', as that is always a requirement. There should, however, be a change in where the seams of the implementation lie, one that better suits the direction of expected changes.
> I've also seen this sprint-task-fake-completed later cleanup misattributed to "refactoring", which I've started calling "fudge-factoring".
Having just spent a week cleaning up and properly implementing my colleagues' 'fake completed' sprint tasks, this resonates deeply. I'm making it clear to my boss that it wasn't 'refactoring'; I was just implementing the actual requirements properly.
I'm going to go over their next PR with a fine-tooth comb, no matter how long it takes
When the team has gone that far off the edge, imo it’s good when the wheels start falling off, when tickets start piling up, when the dev team very visibly has egg on its face. Sometimes it takes until the team is embarrassed before management embraces the option to go nuclear on the dev methodology. But I don’t know how to ensure that it’s the poisonous methodology that’s targeted and not individual people/scapegoats.
Plumbers, electricians, roofers, painters and the like will all encourage you to handle repairs promptly and do maintenance proactively.
But in the end, it’s the owner (shareholders) of the home that is best qualified to make maintenance decisions based on their needs, wants and resources.
We as developers are service professionals (contractors), and it's only when you also have equity that you'll have a better sense of how to handle tech debt appropriately.
The problem is a lack of trust, same as a contractor. If you go to a new mechanic and he says you need a $1,000 repair—do you? Or is he trying to sell you replacement blinker fluid?
A lot of managers feel the same way when engineers talk about tech debt, refactoring, etc. Maybe they've never read code, or aren't familiar with the specific codebase. Projects seem to be getting completed—what's the problem? Maybe velocity is going down, but are the engineers right about why? Or are they just being anal about something that's simply less than ideal?
A good contractor walks you through why the repair is needed, what your options are (cheap temporary fix, long-term repair, total replacement), the consequences of those choices, and how much each will cost.
Unfortunately in software we don't really know how to answer any of those questions. A lot of refactorings and rewrites just shuffle irreducible complexity around ("it makes so much more sense now!" says the developer who just spent a week studying the code and rearranging it to their personal preferences). Not to mention that we suck at giving estimates.
One reason why your CEO needs to be technical. I run my own small software shop, and I can sit with my developers and actually see that the code needs refactoring. Indeed, it is often me (the CEO) that actually forces a refactoring sprint on the team. They usually want to move on to the next beautiful feature.
But long term, our feature velocity and product quality suffers when the internals are bad.
> A good contractor walks you through why the repair is needed, what your options are (cheap temporary fix, long-term repair, total replacement), the consequences of those choices, and how much each will cost.
This, 100%. Technical debt is a placeholder term for engineers who want to do stuff but don't want to spend time explaining to the people paying their salaries why they're doing it. Technical debt is meaningless. Be more specific. Present options. Be a professional?
> Unfortunately in software we don't really know how to answer any of those questions.
Good senior engineers can absolutely answer these questions. Is this not what you've seen in Enterprise software dev? I'd say if you're a senior engineer in this context it's more than half of your job.
I disagree with the analogy. Need for repairs arises because of the natural decay. Tech debt is a consequence of the decisions made at implementation or even planning stage. It's more like forgetting to order a plumbing connector and using duct tape instead. Except in software there often isn't any forgetting involved and the actual connector is never ordered.
A lot of tech debt is self-inflicted, but not all. Decay is induced by external forces and the same thing can happen in software, e.g. when underlying platforms change.
Mobile apps are pretty high maintenance because you often have to change something with new OS releases and want to roll out an update. You can abstract over the platform but that is in itself costly, so there is a tradeoff.
Bad analogy. Whether or not I 'own' the company (equity/etc.), if I'm a full-time employee, the maintenance/debt needs of the company still affect me. They possibly affect me more than someone who merely has 'equity' in the company (people may own shares in multiple companies). A 10% drop in the company's revenue might mean I lose my job.
If the painters/electricians also lived in the house, and it was their only/primary shelter, the 'owner' isn't the only one 'best qualified' to make maintenance decisions.
The other problem with “tech debt” time (and same with “devops”) is it becomes a bucket for all kinds of stuff that isn’t tech debt at all. For example upgrading framework versions, fixing bugs, and so on. Not necessarily wrong to do these instead but it is kind of the team fooling itself.
Instead I think you need the unicorn of someone who understands product and engineering and can decide at any given time whether to pay down tech debt or add a new feature.
This is why I prefer a balance of "Product Initiatives" and "Engineering Initiatives" and that number has to be bigger than 10% because of all that it encompasses.
The hard part is explaining to the business the return on investment of engineering initiatives, because often product thinks of this as a zero-sum game. They really care about the sum output of features, but that gets conflated with the number of hours put into product development.
It is engineering management's task to break that mental model.
I honestly think they're the same. If you think they're not then you simply haven't thought it through well enough, which means you need to spend more time on that.
There are some subtle differences, but framing is important.
Debt has connotations of digging yourself into a hole. If you are in a new project, you likely have all sorts of tools that you would like to build. You are not in debt to start - you have no code!
This gives latitude for thinking ahead and it also allows for including things like enablement and platform teams. Debt retirement, in this view, is a subset of technical initiatives. And some of that technical initiative budget should also go to product teams.
That is true but framing it as technical budget means there is no business benefit. This is false. Long term maintainability and increased innovation speed are clearly business benefits that you hide behind ambiguous terms if you use wording such as technical debt. Just be specific. Technical debt isn't specific.
I think an important aspect is separating "tech debt time" completely from your backlog or planning. Generally speaking, there are fewer stakeholders involved in reducing tech debt and less need for requirements gathering or prioritization; developers generally know where the worst problems are and can be relied upon to choose something to focus on next.
I'd avoid the temptation even to have "projects" during that time - make it purely about accomplishing refactors and tech debt removal that can be completed during that day, and specifically don't have plans or a backlog.
I've seen some success with targeted reductions in tech debt. Limiting the scope, limiting what can be changed, and focusing on a goal to be achieved by reducing the tech debt. It also makes it a much easier sell to the client.
I agree with this. I also find it useful for the team once they have identified an area or specific aspect of what they perceive as technical debt to tackle to take the various ideas they have for tackling it and do a very rough value Vs effort exercise together.
Just having that conversation can reduce both perceived technical debt (people explain to other people why it works in that way and that's the right solution) but also team angst about the issue as they get to tell other people about it.
Whatever is the easiest, highest value things normally pop out and you can slowly work through them with better understanding.
It also means you don't start work on the hardest to solve peeve of a single individual just because they are the loudest voice.
I think separating tasks for features and for quality is wrong. They are not two different things. There is only purpose, and that is to serve the user's needs. Therefore I also think technical debt is the wrong way to phrase it, because it takes away the need to express the task in terms of what it means to the user. If a tree falls in the forest and no one is around to hear it, does it make a sound? If the user won't notice technical debt, then who cares?
You can refine technical debt by simply asking the question: what will the user notice if technical debt is not repaid? Use the answer to that question to phrase the task and suddenly it's not technical debt anymore, but actual work that the business will prioritize (or not).
Thanks for pointing this out; it's clearly what you'd expect from the thought experiment, applying human behavior around shipping timelines. I may be in a position to dedicate some effort to reducing tech debt, but I don't want that to move the goalposts so that it's OK to ship more of it.
Instead I want to pick out the worst culprits that bother most devs and in particular onboarding new ones and deal with those.
Having been in this space for 25 years now, here's my take:
Tech Debt is almost always a few things in disguise:
1. Product Failure - The person with the final say on the product has poor taste. They poorly understand the tradeoffs their tech stack gives and how it influences their product. And most importantly they hate their customers and don't understand their needs.
2. Marketing and Sales dominance - The product organization might be competent, but they have been driven out of the decision-making venues in the company. So marketers and sales make the decisions and hand them down to product to deal with, resulting in tone-deaf decisions; even as the company continues to make good money, the product itself erodes, until competitors arrive and instantly wipe out the business.
3. Success and shift - The product, which was initially a success, has stagnated. The product itself was so successful that it created an entire new market with competitors, commodifying the product. Now the company is frantically looking for a pivot to keep growing the business, resulting in the original successful product drying up even faster.
4. Leadership void - The product was made by a strong, pioneering leader. They might not even consider themselves such a leader, but after they move on, the product fails without their support. The replacement leader might not even be bad at running the business for a time, but eventually focusing on EBITDA alone won't inspire people, and the product will erode through churn.
5. Press Release driven development - The company operates by making a moat around its original core offering and they have a semi-monopoly in their space. So the only way to drive more revenue is to sell services that cost more to already existing customers. As a result, teams build products that are made to drive hype cycles and press releases -- once the product is shipped, the major players get promotions and move on to the next exciting product instead of supporting the now-shipped product.
Also 6: tech debt being another term for "code I personally do not understand," or "software written in a way different than I would write it." The article calls this out:
> Turns out some apparent tech debt was actually code that was better left untouched, had there been better documentation. We documented what we would not refactor or remove.
> Better clarity on the design and architecture of the code, enabled us to make better judgement calls when we had to cut corners due to the time constraints.
There is certainly real tech debt. But I have lost count of the number of times in my career I've heard a bright, but less experienced, developer claim that because a certain piece of code uses Formerly Popular Framework A from 7 years ago when it was written instead of Current Popular Framework B, it is unmaintainable tech debt, despite it having tests, a working CI pipeline, working monitoring, etc.
> It may be tech debt depending on a lot of context; with old tech you get into a lot of situations like this:
Yeah absolutely that's a fair example of tech debt, where a tech stack is so old that it's difficult to impossible to support modern features. But "classic ASP doesn't support this modern feature requested by product" is a lot different than "I don't like working with classic ASP."
> Formerly popular is also a red flag, over time it will get harder to hire.
Really depends. Java is a great example. Very unsexy, not too many startups beginning life as a Java shop, most code is likely "legacy" by this point. But Java is so ubiquitous, if I were a Java shop I wouldn't be worried whatsoever about running out of Java expertise anytime soon. Maybe in 50 years.
On the other hand, if you were a legacy Java shop that jumped on the Scala hype train at its peak 8-10 years ago, or God forbid you built something in Haskell when it was getting touted as the next big thing, you're in a far more precarious situation. Ironically, many Java shops tried out Scala as a way to move their Java codebases forward, and now their now-legacy Scala 2.xx codebase is a way larger technical dumpster fire than their Java codebase.
> Also 6: tech debt being another term for "code I personally do not understand," or "software written in a way different than I would write it."
Sometimes it's quicker for me to implement a desired change on a code that was written long ago by someone who is no longer with us by re-writing that part of the code in the way I understand it, and then implementing the change, as opposed to trying to understand the code and implement the change in a way that aligns with it.
When Framework A is unmaintained and therefore has known, unpatched security defects then migrating away is critical tech debt that needs to be prioritized.
In my org frameworks are actually our largest source of tech debt issues.
Example: Somebody years ago bet on Angular 1.x, migration to 2+ was deemed “a near-total rewrite” and postponed, so now we have to rewrite in React or whatever flavor-of-the-month. The same happens with Java, .NET, and other backend frameworks.
This is a good one. I've seen (and probably made...) some refactor PRs where someone takes an old system and breaks it up into new conceptual chunks. Code reviewer comes in and they find the new code makes even less sense than the previous code. Of course it so happens that the reviewer already knew the old code, and so it's only natural that new code is less legible to them.
I'd argue code I don't understand is indeed tech debt. Not sure what you were implying with your statement, but it sounded as if you disagree with it. Undocumented code that no one knows is a serious tech debt issue: time should be taken to read it, understand it, and document it.
Agreed that code that nobody in the team understands is tech debt; code that only a few people in the team understand is a liability, but not directly debt. Whether you understand the code yourself shouldn't immediately cause it to be labeled as "debt", unless you're the I in team (which can easily happen for personal projects or badly-managed organizations).
In my view, the listed items are one step away from tech debt. They may be the pressures that lead to tech debt, but it's the devs that wrote and shipped the tech debt. I personally have maintained the quality bar of projects so that they ship later than originally scheduled. I may not be popular at the time, but eventually people start to notice that projects that I tech lead do complete (eventually) and run well after they go live with much less follow-on maintenance and support. Performance being built into v1 is one of those things. I'm not talking about an MVP for a startup, but extensions to an already high scale platform.
The ideal is when a team's experience actually matches the project they're assigned. They need to be up to the task.
This requires management to be at least more experienced than the teams they manage and to make good hiring and placement decisions. This is not just number of years in the industry, but the average number of years spent at any single company. They need to have seen the long tail of maintenance in the development lifecycle on projects they were wholly responsible for.
Agreed. I've seen an org that essentially let the devs run the show for picking timelines, technologies, features. It was a disaster. Endless design by committee, no progress, too many abstractions, and on and on it went with developer-created problems after being given free rein with only general direction. Of course, this is a management issue at the root; all problems are, aren't they? :)
Creating functional software engineering orgs is hard, whether the dysfunction comes from management, from a lack of support from other business departments, or from the devs themselves.
It's pretty easy to feel when it's not working properly and know roughly where the problem lies. But solving it properly is hard stuff.
> "You see, the leadership did not care about the code quality as long as the stories were delivered on time."
My problem with the concept of technical debt is that "code quality" is subjective, and more often than not it translates to "I don't like the structure of this code because I didn't write it". What matters to the business is that the code works. The owners of the business do not care if the developers enjoy working with the code or if they find it to be well written.
I hear all the time that bad code slows down development, increases bugs, etc. And that can be true in a number of cases. But most of the time it's just complaining. If a developer takes the time to refactor the code base, it's very likely that whoever takes over from them in a few years will have the same complaints about technical debt and want to refactor all over again.
The best way to ensure everyone's job and increase compensation is to deliver features that customers really want. It doesn't matter what the code looks like, as long as the customers are happy. Refactoring is important to developers and to maintaining an organization. But it can wait until there aren't pressing feature demands.
Not really. When each PHP page was copy-pasted and slightly modified, I had to manually hunt down the bug and fix it in EVERY instance.
Are you going to tell me that this kind of tech debt doesn't affect my time to fix the bug? I could have fixed it once; instead I had to hunt for 5+ instances of it. Am I just complaining, or did doing the original development in a lazy way directly impact my time to complete the task?
I could have recorded the exact amount of time between the original bug fix and finally fixing the last instance of the bug, to quantify how much time lazy coding cost me. It may have taken the same amount of time to write the original pages with an extendable component that has only one copy and takes arguments to create all the variations we need. But in the end we'd have fewer lines of code to maintain in the future, and that's also an objective metric.
I think it's pretty easy to come up with a counter argument like "hey what if I want to change the code for this one page and not the others? well, good thing each page has its own separate code!" If the code in question was shared, this person would have to spend time thinking about whether to add a boolean flag to the shared function, create a new function, or what-have-you. And I can see other comments here saying that boolean flags are also tech debt.
For your scenario, I completely agree that the decision to not share code led to you spending more time. We can be objective about time spent for specific scenario A with codebase B, but we cannot enumerate all scenarios and all codebases, so it's quite hard to ask what the expected value of the "damage done" by a particular style of programming is.
A boolean flag is also a design that doesn't scale to multiple decisions. You should get used to designing things in a way that separates those concerns. Why does page A need functionality in this manner while page B needs a slightly different one? It's because they do similar things, but there's a difference there somewhere.
Instead of copy-pasting from page A and changing it, you should have a function that does the shared work for both pages A and B, and then combine that partial result into each page in the way that you want. If that shared function has a bug, fixing it fixes both pages at once.
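A minimal sketch of that shape, with hypothetical page names and markup: the shared work lives in one function, and each page adds its own variation around it.

```python
# Sketch of the refactor described above. Page names and fields are made up.

def render_header(title: str) -> str:
    # The shared part: fix a bug here once and every page gets the fix.
    return f"<header><h1>{title}</h1></header>"

def render_page_a() -> str:
    return render_header("Page A") + "<main>A's content</main>"

def render_page_b() -> str:
    # B differs only in a banner; the difference lives here,
    # not in a diverging copy of the header code.
    return render_header("Page B") + "<aside>banner</aside><main>B's content</main>"
```

The per-page differences stay visible in the page functions themselves, so there's no boolean flag threading through the shared code.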
If you copy/paste then inevitably the person working on it will forget to find and fix one of the versions.
But even if you end up spending a little more time to prevent code duplication than just copy/pasting, it STILL yields benefits in the lower maintenance cost since people will spend less time reading and understanding the code. Your codebase will have fewer lines of actual code, which is a win by itself, which you can quantify.
You won't be able to measure the time each person wastes re-reading identical code to make sure it's actually identical (or worse, trying to see WHY it's not identical), but it's always a cost if you're working with this code. The only reason not to touch it is if it's already perfect and won't be changed. But usually I will start on this refactor when I have a concrete requirement (it already has a bug, or I need to change functionality).
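As a minimal sketch of that idea (the page names and fields here are invented, not from the original codebase): one shared function does the common work, and each page composes it differently, so a bug fix lands everywhere at once.

```python
# Shared helper: the work pages A and B have in common.
# A bug fixed here is fixed for every page at once.
def build_user_summary(user):
    return {
        "name": user["name"].strip().title(),
        "email": user["email"].lower(),
    }

# Each page composes the shared partial result differently,
# instead of carrying its own copy-pasted version of the logic.
def render_page_a(user):
    summary = build_user_summary(user)
    return f"Welcome, {summary['name']}!"

def render_page_b(user):
    summary = build_user_summary(user)
    return f"Account: {summary['name']} <{summary['email']}>"
```

The per-page differences live in the page functions, so neither page needs a boolean flag to explain which variant it wants.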
On the flip side, change is far more risky when it affects many callers rather than one page. Whatever time you “wasted” fixing multiple pages might instead be spent validating that the single fix doesn’t produce any unintended consequences.
Simple duplication is probably the least bad form of tech debt I can imagine.
The real horror is when they have slightly different versions, since they already had other changes done to them, and I have to understand whether I should merge the functionality, keep them different, etc.
When there are multiple callers, the behavior is going to be the same unless you pass in an argument, so that change is self-documenting since it signifies intent.
Calling this tech debt will get you nowhere. Instead you could just explain the alternatives: what happens if we do this, what happens if we don't? If we do <X> productivity will increase by <Y> so that users will experience <Z>. Much clearer than simply saying "tech debt > refactor".
At the time of change, it's obvious you're making the code shittier. That function? It doesn't do what it says anymore. That other function? Just a few more boolean variables. Hey, we don't need to refactor, we can just make negative numbers mean one thing, and positive ones mean another! Death by a thousand cuts.
This is also the best time to refactor. You skip the whole "I don't like the way the code looks" bullshit. The codebase doesn't support your feature? Make it support your feature, then code your feature.
And you know what else? You are only changing code you're touching. Refactoring code you don't need to otherwise touch is a waste of time.
But yeah, if you parachute in 6 months later to look for a refactor, it's gonna be all gravy. How much shittier the code got is just a distant memory. Get on that next feature - just another flag, just another lying variable - in hindsight, it's all subjective anyways.
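A tiny hypothetical sketch of that flag creep, next to the refactor-at-the-point-of-change alternative (all names and the pricing domain are invented for illustration):

```python
# Before: flags accumulate, callers must know the magic combinations,
# and "negative numbers mean one thing, positive ones mean another".
def format_price_legacy(amount, show_currency=True, negative_means_refund=True):
    label = "refund" if (negative_means_refund and amount < 0) else "charge"
    prefix = "$" if show_currency else ""
    return f"{label}: {prefix}{abs(amount):.2f}"

# After refactoring at the point of change: the two meanings become
# two named functions, and there are no flags left to misread.
def format_charge(amount):
    return f"charge: ${amount:.2f}"

def format_refund(amount):
    return f"refund: ${amount:.2f}"
```

The legacy version still works, but every new caller has to decode the flag soup; the refactored version states its intent in the function name.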
Agreed until the very last sentence. Refactoring should never happen for its own sake. You should never make code beautiful because it's beautiful. It should be nice to the extent, as you correctly said, that it facilitates creating value, in form of a running program, not as source code. This means that refactoring is worth it exactly if the total time spent to refactor the code is less than the time you save implementing future features after the refactoring.
I'm at the start of my career as a software engineer, but it seems to me a good rule for refactoring is that it should only be done if the advantage can be evaluated in the form of concrete new features becoming easier to implement. If you refactor the code so that existing features would've felt nicer to implement, or are nicer to read, you've already played yourself: maybe you get a high of ecstasy because the code that was ugly before is now clean, but as soon as the next feature comes around that you didn't anticipate, it was all for naught, and you're writing ugly code again.
I’ve been programming for over 30 years and I don’t entirely agree with you. I agree that you should make choices in line with what will get the best long term value. Sometimes that long term value is captured by making dirty prototypes that you can learn from.
But sometimes you’re working in a codebase which will probably outlive you. And in that case it often makes sense to think of your code base as a garden that you tend. Spending effort making tending the garden delightful will almost always pay off given enough time. And here I’m thinking about things like improving the build process to speed up compilation, adding tracing to make debugging easier, refactoring, documenting the core design and philosophy, and so on.
You can’t make Google Chrome, the Linux kernel, LLVM, or other projects like that by taking dirty shortcuts every time you add a new feature. If you do, you’ll end up with a buggy, unmaintainable mess that is impossible to change without introducing new problems. This isn't just theoretical - look up the story of Toyota's brake problem a few years ago. I've also heard some grisly stories about software practices at Boeing and Oracle.
Good cooks clean their kitchens. Soldiers maintain their equipment. And efficient programmers keep tidy codebases.
> But sometimes you’re working in a codebase which will probably outlive you.
This is so rare, it's not worth thinking about. Code lasts a decade or two before it's been completely rewritten or replaced. After 30 years, the oldest code that I ever touched exists in very few places (2 that I know of).
> Spending effort making tending the garden delightful will almost always pay off given enough time.
What constitutes "good practice" is ever-changing, and it doesn't add business value on its own. You work within the constraints of the time and move on. The people who do this find themselves in the best position, career-wise. Paying some small attention to maintenance helps you move up and on to bigger projects and responsibility. Like all things, pursue a middle ground: you can fiddle with code forever trying to make it "better". More extensible, more maintainable, more clear, more more more.
Arguing over future cases (premature planning) is the most common sin. Planning for everything to be reused is not efficient. Regardless, you have biases that inform what you think is "better" and how the software will be used, which future you (and other developers) will not share.
> Arguing over future cases (premature planning) is the most common sin.
There's definitely a scale between "unmaintainable mess" and "architecture astronomy" where both extremes are pathological.
I've seen plenty of unmaintainable messes in my time - particularly from average teams who are lacking in good technical leadership. You see this sort of thing all the time in consulting, and in software teams at non-software companies. (Eg, in the airline industry). There's a reason why Martin Fowler talks a lot about software craftsmanship - because at places like Thoughtworks they could often use more of that sort of thinking.
I've also seen plenty of overengineered messes too. Most smart college students (myself included) seem to need to go through this at some early point in our careers, where we really spread our wings and discover first hand why writing thousands of java classes is a terrible idea. (Or whatever the poison of the day is). If all your abstraction does is move your food (business logic / algorithms) around the plate, or -worse- hide it, then your abstraction is making the code worse.
I have no idea which sin is more common. I think it really depends on what sort of engineers you spend time with. Google definitely suffers from the second kind of problem much more than the first - to the point where Google (hilariously) often outsources making actual websites to external consulting companies, because its own software practices are too expensive to implement.
Does your team need more long-term thinking or more short-term thinking? It's impossible to answer without reference to your actual team, your existing practices, and what you're trying to deliver. A weekend game jam should be scrappy, and the code for the space shuttle should be written carefully. Neither approach is objectively good or bad; there are just different tradeoffs appropriate for different kinds of projects. The best senior programmers can switch style and philosophy based on what's needed for the task sitting right in front of them.
It might be rare in the sense of code that will be around after you're dead. But it's probably not rare in the sense of code that will be around after you've left the company.
I tend to see little discussion about long term value. "Long term value" includes "making it less expensive to add features." At my current company, we would run in circles chasing the same bugs because of spaghetti code plus competing business requirements. The answer to "why do new features take so long?" was fairly straightforward.
We were able to argue for a rewrite partially on the grounds of the speed of future enhancements. During the rewrite, it has amazed several team members (juniors, management) how easy it is to track changing requirements when there's good infrastructure and when solutions are kept simple.
Agree with most of this, except the last segment. I'll add that there are other aspects of "quality" in code. For sure, doing what the customer wants and is willing to pay for is paramount, but there are definitely other things that you want to deliver.
* how quickly can you answer questions about business metrics?
* how easy is it to identify that bugs are occurring?
* how long does it take to figure out what to fix when it's broken?
* how much change is involved in fixes (e.g. does a PR fixing an issue touch 25 files, or just a few)?
* how quickly can new employees come up to speed on the code and alter it confidently?
* how easy is it to add new features?
Now, these things are hard or take time to measure, but those are qualities I look for when determining if a codebase is good.
Reminds of this article from athenian.com which is a pretty good read on velocity and quality as a team / product org scales from 5 to 20 to 100 to 250 people: https://archive.is/FQKJH
I'm mostly with you, but "it can wait until there aren't pressing feature demands" actually means "it won't happen until something completely breaks and it can no longer wait." There are always pressing feature demands.
There are some things that are not just annoying to deal with, but actively impede development or encourage further hackery to make them manageable. Those things need to be fixed at some point, or there will be a real cost.
Strongly disagree. I write a lot of Go code, and poor-quality code introduces a lot of tech debt because it's untestable. If you want to write basic unit tests, you can't do so easily without refactoring large chunks of the code base.
So, if you've already shipped a feature and need to make some kind of change, you can't do so with any confidence. Implementing unit tests to validate you're not breaking existing functionality is impossible, so you need to refactor, which introduces extra risk.
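The untestability problem usually comes down to hard-wired dependencies, and the fix is the same in Go or any other language: inject the dependency. A minimal sketch of the idea (in Python rather than Go, with a clock as the invented example dependency):

```python
import time

# Hard to test: the clock is baked in, so a unit test cannot
# control what "now" is without refactoring first.
def is_expired_untestable(deadline):
    return time.time() > deadline

# Testable: the dependency is injected, with the real clock as the
# default, so production callers don't have to change at all.
def is_expired(deadline, now=time.time):
    return now() > deadline
```

A unit test can then pass `now=lambda: some_fixed_time` and assert both outcomes deterministically, which is exactly the confidence the parent says is missing.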
I think the problem here is that there are very few places where there is any time between pressing feature demands. This always seemed to me what the agile guys were saying. The approach was all centered around the idea that things would always be changing and pressing. The only way to combat that is to put constant refactoring into the day to day so that your codebase can remain coherent and productive to work on.
The bonus of doing things like this is that if you are constantly changing your code then it becomes battle hardened to accept change. This gives you better ability to keep up with the changing demands of both customers and technology.
"Code quality" should be about readability. And by "readability" I mean "can the function/module/system be understood by the individuals on the team fairly quickly?" Just because someone doesn't like the indentation, or the looping style, or whatever opinions they have ... these things don't make code quality poor. If you can understand the system, make changes, and add features without bringing down large parts of the system, it's probably fine.
The problem as I see it: companies prefer the inexperienced devs because they're less expensive, and as a result, their inexperience leaves them frustrated with code that they "don't like." They then attempt to sell management on a rewrite on these grounds.
The team can only deliver features that customers want, on the schedule the business wants, if the underlying infrastructure of the code allows it. If the code infrastructure forces me to spend too much time implementing features, and feature requests continue pouring in, then there will never be time to refactor because there will always be "pressing feature demands."
Doesn't matter what the code looks like, as long as the customers are happy.
Can you imagine if other industries had this same attitude?
"It doesn't matter what the warehouse looks like, as long as we ship on time, and the customers are happy."
At one level that's true, but at another level, it does matter what your warehouse looks like, because a neat and tidy warehouse is what allows you to ship on time. A warehouse that's filthy and disorganized is one which loses orders, ships the wrong products. Similarly a disorganized codebase with poor abstractions isn't bad because it's unpleasant to work in. It's bad because when a dev ships a feature, they have little or no confidence that they've done so without breaking two or three other features and causing customer dissatisfaction in the process.
Have you actually been inside many warehouses or factories? I don't think I've ever seen one that was as clean or visually attractive as the areas that customers were meant to see. You're right that they're (usually) not filthy to the point that it impedes operations, but it's quite correct that customers don't care what the warehouse looks like as long as the goods arrive on time. Same for restaurant kitchens, bakeries, metal workshops and a host of other production facilities. The only production facility I've ever seen that was "clean" was the assembly area of a medical instruments producer (it was a literal clean room with suits and everything), but even their warehouse did not look attractive. Their customers were quite happy though, and the company was growing fast.
Most programmers imagine other industries to be better because they only see the customer facing parts, but that is like judging the state of a codebase by how nice the landing page is styled. Penny-pinching managers exist in all industries.
Joel Spolsky has a good article about this [1], where he discusses apparent cleanliness versus actual cleanliness in the context of a bakery. I agree that, at least on the surface, a warehouse or commercial kitchen may be quite messy. But there has to be an underlying order, otherwise the system just doesn't function like it ought to.
> I hear all the time that bad code slows down development, increases bugs, etc. And that can be true in a number of cases. But most of the time it's just complaining. If a developer takes the time to re-factor the code base, it's very likely that whoever takes over from them in a few years will have the same complaints about technical debt and want to re-factor all over again.
You need to measure these things objectively. There are several ways to estimate technical debt, like asking system owners how long their teams would take to bring their code up to a list of standards. The time estimate alone is an excellent tool to measure the scope of the remediation cost. But there are also several other aspects to technical debt: transmissibility (how likely it is to spread), remediation competencies (does the team have the skills to remediate it), maintainability (how much it costs to maintain this case of tech debt over a month or a year), attributed bug count (how many bugs are linked to this instance of debt), and so on.
Refactoring code just because someone has complaints is an indolent and unsurprisingly ineffective way of maintaining it. As you say, opinions about how code should be written are a dime a dozen. The shortcomings of a particular system need to be meaningful and clear before they can be assertively fixed. They can be generalized, like "this code doesn't follow patterns that would optimize it" or "this code has had too many hotfixes and is now causing many bugs", or precise, like "cache misses here impact systems A, B, and C severely". But they still need to be clear and business-oriented. Then tech debt repayments can be very effective. I have seen bugs attributed to a system go down by 90%+ after a short refactor when it was carried out effectively.
Investing into systems that prevent tech debt in the first place can also be effective, but it likewise has to be done in a measured and targeted way. If teams tend to make mistake X, then you can often deploy code analysis or submission tools to identify that mistake and reject such code. Over a project's lifetime, you may write hundreds of such code validation tests, but in teams of 100+ people, they can prevent a tremendous amount of work resulting from tech debt.
In short, it is a mistake to dismiss the tech debt problem when one has not put in the effort to tame it or has gone about taming it in a lazy, wishy-washy way. If the team has particular code quality standards illuminating tech debt and allowing the coders to target it clearly, it can be remediated effectively.
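Those code-validation checks can be very small. A hypothetical example using Python's standard `ast` module to reject bare `except:` clauses, standing in for whatever "mistake X" a particular team keeps making:

```python
import ast

def find_bare_excepts(source):
    """Return the line numbers of bare `except:` handlers in the source.

    A bare except handler has no exception type (node.type is None),
    which silently swallows everything, including KeyboardInterrupt.
    """
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]
```

A CI step would run this over changed files and fail the build on a non-empty result, preventing that class of debt from entering the codebase at all.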
Hit the nail on the head. I think it's actually a bit worse on the refactoring side: yes, the next snooty senior engineer you hire will say you did it wrong. The bigger issue is that working on "technical debt" actually introduces bugs and decreases stability, because you're wholesale swapping out battle-tested code for theoretically better code. Your customers will not thank you.
What I take from your comment is that it doesn’t matter to business what the code looks like. But it does matter to engineers that they have buy in on the design choices. And that feeling like the code was designed in ugly / foreign ways decreases motivation and productivity.
I don’t think that the answer is for engineers to just “suck it up” and wait until there aren’t pressing feature demands (which usually never happens). But instead, we should spend time reading the code our coworkers and past employees have written to understand the design and philosophy behind it.
If the design of something you’re working on isn’t good by your standards, your productivity will still suffer if you leave it how it is. Especially if you have good taste. The feeling of deep authorship - like, “this is my garden and I’m proud of it” is something worth cultivating even if you only care about output.
I can close way more tickets in a day if I feel at home & in control with a codebase. And that’s something that matters to business as well as the engineers on the ground.
I consider “code quality” in our codebase to largely be defined by “does it follow our opinionated architecture?”
If the answer is “No”, it can be difficult for others to understand. Reworking that code to follow our architecture generally makes it easier for us all to follow.
One might need to not follow the architecture because it doesn’t handle something new to the app. When that happens, we try, first, to update the architecture to be able to handle the requirement and then implement the feature. But if that isn’t possible, we plan to clean up the debt as soon as we can after shipping.
Overall, it has worked remarkably well for our team and has led to what I consider the “cleanest” code base I have ever worked on. But it is dependent upon us having a shared understanding of how we implement features.
I’d think it’s the opposite: clean code indicates 1-2 opinionated individuals built it. Committee decisions are almost always overly complex compromises with too many special cases. See most RFCs, IEEE specs, congressional bills.
Code Quality is not subjective. If some improvement really is subjective, then it's probably not a code quality improvement.
The ability to reason about it or even measure it might be subjective, in the sense that most developers aren't really good at articulating quality. Also, developers often confuse familiarity and quality, or to be a bit more generous: are unable to accurately assess quality of unfamiliar technology.
Furthermore, there is also a widespread misunderstanding that if you can't 100% prove something beyond a shred of doubt, that any position is as arbitrary as a favorite color and any reasoning a waste of time.
> Doesn't matter what the code looks like, as long as the customers are happy.
Customers being happy or not is exactly what Code Quality is about. When a system breaks down over time due to a pervasive neglect of quality, the customers won't be happy. In practice this happens slowly: your lead time increases, there will be more bugs, and changes will cause more problems that are increasingly hard to overcome. It can also come as a collapse, for example a huge outage, data loss, or a security incident. This can kill your business.
Quality is not something done for developers sake, it is about the ability of making customer happy over time rather than just now. That is the ultimate justification, which isn't subjective, and the source of legitimacy of any time spent on refactoring at all.
If it is not ultimately rooted in solving a business problem, then it is not a quality improvement at all. And yes, sometimes developers try to masquerade personal taste as quality (refactor to something I know or like), and in these cases the outcome often isn't an improvement at all. In fact, it is often a regression. But that doesn't mean quality improvements aren't real or beneficial.
Right. I remember a very small non-profit that was so happy with my code the owner wouldn't let me fix a bug (lack of a fail-safe), having convinced himself it would never be a problem the way the software would be used. Despite the fact I was working for free. Two years later, someone else was assigned to use the software, and that "bug" incurred a phone bill that killed the non-profit outright, as I had predicted could happen.
To address the “I don’t like the structure” argument, I suggest quantifying code quality by how long it takes a new engineer (moving into the codebase or just hired) to ramp up on the system to fix a bug or add a feature.
There's always the risk that the people tasked with paying down the technical debt simply drive the code into a different corner. Excellent technical leadership is extremely rare and can make a huge difference.
> What matters to the business is that the code works.
Yes, but there is a time dimension to "code works". It should work tomorrow when a new OS comes out, and it should work when a library is updated because of a vulnerability, and it should still work when a new feature is added, and it should still work with a 100x more users.
The only way to make sure code still "works" in these situations is to continuously keep modifying it. And here the technical debt comes into play.
Sort of. But customers don’t want features. They want you to solve their problems. Most customers also value that being done reliably. So features are a means to an end as much as code quality is.
There are objective parameters, like: test coverage, how many bugs static analysis found (or how many were fixed), the presence of periodic or continuous fuzzing, and portability.
I really wish we would stop calling it technical debt. Every team/org I’ve worked in with “tech debt” issues has had very tactical problems that could have been communicated, invested in, and solved. But instead the org talked about “tech debt” - an immeasurable boogeyman that anyone outside of the engineering org has no grasp of and, most importantly, management up the chain to the C-Suite have decreasing mental models of investment/pay-off the further up you get.
Teams saying “tech debt” are perpetually under funded and under appreciated.
Instead, speak a language your management chain understands.
* These specific services have outgrown their architecture, and load keeps exceeding what they can scale to, creating back pressure; we need to invest in a more reactive architecture. It’s going to cost 3 teams 1Q, and we will prevent N outages based on historical data.
* In 2023, engineers fat-fingered the deployment of these services N times, causing various levels of service outage (one made the news); we need to invest in guardrails in our CI/CD to prevent that. It’ll cost one team 2Q, and we will prevent N outages.
* We had 4 employees across our engineering org quit last quarter because holding the pager burned them out, we need to stand up a tiger team that can help kick our metrics into shape.
Speak a language your management understands. Speak in terms of delivering features (feature velocity), reliability (outages), employee retention, hiring through resume driven development, etc.
You’ll find you’re negotiating in a positive sum game if you do this. You give me 1 unit of investment for this problem this quarter and I’ll give you 1.3 units of return next quarter. And maybe there are greater returns elsewhere so you aren’t making a competitive bid and that is okay, or maybe your management will invest in you and you just signed yourself up to deliver 1.3 units. But don’t handwave and ask for budget.
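The 1-unit-in, 1.3-units-out pitch is just an expected-value calculation. A minimal sketch, with all numbers invented for illustration:

```python
def tech_debt_roi(cost_eng_quarters, outages_prevented_per_year,
                  cost_per_outage_eng_quarters):
    """Net first-year return of a debt-paydown investment,
    measured in engineer-quarters (positive means it pays off)."""
    benefit = outages_prevented_per_year * cost_per_outage_eng_quarters
    return benefit - cost_eng_quarters

# e.g. 3 teams for 1Q = 3 units in, preventing 4 outages that each
# burn about 1 unit of engineering time: net +1 unit in year one.
```

The point is less the arithmetic than the shape of the ask: costs and benefits in the same units, so management can compare it against other bids for the same budget.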
But here's a crucial pattern in your proposed language (not sure if you noticed it): you have to let bad things happen first. You need N outages to happen. You need N people to quit.
I still believe it's the right thing to do. Humans suck at being objective. The moment we find a "flaw" in the architecture it becomes the most important thing in the world to fix it. Even if the "flaw" was there for 5 years and never caused an issue.
Sticking to objective signals (outages, quitting, bugs, etc.) is the only way to stay grounded in reality. But you have to let those signals happen first. More than that: they need to happen often enough to start forming a pattern. It's just the cost you have to accept, because the alternative is much worse.
E.g. it is impossible to invest in reliability, refactoring, and bugfixing right on time. You can either be too late, or too early. Counterintuitively, being too late is almost always the best option. Reason being, there's a virtually unlimited number of improvements you can do too early.
That said, none of the engineering teams I worked with could accept that. I know I couldn't.
Yes! And if they don't care about velocity and reliability, don't tell them that. If they've been going on about hiring and employee retention, tell them how this tech debt thing is going to have such a huge change that you can turn it into a kickass conference talk and hire more 10x programmers, which is a bigger value to them than "I made the app more maintainable". They don't care much what your team does; just give them something they want to buy.
I had to laugh. You're (correctly) saying: "Describe it all as a positive additional software feature you're about to introduce." Quite right. Sell the sizzle, don't use the word "vegetable" when telling children what's for dinner. "Internal features" not "tech debt."
A software company is a software company, not a technical company.
So, in terms of Risk and Responsibility, discussions of Technical Debt don't sufficiently examine the nature of Risk (across-org complexity sources, wetware, workflows, market feedback cycles etc.). The concept also skews and pigeonholes the Responsibility of dealing with it to a small subset of people.
Not writing for your audience is a losing game. They may have been fantastic developers, but the siren call of upper management may be what they think and speak now.
I might be doing it wrong, but I constantly pay down technical debt to almost zero. I do this because when I am working on an area of the system, it takes a lot of thinking to understand it, so I try to get as much done as possible in that context, because re-learning the context later is very expensive in terms of time.
I'm pretty certain that means new features are implemented a lot more easily. Hard to quantify, though.
I can say I'm always thanking "yesterday me" for finishing the job and cleaning up whatever "today me" is working on.
It's possible for me to do this because I do most of my programming on my own. I don't think this approach is suitable for most software engineering processes because they tend to deprioritise technical debt.
I call this “fighting the good fight”. Hardly anywhere will prioritise making your life better, I say don’t tell them and sneak in tidy ups and refactoring as part of every task. Essentially a codebase is either getting better or worse over time, I prefer to work towards improvement rather than ossification.
Having said that in some places and on some days (and working with some people) I don’t always have perfect energy for the fight.
I agree with you in principle. But it’s not “sneaking”. You are a professional; there’s no shame in doing the job right. Imagine if bridge-builders felt like they had to “sneak in” using the proper concrete-setting schedule!
If I had to justify myself every time I took the basic initiative of fixing errant, problematic, or poorly performing code, just because it wasn’t ordained and blessed for me specifically, I am out the door SO fucking fast…
Happy to have discussions and elucidate the change to others, and make whatever documentation necessary available, but if some leader somewhere thinks I’m a “problem” for merely having done the work… okay, let me solve that problem too: bye.
That’s an environment with serious trust issues manifesting as micro-management and I’ve got a severe case of post-micromanagement-stress-syndrome.
I think the valid problem here is one of priority.
Sometimes engineers “feel” the need to refactor something that’s not related to a high-priority project and end up burning days, because they weren’t experienced/competent enough to really know where the string they pulled led, and got stuck fixing newly failing tests, etc. And in the examples I have in mind, the context wasn’t the usual “product feature must deliver” pressure; it was a pure engineering project, where we had to rearchitect a system before our scaling hit a brick wall. So: interesting engineering work, by engineers, for engineers, and still some folks just chased squirrels.
I totally love the freedom I’ve enjoyed to fix things at my discretion in my career, but not everyone is good at balancing the time/place or “picking your battles”, and sometimes you gotta rein it in in others, or the stuff that needs to get done won’t be.
What is and isn’t related to the ticket is debatable but your assumption that refactoring always goes outside of the bounds of the ticket is extreme. Another one of my sayings is that software engineers (humans?) find it really difficult to differentiate their opinions on things from facts.
I've had one team member push back more than others.
One solution has been to do a 1:1 screen share with someone else who has more context or feels the impact of the changes more in their day to day, and work with them to get more constructive feedback and to approve the PR.
This is the approach I take too. I’ll refactor code / UI cleanup in the area I’m working in. Sometimes QA complains if they review commits, but I’ve never had to revert changes because of it.
Having run a couple of development teams on tight schedules, this is also what I arrived at, and I found it to work great for bigger teams in larger organisations too. With some caveats:
1) Be upfront about it, the PM or whoever is making the business decisions should at least be aware that the ticket is not being addressed in the quickest way possible. Sometimes that's a real problem and that conversation is worth having. Mostly it's not an issue, at least in my experience.
2) Pretty much never done while fixing a bug that occurred in production.
3) Done in separate commits / PRs. Preparatory work for the actual change, basically. This makes understanding the changes, as well as later regression testing _so_ much easier.
I may also be doing it wrong, but this is what seems to have worked best for me and my teams.
If you plan to work somewhere for more than 5 years, this can be done for entirely selfish reasons.
When a dev invests time improving the codebase, refactors mercilessly, and cleans up the code - they gain mastery over it, respect from their peers, and trust over the architecture. This is fantastic for your career long-term.
In reality, with developers hopping between jobs, it's a difficult case to make. Ironically, a hot job market for developers ends up hurting a key mechanism for technical mastery.
> I don't think this approach is suitable for most software engineering processes
I think that’s just badly managed technical teams.
What I started doing a few years ago was to group issues together by module/service. When I’d push work to the todo board, I’d include other, similar work. After all, it’s much easier to fix things when you’re already in context.
I tend to do this too but in a team environment, and I've gotten mixed feedback with some people pushing back on the number of changes or extra scope, and others being excited for tech debt being paid off and the value add.
If the volume of changes really gets out of hand I'll split it into two smaller PRs, then deploy one PR, merge main into the other, and deploy that. It's a balancing act sometimes to know where to stop and call it "good enough."
Sometimes the "interest rate" on the "debt" is low enough that your effort is better spent elsewhere. If your tolerance for tech debt is super low, then maybe it makes sense for you to constantly pay it down, but there are lots of times when it's a reasonable decision to just "pay interest" on it. It's worth keeping in mind the cost of the ongoing "interest rate" vs the cost to eliminate the debt.
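The "interest rate vs payoff cost" comparison can be made concrete with a tiny break-even calculation (my illustration with made-up numbers, not a real estimation method):

```python
# Rough break-even sketch for "pay down vs pay interest" (illustrative numbers only).
def breakeven_sprints(fix_cost_days: float, interest_days_per_sprint: float) -> float:
    """Sprints until a one-time fix pays for itself via the recurring drag it removes."""
    return fix_cost_days / interest_days_per_sprint

# A 10-day refactor that saves half a day per sprint breaks even after 20 sprints;
# a 2-day fix with the same saving pays off in 4.
print(breakeven_sprints(10, 0.5))  # 20.0
print(breakeven_sprints(2, 0.5))   # 4.0
```

If the team is unlikely to keep touching that code for 20 sprints, "just paying interest" is the rational call.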
I've seen this attitude play out. At some point the debt grows to a point where leadership plays hot potato and nobody wants to take responsibility for it. Rather than fighting it, they just leave, from the bottom level all the way to the VPs/CTO that made (and profited from) the wrong judgement call.
If you could measure and graph technical debt, I would find the idea of "planned debt" compelling, but the reality is that people only understand they are in debt once they see the bill, and at that point it's too late. Rational people will see the bill and say "I don't want to pay that."
If the CEO or CTO do not treat debt seriously and like it is their personal problem, that attitude will quickly be organization wide.
Debt is a type of corruption. Corruption, by its very nature, grows faster than linearly. Just like with COVID, corruption and debt permissiveness have an R0.
When one person is corrupt, another person is also incentivized to be corrupt. Every person who is OK with heaping on debt/corruption is another person who won't resist it, and that instead breeds self-interest/defensiveness, cynicism, and learned helplessness.
This all leads to a core understanding of corruption (and imho, technical debt), which is that you cannot plan for corruption and it will grow much faster than you expect. Once the culture is set, it is an incredibly hard thing to change. The people most able to make the change will have been the first ones upset by the culture in the first place, maybe enough to leave. The people who have thrived and seen organizational success and therefore gained organizational clout will be the least interested in a cultural change and the least interested in seeing the difficulty of their job go up.
I like to think I'm similar, though much of that is also due to working mostly by myself. The debt I do have comes from inheriting legacy code.
That said, as much as I try to maintain a certain level of quality, eventually my designs break and I need to refactor things. In times like that I'm glad I have a good level of automated testing to catch regressions.
I agree with this, but it also requires some skilled automation or very well-defined boundaries; otherwise you spend all of your time maintaining and not enough time implementing features.
While there is no standard way to measure, I think almost any developer can look at code and either be impressed by its elegance or disgusted by its depravity.
As a PM, can I just say that your PM is fucking useless!
Your PM is meant to be commercially minded, meaning they should understand the concept of compound interest.
Every new feature you build, on average, increases the value of your product linearly.
Every major piece of tech debt incurs reduced efficiency that compounds. It is extremely common in older systems for the incremental value of resolving tech-debt to be HIGHER than the incremental value of shipping "another feature".
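The "linear value vs compounding drag" claim can be sketched with toy numbers (my illustration, not the parent's model: assume each sprint at full speed delivers one unit of feature value, and each unresolved debt item multiplies velocity by a constant factor):

```python
def delivered_value(sprints: int, drag_per_debt_item: float, debt_items: int) -> float:
    """Toy model: cumulative feature value when each debt item compounds a velocity penalty."""
    velocity = (1 - drag_per_debt_item) ** debt_items  # multiplicative, hence compounding
    return sprints * velocity  # one unit of value per sprint at full velocity

# With a 3% drag per item, ten unresolved items cost ~26% of throughput every sprint.
print(delivered_value(20, 0.03, 0))             # 20.0
print(round(delivered_value(20, 0.03, 10), 1))  # 14.7
```

Under these assumed numbers, clearing even a few debt items quickly beats shipping one more feature at degraded velocity.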
IMO if an existing product is swinging between failed features and wild successes, chances are that it's being done based on luck (either good or bad), not any particularly great PM practice.
Sure, startups and new products will either go gang busters or broke. But any decently mature product is almost always going for steady incremental improvement based on feedback and market shift. Now, I'm not saying they're achieving that... but that's what a decent PM should be aiming for.
You're right, value tends to multiply product value by a percentage, at a minimum. And can lead to a breakthrough in applications and sales. Which shouldn't be underestimated. Worth pointing out.
But complexity (particularly if you can't squash dependencies) tends to go up by orders of power as sloppy code grows. I think that's their point.
Can you provide an explanation as to why it doesn't match your experience? There might very occasionally be a "killer feature", but that seems far more uncommon than features incrementally increasing the value of the product.
For starters, a lot of features add negative value. For most things, people don’t want everything and the kitchen sink thrown in. It confuses them and makes the product hard to use. If your model were true, any dummy off the street could achieve linear growth in a product by just continually adding more features. That’s not how it is.
Second is what you hit on. Some features add exponential or greater value.
This means that the value of new features does not fit a bell curve and reasoning about it with averages is a mistake.
Where there are dependencies between features, effort increases more than linearly (even more than polynomially). If usefulness increases only linearly, that puts an upper bound on the meaningful size of an implementation.
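One way to see the superlinear-effort claim is a minimal counting sketch (my illustration, assuming the worst case where every pair of features can interact):

```python
def pairwise_interactions(n_features: int) -> int:
    """Worst-case pairwise dependencies among n features: n choose 2."""
    return n_features * (n_features - 1) // 2

# Doubling the feature count roughly quadruples the interactions to reason about,
# while (by the parent's assumption) usefulness has merely doubled.
print(pairwise_interactions(10))  # 45
print(pairwise_interactions(20))  # 190
```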
My thoughts exactly. Extremely irresponsible leadership. I'd guess that they probably are just looking to use some year over year growth metric for the year or two they are at this job and then go hop to the next job with that on their CV.
No offense intended, but most PMs I've met are fucking useless. Maybe they provide some value to the business that I'm not aware of? But they seem clueless about how to interact with an engineering team, and things like the product being a complete swiss-cheese shit-show that crashes when you blow on it wrong come as a complete shock to them. Every other kind of product has some basic metrics to determine its quality, yet the PMs seem clueless about anything that isn't in their Jira roadmap. But maybe I've just had bad luck...
To be fair... most PMs are fucking useless.
There are fundamentally two types of PMs:
A) People that were doing some other job (Engineer, BA, Project Manager) and over the course of a year or two, just naturally gravitated towards taking more and more of an ownership role over the actual business outcome that the team they were working on was trying to achieve.
The job they do looks nothing at all like the one their job description says they do... but it seems to add a lot of value and a lot of people seem to want them to keep doing it.
One day, they meet someone called a "Product Manager" and they go "holy shit... the stuff you do... that's what I do... am I a Product Manager?".
-- These people make excellent PMs.
B) People that were doing some other job (Engineer, BA, Project Manager) and, over the course of three to four years, saw a small subset of their colleagues lead really meaningful projects, garner a fair bit of respect amongst their co-workers, probably end up getting paid more than they did, and eventually go on to work in roles called "Product Manager".
These people decide that they too want to be respected, paid more, and lead meaningful projects... so they add the word "Product" to all their previous job titles on their CV, go do a 3 day Product Owner certification, and then start applying for PM roles.
-- These people make fucking terrible PMs. They also tend to make up ~75% of PMs.
>No offense intended, but most PMs I've met are fucking useless
Beyond basic horse sense, technical credibility, customer outreach, market sense, & roadmap development by nurturing sheaves of competing priorities while constantly mindful of critical constraints...
...A PM has to make their management happy. Who increasingly isn't technical at all.
Show me a 'pointless, clueless, idiotic' user story or epic, & I'll show you the result of PM negotiations with stakeholder(s) who asked (demanded) something far worse.
EVERY non-technical %VP, C%% and VC I've met just Loves to say "it's just software; it can do anything"... & completely miss the bitter irony that statement should embody.
The only solution I have seen work: paying back tech debt should be an integrated part of regular development rather than a separate task. When you change or add a new feature in an area of the code, you clean up tech debt in that area, at least to the level where it is better than it was before.
This also avoids "moving around deck-chairs" refactoring, since the refactoring is coupled to specific development tasks. You refactor to make implementing the task easier and cleaner - no more, no less.
If some area of code is "ugly" but works and doesn't need any functional changes, leave it be.
Scheduling tech-debt payment separately runs the risk of it getting de-prioritized. If a deadline is approaching or the company needs to cut expenses, I'm sure the dedicated "tech-debt payback" time is the first to get cut, "temporarily".
The "X% for work engineers want to do" approach always strikes me as a poor substitute for communication and sober evaluation of the value of work.
If I am a PM, and my team's velocity is badly bogged down by some tech debt, then the right allocation to fixing it is 100% (fix the shit so we can go fast forever.)
On the other hand, if some "tech debt" doesn't actually impact team velocity/clients (eg, some code is "bad" but it's in a part of the system that's never touched) the right allocation is 0%.
Ultimately there's only one thing that matters - getting value out to the customers. Tech debt only matters to the extent that it gets in the way of that, so prioritizing it vs features is easy, because at the end it's still about "what do the clients get, and when".
The X% approach seems to happen when engineering and product fail to have that conversation, fail to understand each other, so they have to just get a flat allocation each.
Thanks for sharing your perspective. Some fun sprinkles of complications
> and my team's velocity is badly bogged down by some tech debt, then the right allocation to fixing it is 100% (fix the shit so we can go fast forever.)
That's impossible. Software, like all things in nature, evolves and degrades over time, and there is unlikely to be a magic 100% fix. Something can always be improved.
> On the other hand, if some "tech debt" doesn't actually impact team velocity/clients (eg, some code is "bad" but it's in a part of the system that's never touched) the right allocation is 0%.
Tech debt is also a great way to get your development team's execution down to 0%. Yes, some people leave organizations because the code quality is bad, no one cares to improve it, and it makes everything just a tad worse. This can compound over time. One day, the thing that is never touched will need to be touched, and no one will be able to understand it in a timely manner, leading to other failures and re-prioritization.
> The X% approach seems to happen when engineering and product fail to have that conversation, fail to understand each other, so they have to just get a flat allocation each.
Agreed, a constant conversation is a good starting point, but eventually there will always be a bias in one direction. As you demonstrated, your bias is "there's ultimately only one thing that matters - getting value out to the customers". This makes you more likely over time to prioritize shipping to the customer over invisible improvements. The feedback loop continues until things break down. If you're in doubt, go ask your EM/TL partners whether there isn't some technical debt they think is more important to address in the next 3 months than whatever you're currently trying to ship.
Technology organizations are quite complex systems, but I appreciate your attempt at simplifying.
I agree with your worries and that feeds into my system of thinking. For example:
// One day, the thing that is never touched will need to be touched
I would categorize that as a potential risk to the thing that matters (customer delivery over time.) So the conversation that should be had then isn't "module X is written in a bad way" but "we're down to the last person who knows module X, if he leaves and we need to change it, we're screwed. Are we willing to tolerate this risk?" (the answer may be yes or no, but expressing it in terms of impact on client delivery is more accurate than just tech debt)
// Something can always be improved.
Yup, similarly to how you can always decorate your house better, buy a more fully-featured car, etc. The question is should it be improved, at the cost of whatever else you can be doing?
// If you're in doubt, go ask your EM/TL partners if there isn't some technical debt they think is more important to address in the next 3 months than whatever you're currently trying to ship.
That's exactly the conversation I want to have. Because if they are right, I want to be persuaded so I can advocate for that investment. But if they are wrong, I want to persuade them that whoever we're doing IS more important, so we can focus on that.
// As you demonstrated, your bias is "There's ultimately the only thing that matters - getting value out to the customers".
That is my bias indeed. Sometimes this bias causes me to ask my engineers to fix non-functional things when they are charging ahead to do features - it cuts both ways.
I've seen it said that in any ongoing work on a large and complex software product, about 20% of the time is spent on upkeep just to keep the system from degrading further. Probably your team was already spending a portion of their time shoring up the worst parts. That's typical, and why the build up of problems slows down feature work.
I always plan on spending some percent of my time on work I know I'll have to do before I can start adding the new features. Regardless of how I'm asked to estimate, I keep in mind the cleanup work in accounting for the total effort.
Unfortunately there are still a lot of PMs and folks, including some programmers, who aren't aware of or don't understand the need for structural maintenance. Those are the teams I regularly see start out going very fast but within a few months get bogged down in having to work in the awful system they built.
Story old as time. Management typically prioritizes features over other types of work. If your ENG dept has weak leadership it becomes daunting to fight for the 20% required for maintenance of the system.
Then at some unknown point things slow down in a noticeable way, and everyone scrambles to fix the slowdown without fixing the culture that led to it - and that’s if you’re lucky.
Somehow, in management's eyes, rewrites are more palatable than refactors, despite costing 10x more.
There's also a psychological component for the devs. Asking me to fix your trash heap? Bye! Asking me to rewrite your trash heap? Sounds fun.
In all of these discussions about reinventing/rewriting/reimplementing already solved x, people forget that other people aren't interested in maintaining something where the fun, education, and impact was already had by the people who came first! They want to have their own fun, education, or impact.
Pretty much every conversation that starts with an angry "this has been a solved problem since such and such genius from the 70's did y" is born from misunderstanding what drives people to do anything.
I usually tend to do the refactoring without asking permission: just spend 20% of my time on it and get it done; it definitely pays off. I tend to be pretty productive generally, and with this refactoring I'm not saying "give me a week to refactor from X to Y"; instead I just do a little bit every day or two, so I've never had much problem with this from management.
If you're delivering features quickly it shouldn't matter if you spend some of your time making sure you'll be able to continue doing so in the future.
> Somehow in management eyes rewrites are more palatable than refactors despite costing 10x more
In part, it is resume-driven development. Rewrites are major projects, and running them (and the associated scale and budget) looks good on a management resume, and provides a nice accomplishment item.
In part, it's the short-term incentive to be seen to do more with less, and to move on before the consequences bite (this is also a kind of resume-driven motivation, though it's more resume-driven development avoidance). You defer work on anything but visible features, do those in the quickest/cheapest possible way that will work in the short term, and use the credit for efficiency to move up and out before the deferred maintenance catches up. When it does, someone else gets to do a resume-enhancing big replacement project, since the state of the app makes fixing in place seem impractical. (Ironically, even with deferred maintenance, incremental remediation would probably be quicker, more efficient, and less prone to major timeline- and budget-busting surprises, but once it has gotten bad enough, even the technical people who might recognize that will back management's desire for a complete replacement, because they don't want to have to deal with the legacy mess.)
I always tell the young devs that they shouldn't ask for dedicated refactoring time; it's just part of professional development. You have to trust that using good practices consistently will help you in the long term and actually speed up development.
If they are having to touch that old code to add new features, fix bugs, migrate to a new context (say, OS version), or improve the performance/memory usage/responsiveness, then they should be improving it. Whether or not the developers are "given" time, making the change will take the time. That time can either be because the system is bad and making changes is hard, or because the system gets improved to make the changes possible.
Good article and a good inspiration to debate this topic. Honestly, I already get in a bad mood when I read this conversation with the PM. The mentality of "We need to timebox that activity so it does not swallow the time that should go to features and bug fixes" ignores that, depending on the tech debt and the state of the codebase, devs are wasting enormous amounts of time working around stupid issues (i.e. tech debt), and even a little bit of team effort might help. Maybe there is some company out there where someone argues for rewriting the almost bug-free Haskell code to use higher-kinded types and lenses because it's not looking so well at the moment, but I guess in reality there is often just a lot of shitty cruft in Java/JavaScript land, and a few weeks of thinking and fixing (in the team, with lots of discussions) improves things much more, and gives more time for features, than avoiding it.
This is one area where velocity tracking actually helps PMs understand what you're talking about.
If you follow the "boy scout" rule of leaving every file a little better every time you touch it, you will be slowly cleaning up debt while also slowing down velocity. If that is not an acceptable trade-off to the PM, then they need a proper estimate for what the cleanup will take- that's the armor they have to shield you from shit raining down from higher levels when someone wants to know why the new feature timeline is being delayed.
If your tech debt is so bad it can't be incrementally improved, then it can usually be reframed as building a new version of feature X when really you're just rewriting it. Sell it as a performance / stability improvement and voila, it can be prioritized appropriately.
If you have tried all of the above and your work priorities are still being dictated, either the company is in a tight spot (contracts require features on a tight timeline) and you need to muddle through until the next phase, or the company isn't regarding you as a professional and it is time to move on.
No skilled tradesman worth their salt would let management tell them not to clean the shop because that takes time away from building things. The professional dictates the work practices.
> We need to timebox that activity so it does not swallow the time that should go to features and bug fixes
it is probably more so that they can still have good estimates. If you start introducing new work that has no time limit and no timebox, it can push out your estimates.
Maybe I'm an unorganized mess, but there's no such thing as "x%" of my time. There's just time, and I'm either focused on doing something or I'm not. Management loves to think that they can parcel out time that way, but it's a complete fantasy.
I used to feel that way but now I appreciate the usefulness of "% of time" concept for aligning individual or team effort.
For example, if my boss and I agree that I should be focused 80% on new business development and 20% on keeping existing customers happy, it doesn't mean I allocate every hour or every day this way, but it does mean that I pay a lot more attention to A vs B, but don't neglect B altogether.
That's a useful way to agree on what is important in your context so your work can overall reflect that.
Tech Debt Friday worked for this team, good for them. It will probably not work for another team.
Tech debt is a never-ending discussion. But ask yourself: why does almost every team struggle to figure out how much time to dedicate to tech debt? Why is there constant tension between engineering and management?
The answer is simple: engineers and business people don't understand each other. We live in different worlds and speak different languages. You can't solve this problem with a methodology. If Tech Debt Friday or Google's 20% works for you, that's just luck (or wishful thinking).
Once you understand where the problem is coming from, the solution is also simple: find someone who speaks both languages and trust them to decide. Typically that person is a product engineer: it's easier to explain business to an engineer than engineering to a business person.
And when I say trust, I mean both engineering and management should do it. I.e. if that person says a couple of outages is not a big deal, it's not a big deal. If that person says we need to spend next week refactoring, then that's what you spend your next week on. Obviously you can still challenge those decisions, but you have to accept that this person is an expert in their domain. They don't know everything, but they know more than you. Even if you're the CEO. Even if you're coming from FAANG with 20 years of experience.
That's why companies where the founder is a product engineer typically don't have a tech debt problem (they still have tech debt; they just don't have a problem with it).
I know it sucks not to have a mathematical or managerial solution to tech debt, but tech debt is inherently complex and human. Tech debt happens when humans are solving problems for other humans. The only solution is to have another human in between, and the quality of that solution will depend on the quality of that human. It's not going to be perfect. But it's the best you can do.
"find someone who speaks both languages and trust them to decide"
This is ideal.
But finding code monkeys that can also tapdance isn't trivial. For some companies, it isn't possible; or HR won't pay the price to hire them. Sometimes management wildly overestimates their technical chops and won't tolerate "bilingual" middle managers because such people are constantly advertently or inadvertently reminding top management of their actual lack of technical knowledge.
So this is where the Fridays scheme comes in; it requires you to gain that trust once, for the one day a week of self-directed work. You might still be able to do that. It also helps prevent dev team "tunnelling."
The author makes clear at the very end of the article that in some companies, even the amount of trust necessary to pry loose one day a week can't be obtained, and the Friday scheme won't work.
> And when I say trust, I mean both engineering and management should do it. I.e. if that person says a couple of outages is not a big deal, it's not a big deal. If that person says we need to spend next week refactoring, then that's what you spend your next week on. Obviously you can still challenge those decisions, but you have to accept that this person is an expert in their domain.
How often do products go under because of unresolved tech debt? Is it many? When I first got into software I spent a lot of time on refactoring and cleaning up code. As time goes on I find myself doing that less and less, and just focusing on work that produces meaningful results. Fixing tech debt, from what I can tell, is less meaningful than conventional wisdom says it is.
No software will be perfect. Eventually it will die and something will replace it. I think knowing what "good enough" is, is perhaps the more important capability. Larger fruit hangs lower than tech debt the vast majority of the time.
When a software startup fails, it’s rare that we ever get to find out the real reason. The straw that broke the camel’s back gets the blame: the recession, some hackers, Google, whatever. Nobody goes to the newspaper saying, hey, we neglected every possible aspect of care and maintenance over the years, put a bunch of interns on fixing issues using spreadsheets and raw SQL, and bled our customers dry over a decade until they all left. Usually the bigco that got suckered into acquiring them gets the blame, and the founders are on a yacht in the Caribbean.
Tech debt won’t kill a company but it will strip away its immune system leaving it vulnerable to the slightest hiccup. Then the deadly hiccup gets the blame.
I've known a case way back where deliberate tech debt of a sort was built up by not switching to a more modern compiler for a prototype. The idea was to then do a hopefully-quick rewrite on a more modern setup to obtain the minimal viable product (not a term at that time.)
But Microsoft (who I think probably got wind of this product) announced a very similar product (that never appeared as such, and did not appear at all for at least a decade. The problem was harder than it looked.) Our financial backers immediately and fully pulled out, verbal contracts notwithstanding. With no more cash, the company was dead. A few years later others created just part of this product, and raked in very nice profits for a long time.
If the same prototype had been built on the more modern compiler, that would have been the minimum viable product.
The more modern compiler was one able to use more memory as the 286 cpu allowed.
This was a failure of product management and, doubly so, engineering management. A PM working in software absolutely needs to understand the concept of technical debt. Not necessarily to know where the debt is on their own, but to work closely with engineers on understanding and managing it along with product functionality. That said, I reserve most of my scorn for the eng manager. This is the person whose duty it is to say that yes, in fact 20% is the right proportion for tech debt work after it's been neglected for so long. This is the person to stand up to leadership and explain why this is necessary.
Autonomous teams with bottom-up decision making are much more likely to push towards the right thing even with clueless management, whereas there is little hope when incompetent decisions are fed to the team in a top-down manner (yes, autonomous teams indeed exist - though they are typically a sign of competent management).
One other trick I used is to have a “tech debt week” at the end of every quarter: 12 weeks of coding per quarter, then a week for the managers/PMs to evaluate the last Q and plan the next one. During that awkward 1-week window where the plan is in flux, the engineering teams can focus on polishing their tools and attacking tech debt that might take more than a day per sprint to make progress on.
Of course, this planning window is probably too short for big companies, and too much planning overhead for one-team startups. But for a 20-30 person engineering org this cadence worked well.
This is somewhat similar to the Shape Up methodology from Basecamp [0] where there's a 2 week "cool down" at the end of every development cycle. So 6 weeks of feature development, followed by 2 weeks of bug fixing, refactoring, etc.
I've never done it in practice so I don't know how well it really works and there are some other parts of Shape Up that I disagree with strongly.
If you can afford that level of slack, I think you will build a very happy, long-term-focused org.
Worth noting that they build and open-source frameworks like Rails, Hotwire; if you want to polish and share, you need extra bandwidth vs. just building internal-facing products.
Sometimes I feel I'm the only person who thinks tech debt is not that difficult to solve.
Everything needed to solve it is under your control. It may take time and be boring, but you have everything you need.
Product-market fit, customer acquisition, etc. are often much more urgent and difficult to solve, and we should focus on those first.
What I've seen in a lot of teams is that tech debt (or speculative tech debt, if they're moving quickly) is exaggerated to the point that they cannot launch quickly enough to acquire and iterate with customers, which is a huge mistake in product development.
v1 - shitshow for product market fit, deliberately debt heavy for speed/iteration
v2 - from the ashes, this is what v1 should've been, emphasis on the long-term
You've already done the hardest bit (imo) by figuring out the kludge blueprint that is v1; it's reasonably easy to build v2 with the lessons learned from v1 fresh in mind.
Of course, v1 is the default model in most shops and there is no v2.
A lot of the time, there is no V2, due to one of two states:
1. V1 is too successful: you cannot slow down the growth. It is bad, but not as bad as people make it out to be. Many startups are trying very hard to be in this state.
2. V1 fails, so there is no point for V2.
In a large company, there is an additional state: long-term ownership is hard. By the time we should implement V2, the original team has already been promoted for PMF and moved on to a new shiny project. The new team just complains tirelessly about the tech debt, because obviously they aren't getting the reward and are stuck with a shit job.
Now nobody would advocate for this way again.
Not sure why my argument now turns the opposite direction.
There wouldn't be tech debt for v2 is the point: it'd be from scratch code-wise whilst applying the design/architecture lessons learned.
A lot of tech decisions only become apparent after the project is built; you reveal them with a rapid prototype and then scrap it to build "what should have been".
Ultimately, as you say, it is pointless as there is little business incentive to work like this as most customers would gladly accept the crappy prototype as final.
Before clicking on the link, I expected failure. Turned out it is a success story, and I am a bit disappointed.
I have seen "code cleanup" that made things worse countless times. Especially if the idea is to dedicate time just for that, instead of doing it as you go. I expected the article to be about one of these failures.
The article describes 7 points, and suspiciously nothing goes wrong. That is rarely how it happens; there is typically some trial and error involved, and knowing what failed is as important, if not more so, than the final solution that works for you.
Tech debt is simply another debt that a company or org needs to deal with. It's far cooler for a tech company to worry about their funky no vowels thing than say the rent.
My boring little IT company in the UK owns its premises - roughly an acre in a town in Somerset with a 19m-long-edge red brick two-floor building. The property cost us about £240,000 - the mortgage is cheaper than the rent we paid on part of a converted stables really out in the sticks.
Now, with a mortgage there is the fine print. In the UK it is normal for a bank to require a "debenture": if your Loan to Value falls below a percentage, the bank can intervene at any point and take over, which is what at least one bank in the UK did to try and shore up its finances when it all went south in 2007ish.
There is another thing called an "Overage Clause". That's where the vendor wants to keep a hold over future profits. We bought our place from the NHS and they sought an overage in perpetuity (for ever) on any profits we might make on selling the place. We negotiated down to 10 years. That expires this year. Under it, we would have had to hand over 50% of any profits from a sale.
Running a company is not rocket science but it can be tricky. My little firm will never be a unicorn or even a mare with a cornet on its nose. I don't care.
10% is under investing and this is a bad methodology. How do you know what to work on? Is tech debt just the stuff that annoys you or is it actually bad?
What you need to tackle tech debt is a quality program with metrics. I’m talking code coverage, cyclomatic complexity checking, linters and scanners, DORA, SAST, DAST, etc. Quantify your quality. Then quantify the risks and costs of not improving it.
Then you need to target the areas of code your tools tell you to and you need to make a conscious effort to solve those very specific things. “Module A’s complexity score is 26. Our standards say this needs to be 10. Therefore this is considered a quality item. Therefore it goes into the sprint as a strategic investment.”
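To make the "quantify, then schedule" idea concrete, here is a minimal sketch of such a quality gate. It uses Python's stdlib `ast` module to approximate cyclomatic complexity (real programs would use a dedicated tool like radon or SonarQube); the threshold of 10 and the module names are hypothetical examples.

```python
import ast

# Branching constructs that add a decision point. This is a
# simplification of what real complexity tools count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try,
                ast.BoolOp, ast.ExceptHandler)

def complexity(source: str) -> int:
    """Rough cyclomatic complexity: 1 + number of branch points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(tree))

THRESHOLD = 10  # the standard from the example above; pick your own

def quality_gate(module_name: str, source: str) -> list[str]:
    """Return quality items to schedule into the sprint as
    strategic investments."""
    score = complexity(source)
    if score > THRESHOLD:
        return [f"{module_name}: complexity {score} exceeds {THRESHOLD}"]
    return []
```

The point is not the exact metric but that the output is a concrete, schedulable work item rather than a vague "this module feels bad".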
Software and business leaders: when developers talk about tech debt, they are talking about managing complexity. (Shit breaking all the time because you don't have tests in a complex system? It's failing because of the complexity.) High complexity is expensive. If you do not balance the need to manage complexity against features, and you do not act intentionally about your quality, your software will eventually fail, or, if you're lucky, it will just reach a stage where you can't maintain it anymore and you'll scrap and rebuild.
IMHO this was the best part of the (interesting) post:
> Having dealt with tech debt in a collaborative manner, enabled us to do the “regular work” faster because we had a better collective understanding of the code, and the code was cleaner to work with. One could argue this is just a positive effect of mob programming, but the lack of a concrete agenda also helped the autonomy that unlocked creativity.
The types of tech debt described don't really match well to the real problem ones I see. These are major decisions deferred / delayed that never got rectified and the cost has risen exponentially to address them. To get them done requires a total stop-the-world type effort with simultaneous migrations across multiple teams / services / products.
There's no such thing as 'technical debt'; just call it crap code (or a 'hack') and face the impacts of having to live with it. It's been my experience that these things get logged to Jira as 'medium' or 'low' and then of course never get looked at again, and if they ever get opened up, the code has likely moved on. This leads to the following idea:
1. code the MVP the customer accepts (customer is happy)
2. go ahead and create all the debt tickets (makes you feel professional)
3. every xmas just delete all non-high tickets older than 12 months.
If the code is shit enough, it will die a natural death (most of today's wiring-type code has a very short TTL anyway). Similar to "if a tree falls in the forest...": "if a customer doesn't notice...". We do so many 'invisible', 'hard' things in our line of work; it's an almost impossibly thankless road to build quality into our systems.
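For what it's worth, step 3 above is easy to automate. A minimal sketch, assuming an in-memory list of tickets (the `Ticket` model, field names, and priority labels are all hypothetical; against a real tracker you would run the equivalent query instead, e.g. JQL in Jira):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Ticket:
    key: str
    priority: str   # e.g. "high", "medium", "low"
    created: datetime

def stale_debt_tickets(tickets, now, max_age_days=365):
    """Non-high tickets older than max_age_days: the candidates
    for the annual xmas purge described above."""
    cutoff = now - timedelta(days=max_age_days)
    return [t.key for t in tickets
            if t.priority != "high" and t.created < cutoff]
```

Whether you actually delete them or just stop pretending they'll be done is, of course, the cultural question.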
Why not 15%? Or maybe 5% will be enough. Actually, it's a bad idea to plan time in such a manner. If devs need to, they should be able to refactor/fix things during the normal development process without any "just 10%".
I had started the "Documentation afternoon" on Fridays in my team. And everyone seemed to like it, to wind down the week with uninterrupted time for documentation (email catch up, expenses, etc.)
Dedicated budgets for tech debt only serve to justify decisions to cut corners in the first place. It’s not a prioritization issue or a matter of smarter sequencing. It’s a cultural problem.
Shitty developers are going to make shitty software.
Every City Mayor wants to build their bridge in 1 week. However, you don't see Civil Engineers "cutting corners" and leaving "debt" when designing and building those bridges. There's a minimum quality that they will sign off. No matter how loud their bosses shout.
When I led software dev teams, I always told developers not to compromise certain quality factors. If someone asked them "can it be done faster" and they answered "yes, but X would suffer", they would be opening a door to sloppiness, and the Business side doesn't have a way to understand the implications of X. They only hear "yes". On the flip side, if a developer says "it will take 2 weeks", there's nothing non-technical people can do but sit down and wait.
The only ones who can really challenge that are other technical people, and hopefully, if Business "escalates" past the developer to their manager's manager (up to the CTO), they will be backed up by the technical higher-ups, if they make sense. I would sit down with people trying to shorten the development time of some things. There are times when it is sensible to take on some debt, but that is a decision that shouldn't be made by Jr or even Mid level engineers.
Agree with everything else you said but take some issue with this:
> Shitty developers are going to make shitty software.
That’s really not the point I was trying to make. It’s more of a trap to blame ICs on the ground in most situations when they’re usually only following existing practices. All this will do is demoralize your engineers.
Mismanagement in the form of poor staffing choices (especially moving tenured and experienced engineers to the next shiny thing) is often the culprit. This usually stems from leadership pressure to ship stuff fast, as usual.
Once you sacrifice quality of any kind, that bar gets lowered for good, and before you know it there are so many examples of poor quality that any issues get lost in the noise.
I would argue that if they were able to pay down the tech debt that quickly and easily, then most likely the debt wasn’t as bad as described to begin with.
In places where there is not much time for code refactoring, the following is helpful:
Imagine an idealised future state of the codebase, which everyone buys into, and make sure any new feature is going in that direction.
Refactoring existing code can be death by a thousand cuts; having a parallel new codebase which is incrementally adopted can be more efficient and quicker to market.
I think the concept of technical debt is a kind of corporate culture. It can be repaid over time, but it's likely to reappear unless you reduce the things that caused it. It cannot, of course, be completely eliminated, but things like serially redundant feature development, for example, can be reduced.
I think using financial terminology to describe technology problems is a mistake.
There is no such thing as tech "debt". Or, if there is, it's entirely subjective.
I've seen things described as a "6+ month major rewrite" that another developer could address with careful, incremental enhancements over a matter of weeks.
This was a failure of product management and, doubly so, engineering management. A PM working in software absolutely needs to understand the concept of technical debt: not necessarily to know where the debt is on their own, but to work closely with engineers.
The only reason I raised issues about tech debt in previous companies is that I didn't want to inherit a messy codebase created by my predecessors.
Even just with a good test suite, you can go very far. Clean tests (interface) are more important than clean code (implementation).
The interesting aspect IMHO is not that they set aside 10% of their time to fix tech debt per se, but that they set aside 10% of their time to collaborate on it / engage in mob programming.
The first rule is not to create tech debt in the first place.
The PR (Pull Request) that creates tech debt should come paired with the issue to deal with it.
Got it: when you accept money for letting someone sleep in your tent you are $1 million in debt since you are now obligated to purchase that plot of land (which may not even be for sale at any price), find a licensed architect, design a house, get building permits, find contractors and build it. Plus you are infinite amount in debt since it takes infinite money to build a time machine so that your initial customer can sleep in appropriate lodgings retroactively. Often code with technical debt has paying customers, and going back in time to get them what you imagine they should eventually get is part of the financial equation, right?
Thus technical debt always includes building a time machine so that your initial customers can use the product they should have gotten from the backend they should have gotten it from, back when they should have gotten it.
In other news, the world's richest person is actually the world's poorest: Elon Musk built a tent in people's minds and now he has to deliver on it. His technical debt is a hotel on Mars. Since technical debt is real, that makes him the world's poorest person by a huge margin.
Also, since technical debt is real you can deduct it from your taxes just like any other debt, right? I mean we wouldn't want the IRS to be unaware that you owe more than 100% of the time and money you invested in building a solution so far, since it could lead them to want to tax you based on what you earned, which just plain isn't fair. They should only tax any remaining money after you have engineered a veritable Taj Mahal of code quality such that there isn't any remaining conceivable improvement. After all, you know what another word is for conceivable improvement: technical debt.
Simple debt is a poor metaphor. Jenga tower is a better one. An even better one is "Jenga tower of subprime debt" [1]. Xkcd nails it as usual: https://xkcd.com/2347/
The author writes:
> "To their credit, I came in when the code was like a crumbling Jenga tower"
Structurally bad systems look a lot like that. Small misalignments and nonlinearities compound to make the structure vulnerable to a sneeze.
Relatedly, I feel it is not so much what % was applied, but what targets were de-risked. Things that tend to matter include mean time to recovery, time to market from _planning_ to deployment (not just commit to deployment), defect rate (e.g. hotfixes and rollbacks). If the superstructure is good, one can start with cheap panel walls and gradually swap them out for nicer ones with better properties.
Teams should push for a 70/30 feature-to-tech-debt ratio. Product is so good at feeding the dev machine that devs rarely get time to evaluate the landscape and push back.