Hopefully, you have a monorepo or something with similar effects, and a lack of fiefdoms. In that case, if the current way is undocumented and/or inconsistent, you make it better before or while adding in your new approach. If there are 4 ways to do the same thing and you really want to do it a different way, then replace one of those ways with your new one in the process of adding it. For extra credit, get to the point where you understand why you can't replace the other 3. (Or if you can, do it! Preferably in followups, to avoid bundling too much risk at once.)
A lot of inconsistency is the result of unwillingness to fix other people's stuff. If your way is better, trust people to see it when applied to their own code. They probably have similar grievances, but it has never been a priority to fix. If you're willing to spend the time and energy, there's a good chance they'll be willing to accept the results even if it does cause some churn and require new learning.
(Source: I have worked on Firefox for a decade now, which fits the criteria in the article, and sweeping changes that affect the entire codebase are relatively common. People here are more likely to encourage such thinking than to shoot it down because it is new or different than the status quo. You just can't be an ass about it and ignore legitimate objections. It is still a giant legacy codebase with ancient warts, but I mostly see technical or logistical obstacles to cleaning things up, not political ones.)
Thats not how software engineering works in a business setting though? Not a single company I have been in has the time to first fix the existing codebase before adding a new feature. The new feature is verbally guaranteed to the customers by project managers and then its on the dev to deliver within the deadline or you'll have much greater issues than a inconsistent codebase.
I'd love to work in a fantasy company that allows for fixing legacy code, but that can take months to years with multi million line codebases.
> I'd love to work in a fantasy company that allows for fixing legacy code
You're not supposed to ask. It's like a structural engineer asking if it's okay to spend time doing a geological survey; it's not optional. Or a CFO asking if it's okay to pay down high interest debt. If you're the 'engineer', you decide the extent it's necessary
No. You discuss it with your manager, and you do it at the appropriate time. Having both created, refactored and deleted lots of technical debt over the past 25 years, trust me: you just don't get to go rogue because "you're the engineer". If you do that, it might turn into "you were the engineer".
What if you spend a week or month refactoring something that needs a quick fix now and is being deleted in 1-2 years? That's waste, and if you went rogue, it's your fault. Besides, you always create added QA burden with large refactoring (yes even if you have tests), and you should not do that without a discussion first--even if you're the founder.
Communicate with your manager and (if they agree) VP if needed, and do the right thing at the right time.
> No. You discuss it with your manager, and you do it at the appropriate time.
Sure, if you're not sure if it's the right thing to do, talk to your manager or TL. A good engineering manager can help.
If your manager "would never allow" it, they're not a good manager. Even for jobs much more menial than engineering, a good manager recognizes that autonomy/trust are critical for satisfaction and growth.
If you're working someplace where you're "not allowed" to make the changes you "wish you could," you're doing yourself a disservice. Find someplace where you're not only "allowed," but expected to have (or develop) the judgement required to make these decisions.
To be clear: "the business" expects (and in the medium/long term requires) engineers to make these decisions themselves. That is the job
It's why Scotty was always giving longer estimates that Kirk wanted, but Kirk was also able to require an emergency fix to save the ship.
The estimate was building in the time to get it done without breaking too much other stuff. For emergency things, Scotty would be dealing with that after the emergency.
If your captain is always requiring everything be done as an emergency with no recovery time, you've got bigger problems.
Thats also not applicable in a business setting. If you have multi million line codebase, you simply cant refactor within reasonable time. Also refactoring can cause issues wich then need further fixing and refactoring.
If I touch code that I am not supposed to touch or that does not relate to my direct task I will have HR talks. I'd be lucky to not get laid off for working on things that do not relate to my current task.
The Linux kernel is a multi-million line codebase, and refactoring still happens when needed. Let's not extrapolate from your limited data points as if they are representative for all codebases.
Totally no. Structural engineers also have to consider real life constraints, including cost. We are talking about working with existing structures, it's too late for geological surveys.
That totally depends on the surrounding configuration. If land subsidence was as common as security patches, they would totally be doing monthly surveys.
Structural engineers also don't commonly change structural features after initial delivery; realistically, I would expect changing a two-lane bridge to a four-lane bridge to be more expensive than constructing a four-lane bridge where none exists.
And then at some point the codebase becomes so unusable that new features take too long and out of frustration management decides to hire 500 extra programmers to fix the situation, which makes the situation even more slow.
As I understand, there is a balance between refactoring and adding new features. It’s up to the engineers to find a way to do both. Isn’t it also fair if engineers push sometimes back on management? Shouldn’t a civil engineer speak up if he/she thinks the bridge is going to collapse with the current design?
Often the problem with companies running the Feature Factory production treadmill too long is you have code supporting unused features and business logic, but nobody knows any more which features can be dropped or simplified (particularly after lots of employee churn and lack of documentation). So the problem is not so much technical debt, but product debt.
You can refactor, but you're also wasting time optimizing code you don't need. A better approach is to sit down with rest of the company and start cutting away the bloat, and then refactor what's left.
I was involved with a big rewrite. Our manager had on his desk the old system with a sign "[managers name]'s product owner". Nearly every time someone wanted to know how to do something the answer was load that old thing up and figure out what it did.
Eventually we did retire the old system - while the new code base is much cleaner I'm convinced it would have been cheaper to just clean that code up in place. It still wouldn't be as clean as the current is - but the current as been around long enough to get some cruft of its own. Much of the old cruft was in places nobody really touched anymore anyway so there was no reason to care.
> while the new code base is much cleaner I'm convinced it would have been cheaper to just clean that code up in place
I saw one big rewrite from scratch. It was a multi-year disaster, but ended up working.
I was also told about an earlier big rewrite of a similar codebase which was a multi-year disaster that was eventually thrown away completely.
I did see one big rewrite that was successful, but in this case the new codebase very intentionally only supported a small subset of the original feature set, which wasn't huge to begin with.
All of this to say that I agree with you: starting from scratch is often tempting, but rarely smooth. If refactoring in place sounds challenging, you need to internalize that a full rewrite will be a few times harder, even if it doesn't look that way.
I stayed at a place that was decades old, in part to decipher how they’d managed to get away with not only terrible engineering discipline but two rewrites without going out of business. I figured it would be good for me to stick around at a place that was defying my predictions for once instead of fleeing at the first signs of smoke. I’ve hired onto places that failed before my old employer did at least twice and I feel a bit silly about that.
I wasted a lot of my time and came away barely the wiser, because the company is spiraling and has been for a while. Near as I can figure, the secret sauce was entirely outside of engineering. If I had to guess, they used to have amazing salespeople and whoever was responsible for that fact eventually left, and their replacement’s replacement couldn’t deliver. Last I heard they got bought by a competitor, and I wonder how much of my code is still serving customers.
> I saw one big rewrite from scratch. It was a multi-year disaster, but ended up working.
90% of large software system replacements/rewrites are disasters. The size and complexity of the task is rarely well understood.
The number of people that have the proper experience to guide something like that to success is relatively small because they happen relatively rarely.
> "I'm convinced it would have been cheaper to just clean that code up in place"
Generally agreed. I'm generally very bearish on large-scale rewrites for this reason + political/managerial reasons.
The trick with any organization that wants to remain employed is demonstrating progress. "Go away for 3 years while we completely overhaul this." is a recipe for getting shut down halfway through and reassigned... or worse.
A rewrite, however necessarily, must always be structured as multiple individual replacements, each one delivering a tangible benefit to the company. The only way to stay alive in a long-term project is to get on a cadence of delivering visible benefit.
Importantly doing this also improves your odds of the rewrite going well - forcing yourself to productionize parts of the rewrite at a a time validates that you're on the right track.
Part of our issue with the rewrite is we went from C++ to C++. For an embedded system in 2010 C++ was probably the right choice (rust didn't exist, though D or Ada would have been options and we can debate better elsewhere). Previous rewrites went from 8 bit assembly to C++, which is the best reason to do a rewrite: you are actually using a different language that isn't compatible for an in place rewrite (D supports importing C++ and so could probably be done in place)
Rewrites are much like any act of self improvement. People think grand gestures and magical dates (like January 1 or hitting rock bottom) are the solution to turn your life around. But it’s little habits compounding that make or break you. And it’s new habits that kill old ones, not abstinence.
I worked with another contractor for a batshit team that was waiting for a rewrite. We bonded over how silly they were being. Yeah that’s great that you have a plan but we have to put up with your bullshit now. The one eyed man who was leading them kept pushing back on any attempts to improve the existing code, even widely accepted idioms to replace their jank. At some point I just had to ask him how he expected all of his coworkers to show up one day and start writing good code if he won’t let them do it now? He didn’t have an answer to that, and I’m not even sure the question landed. Pity.
The person who promised him the rewrite got promoted shortly before my contract was up. This promotion involved moving to a different office. I would bet good money that his replacement did not give that team their rewrite. They’re probably either still supporting that garbage or the team disappeared and someone else wrote a replacement.
That whole experience just reinforced my belief that the Ship of Theseus scenario is the only solution you can count on working. Good code takes discipline, and discipline means cleaning up after yourself. If you won’t do that, then the rewrite will fall apart too. Or flame out.
Whether you rewrite or refactor the code is not so much the point of my comment - it's more that you should first determine what you actually need, in consultation with the project stakeholders, get rid of whatever you don't need, and then you can decide whether you need to rewrite or refactor. Cutting away the bloat will give you a better perspective on that decision.
Personally, I would lean towards refactoring - a rewrite is the "declare bankrupcy" stage of technical debt and should only be considered in extremis. For example, the original codebase was written in ColdFusion and in 2025 you can't find any ColdFusion developers (or anyone in their right mind who wants to become a ColdFusion developer). But in any case, rewriting a trimmed down codebase is easier than trying to replicate features you don't need any more.
In the same way people go to their doctor or dentist or mechanic too late and prevention and sometimes even the best treatments are off the table, software developers (particularly in groups vs individually) love to let a problem fester until it’s nearly impossible to fix. I’m constantly working on problems that would have been much easier to address 2 years ago.
The issue is that management usually doesn't care. Personally I usually have about 3-4 days to implement something. If I can't deliver they will just look for more devs, yes. Quantity is what matters for management. New New New is what they want, who cares about the codebase (sarcasm). Management doesn't even know how a good codebase looks. A bridge that is missing support probably wouldn't have been opened to the public in the first place. Thats not correct for codebases.
Most shacks built in one's backyard do not pass any building codes. Or throwing a wooden plank over a stream somewhere.
Just like most software doesn't really risk anyone's life: the fact that your web site might go down for a bit is not at all like a bridge collapsing.
Companies do care about long term maintenance costs, and I've mostly been at companies really stressing over some quality metrics (1-2 code reviews per change, obligatory test coverage for any new code, small, iterative changes, CI & CD...), but admittedly, they have all been software shops (IOW, management understood software too).
That’s why consistent messaging matters. If everyone agrees to make features take as long as they take, then management can’t shop things around. Which they shouldn’t be doing but we all know That Guy.
When children do this it’s called Bidding. It’s supposed to be a developmental phase you train them out of. If Mom says no the answer is no. Asking Dad after Mom said no is a good way to get grounded.
Can't be held accountable for work conditions engineers dont have power over. If I dont have time to write tests, I cant be blamed for not writing tests.
Especially now with hallucinating bs AI there is a whole load of more output expected from devs.
Recently I got an email that some severe security defects were found in a project, so I felt compelled to check. A bot called “advanced security AI” by Github raised two concerns in total, both indeed marked as “high severity”:
— A minimal 30 LoC devserver function would serve a file from outside the current directory on developer’s machine, if said developer entered a crafty path in the browser. It suggested a fix that would almost double the linecount.
— A regex does not handle backslashes when parsing window.location.hostname (note: not pathname), in a function used to detect whether a link is internal (for statically generated site client-side routing purposes). The suggested fix added another regular expression in the mix and generally made that line, already suffering from poor legibility due to involving regular expressions in the first place, significantly more obscure to the human eye.
Here’s the fun thing: if I were concerned about my career and job security, I know I would implement every damn fix the bot suggested and would rate it as helpful. Even those that I suspect would hurt the project by making it less legible and more difficult to secure (and by developers spending time on things of secondary importance) while not addressing any actual attack vectors or those that are just wrong.
Security is no laughing matter, and who would want to risk looking careless about it in this age? Why would my manager believe that I, an ordinary engineer, know (or can learn) more about security than Github’s, Microsoft’s most sophisticated intelligence (for which the company pays, presumably, some good money)? Would I even believe that myself?
If all I wanted was to keep my job another year by showing increased output thanks to all the ML products purchased by the company, would I object to free code (especially if it is buggy)?
I’d rather say it’s an observation of real behavior. Customers of yours don’t buy code (unless it is your product) - they buy solutions to their problems. Thus, management and sales want to sell solutions, because that gets you paid.
Engineering is fulfilling requirements within constraints. Good custom code might fit the bill. Bad might, too - unless it’s a part of requirements that it shouldn’t be bad. It usually isn’t.
Companies die because nobody is willing to work on the code anymore.
If VCs ever came to expect less than 90% of their investments to essentially go to zero, maybe that would change. But they make enough money off of dumb luck not leading to fatal irreversible decisions often enough to keep them fat and happy.
One: Have you ever tried to take a narcissists' money or power away from them? You would think you were committing murder (and some of them will do so to 'defend' themselves)
Two: All the stuff we aren't working on because we're working on stupid shit in painful ways is substantial.
Or it becomes so unusable that the customers become disenchanted and flee to competitors.
If management keeps making up deadlines without engineering input, then they get to apologize to the customer for being wrong. Being an adult means taking responsibility for your own actions. I can’t make a liar look good in perpetuity and it’s better to be a little wrong now than to hit the cliff and go from being on time to six months late practically overnight.
> As I understand, there is a balance between refactoring and adding new features.
True, but has drifted from the TFA's assertion about consistency.
As the thread has implied, it's already hard enough to find time to make small improvements. But once you do, get ready for them to be rejected in PR for nebulous "consistency" reasons.
They don't think they have the time, but that's because they view task completions as purely additive.
Imagine you're working on this or that feature, and find a stumbling block in the legacy codebase (e.g., a poorly thought out error handling strategy causing your small feature to have ripple effects you have to handle everywhere). IME, it's literally cheaper to fix the stumbling block and then implement the feature, especially when you factor in debugging down the line once some aspect of the kludgy alternative rears its ugly head. You're touching ten thousand lines of code anyway; you might as well choose do it as a one-off cost instead of every time you have to modify that part of the system.
That's triply true if you get to delete a bunch of code in the process. The whole "problem" is that there exists code with undesirable properties, and if you can remove that problem then velocity will improve substantially. Just do it Ship of Theseus style, fixing the thing that would make your life easier before you build each feature. Time-accounting-wise, the business will just see you shipping features at the target rate, and your coworkers (and ideally a technical manager) will see the long-term value of your contributions.
I am a solo dev for my company which is a semi profitable startup. Before I was hired the codebase was built by a hobbyist and remote workers. I was hired due to a language barrier with the remote staff. I barely have 4 years of experience so I really really dont have the time to fix shit. Currently I have to ship reckless without looking back. Thats upcoming tech companies for ya, will get worse with AI.
> I barely have 4 years of experience so I really really dont have the time to fix shit.
I am confused. In my first few years of my career, I did lots of refactoring because it helped me to learn different codebases and learn the skill of refactoring. Your experience is somewhat unrelated to your ability to fix old code. Desire is required.
New features are the right time to refactor. If you can't make the code not complete shit you don't have time to add the feature. Never refactor code to make it prettier or whatever, refactor it when it becomes not-fit-for-purpose for what you need to do. There's obviously exceptions (both ways) but those are exceptions not rules.
My company didn't even have time to keep the dependencies up to date so now we are stuck with Laravel 5 and Vue 2. Refactoring/Updating can be an incredible workload. Personally I'd say rewriting the whole thing would be more efficient but that's not my choice to make.
If you have plenty of time for a task, I fully agree with you.
I was in an organisation that made decent money on a system built on Laravel 3, I think. The framework was written in an only static classes style, which they over ten years had run with while building the business so everything was static classes. Once you have a couple of million lines of that, rewrite is practically impossible because you need to take the team of two OK devs and a junior off firefighting and feature development for years and that will hurt reputation and cashflow badly.
My compromise was to start editing Laravel and implementing optimisations and caching, cutting half a second on every request within a month of starting, and then rewriting crude DIY arithmetic on UNIX epoch into standard library date/time/period functions and similar adjustments. I very openly pushed that we should delete at least two hundred thousand lines over a year which was received pretty poorly by management. When I left in anger due to a googler on the board fucking up the organisation with their annoying vision where this monster was to become "Cloud Native" on GCP credits he had, a plan as bad as a full rewrite, it only took a few months until someone finally convinced them to go through with deletions and cut LoC in half in about six months.
I don't think they do containers or automatic tests yet, probably never will, but as of yet the business survives.
I usually am in favor of a complete rewrite. I'd also prefer to not grow projects into multi million line monoliths. Just make multiple smaller ones that can interact independently with each other. Much simpler structure. Also safer in the long run.
You actually believe that? Distributed systems are simpler than monolithic? In PHP?
This business wouldn't exist if they attempted to follow your advice, because they weren't able and anyway didn't have the money to hire that many developers. There were a couple of subsystems they tried to implement the way you suggest, e.g. one for running certain background jobs.
It was a database table with one row per type of job and a little metadata like job status and a copy of the input. They started jobs by sending a HTTP request. This was a constant source of manual handling, because things started jobs and then crashed and never reset the status and things like that. You could respond that they should have used a message queue instead and so on, but the thing is, they didn't know how to build reliable distributed systems. Few developers do.
It's a bit of a non-issue in this context. If you don't have a mono-repo, you should maintain reasonable consistency within each repository (and hope they're consistent between each other, but that's probably less important here).
> A lot of inconsistency is the result of unwillingness to fix other people's stuff
Agree, so we find it best to practice "no code ownership" or better yet "shared code ownership." So we try to think of it all as "our stuff" rather than "other people's stuff." Maybe you just joined the project, and are working around code that hasn't been touched in 5 years, but we're all responsible for improving the code and making it better as we go.
That requires a high trust environment; I don't know if it could work for Firefox where you may have some very part-time contributors. But having documented standards, plus clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
> That requires a high trust environment; I don't know if it could work for Firefox where you may have some very part-time contributors.
Ironically, that's why it works for Firefox. Contributors follow a power law. There are a lot of one-shot contributors. They'll be doing mostly spot fixes or improvements, and their code speaks for itself. Very little trust is needed. We aren't going to be accepting binary test blobs from them. There are relatively few external contributors who make frequent contributions, and they've built up trust over time -- not by reporting to the right manager or being a friend of the CTO, but through their contributions and discussions. Code reviews implicitly factor in the level of trust in the contributor. All in all, the open nature of Firefox causes it to be fundamentally built on trust, to a larger extent than seems possible in most proprietary software companies. (There, people are less likely to be malicious, but for large scale refactoring it's about trusting someone's technical direction. Having a culture where trust derives from contribution not position means it's reasonable to assume that trusted people have earned that trust for reasons relevant to the code you're looking at.)
There are people who, out of the blue, submit large changes with good code. We usually won't accept them. We [the pool of other contributors, paid or not] aren't someone's personal code maintenance team. Code is a liability.
> But having documented standards, plus clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
100% agree. It's totally worth it even if you disagree with the specific formatting decisions made.
> All in all, the open nature of Firefox causes it to be fundamentally built on trust, to a larger extent than seems possible in most proprietary software companies.
Nice! We're still small so we somehow can keep that level of trust, but I always worry about how things may change for the worse as we grow. Mimicking the open source model as much as we can, even within a small private company, has worked well for us so far.
> > ... clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
> 100% agree. It's totally worth it even if you disagree with the specific formatting decisions made.
So true! 5-6 years ago we had to make open source contributions to both clang-format and clang-tidy for several months to get them to support closer to our preferred style before we could get the "ok, close enough" buy-in across the company to implement automated formatting. (Mostly bug fixes for evidently rare flag combinations, but also a few small new features.)
In retrospect it was completely unnecessary - simply relying on automated formatting is sooo much better than any specifics of the formatting. I'm still glad we did though, as it made both tools better. We earned the maintainers' trust with a few early PRs, and remained active contributors for a while, but haven't contributed much lately.
I agree, but this presupposes a large comprehensive test suite giving you enough confidence to do such sweeping changes. I don't doubt Firefox has it, but most (even large, established projects) will not. A common case I've seen is that newer parts are relatively well covered, but older, less often touched parts don't have good coverage, which makes it risky to do such sweeping changes.
> Hopefully, you have a monorepo or something with similar effects, and a lack of fiefdoms. In that case, if the current way is undocumented and/or inconsistent, you make it better before or while adding in your new approach.
Unfortunately, this is how you often get even more inconsistent codebases that include multiple tenures' worth of different developers attempting to make it better and not finishing before they move on from the organization.
> In that case, if the current way is undocumented and/or inconsistent, you make it better before or while adding in your new approach.
Sometimes, but oftentimes that would involve touching code that you don't need to touch in order to get the current ticket done, which in turn involves more QA effort.
A lot of inconsistency is the result of unwillingness to fix other people's stuff. If your way is better, trust people to see it when applied to their own code. They probably have similar grievances, but it has never been a priority to fix. If you're willing to spend the time and energy, there's a good chance they'll be willing to accept the results even if it does cause some churn and require new learning.
(Source: I have worked on Firefox for a decade now, which fits the criteria in the article, and sweeping changes that affect the entire codebase are relatively common. People here are more likely to encourage such thinking than to shoot it down because it is new or different than the status quo. You just can't be an ass about it and ignore legitimate objections. It is still a giant legacy codebase with ancient warts, but I mostly see technical or logistical obstacles to cleaning things up, not political ones.)