It is absolutely true that some hot-path code needs to be mangled into an ugly mess to reach performance requirements. The problem is that I have encountered people who somehow take this as a blanket justification for writing unreadable code, and who operate on a false dichotomy about a choice between readable code and performant code. It is important to keep in mind that:
1) Most code, i.e. at least 80% of the code in a codebase, will never be a performance hotspot and does not need specific optimizations (i.e. as long as the code does not do stupidly inefficient things, it's probably good enough).
2) Even in a performance-hotspot codepath, you should not write hard-to-read code unless it is strictly necessary to achieve the required performance.
In both cases, the key point is to benchmark and profile to find the specific places where the ugly hacks need to be introduced, and do not introduce more than is strictly necessary to get the job done.
I definitely agree on the false dichotomy between performance and readable code.
> 1) Most code, i.e. at least 80% of the code in a codebase, will never be a performance hotspot and does not need specific optimizations (i.e. as long as the code does not do stupidly inefficient things, it's probably good enough).
The hard part is knowing what "stupidly inefficient things" are. If you never push optimization further than strictly needed, it's easy to drift towards less efficient code over time, because your baseline for what performance is possible keeps slipping. Knowing when performance optimization is not worth it, and what level of efficiency to shoot for, is the mark of a good engineer.
Yeah, absent an effort to speed up a code base, it will tend to get slower, unless the team has a culture of performance: everyone looking for easy performance wins and taking them, and noticing common performance mistakes and getting rid of them.
To do that the team needs experience with performance, and most of internet programmer culture is just to lecture people about how programmers are cheaper than hardware if anyone asks or talks about performance. So many people don't get that experience.
This is very true. The misuse of the "premature optimization" mantra and the thought that valuable programmer time should never be spent on performance have done a disservice to a generation or two of developers. To the point that so many developers don't even believe performance has any significance and the cloud just magically makes it fast.
Spending a week optimizing some function to shave off a few seconds may be totally worth it, just depends how often it runs. If it is a code path being used by millions of people, it can quickly become very valuable.
Cost is another factor. Sure, you might be able to effortlessly spin up dozens or hundreds more instances on demand to handle the load because the code is so slow that one instance can barely handle a few dozen requests per second, but you're paying for that. Make an instance capable of handling a few thousand rps and suddenly you've saved very real money.
And battery life on all portable devices also benefits from more efficient CPU usage.
> Yeah, absent an effort to speed up a code base, it will tend to get slower, unless the team has a culture of performance
My experience with the JVM over the last few years says otherwise. Most teams I have come across try too hard to achieve "performance" where very simple idiomatic code would have sufficed. Idiomatic code improves in performance with every release. Instead they picked up some dogmas based on JVMs of the past, and their "performance culture" is a farce.
Funnily, what I've seen most frequently with the JVM is that most optimizations are on the order of "You should have used a HashSet here, not a List that you keep sorting and removing duplicates from!"
The majority of performance issues I've run into aren't "Oh, I need just the right structure of code to make things go fast" but rather "I used a Map<String, Object> instead of a POJO, why is my code slow?"
I think a big problem is that "don't prematurely optimize" has been taken to mean "never think about how your algorithm will perform". It allows someone to write an n^2 or n^3 algorithm when the n algorithm is just as readable (usually more so!).
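To make that concrete, here's a hypothetical Java example of what I mean - both versions are perfectly readable, but one is quadratic and one is linear:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateCheck {
    // O(n^2): compares every pair of elements.
    static boolean hasDuplicateQuadratic(List<String> items) {
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).equals(items.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    // O(n): Set.add returns false if the element was already present.
    static boolean hasDuplicateLinear(List<String> items) {
        Set<String> seen = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) {
                return true;
            }
        }
        return false;
    }
}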
The problem with all this is that performance optimization is nothing without measurement. How many items are in your list / map in the regular case? Do you know? And if the number is small, how much faster is a Map than a list with that many elements?
Oh this function is slow? How often is it called? If it’s only called once, maybe it’s fine.
Performance optimization work should be guided by real data, and knowledge of your users. What features matter the most? Are your users on network constrained devices? Are they running out of disk space? Are GC pauses a problem? What is the normal / maximum size of the input to your software?
Without knowing this stuff, you can’t focus your attention on the right things re: performance. You’ll end up spending time and energy making the wrong things fast with marginal benefit.
I'd say significantly reducing big-O complexity (e.g. N^2 to N log N) is worth it without measurement. Even if the difference is not significant during your tests, that doesn't mean it won't be a catastrophe later as N increases, maybe even a vulnerability. Using the best algorithm in terms of big-O removes a potential problem.
Keeping the suboptimal algorithm is actually the harder optimization path, because it requires you to characterize your data and maybe keep a note somewhere explaining why you chose that algorithm and its limitations.
I loathe big-O arguments, and so will happily disagree with this advice.
Big-O is by definition only useful when N is large/very large. In many problem spaces, N is bounded, or ought to be bounded at e.g. the input validation level.
That doesn't even capture all the details that big-O glosses over, like how many ops are actually executed, cache coherency, etc etc.
Prefer simpler code until profiled; that's it. Once a bottleneck is identified, big-O might be a useful tool for discussing improvements, far behind just measuring it.
This all presupposes that reducing big-O complexity increases code complexity. In my experience, that's simply not true. Unless you are writing C without libraries, you are almost certainly dealing with a language rich in data structures that reduce big-O complexity while clearly communicating intent.
A dictionary will generally beat storing values as key/value in a list, de-duplicating and linearly searching on the key. If you don't believe that, then go ahead and pollute your codebase with List<Pair<Key, Value>>. See how fast that will be.
Heck, go ahead and use and manage a Pair<Key, Value>[]. After all, can't prematurely optimize by using a data structure that automatically manages growth.
Maybe just store everything as a giant String. You can just keep concatenating new values and searching for others in it. Why let those pesky types get in the way of easy to understand string manipulation and regexs? After all, don't prematurely optimize.
Hey, perhaps you should always just bundle and distribute perl with your applications and fork off new instances of perl since it has such nice regex capabilities. After all, don't prematurely optimize.
Have you measured that performance? Halt before you say it's a bad idea, because obviously we need to profile everything and make sure that starting new perl processes is actually slow. Can't be prematurely optimizing.
> Unless you are writing C without libraries, you are almost certainly dealing with a language rich in datastructures that easily reduce big oh notation while clearly communicating intent.
Just to name it, I’m increasingly finding myself in positions where off the shelf data structures don’t solve the problems I have. I’m writing a novel CRDT for collaborative text editing. Some data structures I’ve needed:
- A generic run-length encoded vector that I can append to and binary search.
- A generic run-length encoded b-tree which indexes by the current total length of the sub tree.
- A gap buffer which stores run-length encoded items. Items can be split by a freshly inserted item. And each item has an array of states with a size set when the data structure is created. I might end up moving to a skip list of gap buffers.
Needing a lot of custom data structures is pretty rare. And if I was forced to, I could probably cobble together something that would perform well enough on top of the built in container classes. But doing it all custom is fun. My performance results justify the approach - my system performs orders of magnitude better than any other implementation.
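(To give a rough idea of the first structure above, here's a minimal, hypothetical Java sketch of an appendable, binary-searchable run-length encoded vector; the real thing is in Rust and considerably more involved.)

import java.util.ArrayList;
import java.util.List;

// Run-length encoded vector: consecutive equal appends are merged into runs,
// and lookups by logical index binary-search the cumulative run lengths.
class RleVector<T> {
    private final List<T> runValues = new ArrayList<>();
    private final List<Long> runEnds = new ArrayList<>(); // cumulative lengths

    void append(T value, long count) {
        long prevEnd = runEnds.isEmpty() ? 0 : runEnds.get(runEnds.size() - 1);
        if (!runValues.isEmpty() && runValues.get(runValues.size() - 1).equals(value)) {
            runEnds.set(runEnds.size() - 1, prevEnd + count); // extend the last run
        } else {
            runValues.add(value);
            runEnds.add(prevEnd + count);
        }
    }

    // Binary search the cumulative ends to find which run covers `index`.
    T get(long index) {
        if (index < 0 || index >= size()) throw new IndexOutOfBoundsException();
        int lo = 0, hi = runEnds.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (runEnds.get(mid) <= index) lo = mid + 1; else hi = mid;
        }
        return runValues.get(lo);
    }

    long size() {
        return runEnds.isEmpty() ? 0 : runEnds.get(runEnds.size() - 1);
    }
}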
> A dictionary will generally beat storing values as key/value in a list, de-duplicating and linearly searching on the key. If you don't believe that
I don't believe that, although "generally" is doing heavy lifting here. It depends on the key, the value, the number of key/value pairs, how good the hash algorithm is, etc. Yeah, standard impls are pretty good nowadays. If you don't want to bother understanding the problem space fully (which, like you said, is fine, otherwise we'd get nowhere), use a standard tool to solve the problem.
So yes, by all means use a hash map if it makes it easier to read/reason about. Hash maps after all are a common tool for this. Just don't start with the big-O crap.
the examples you are responding to are not trading off large-n performance against small-n performance or performance against some other virtue such as clarity
rather, they are unnecessarily slow for all n and also unnecessarily complicated to understand and maintain
> The problem with all this is that performance optimization is nothing without measurement.
Not what I'm arguing. I agree. When making code go fast, you should do it in terms of what the profiler tells you to fix.
However, when you are writing code, it is not wasted time to use the right data structures for the problem.
I've literally seen code like this
public void addItem(List<Foo> foo, Foo item) {
    if (!foo.contains(item)) {
        foo.add(item);
    }
}
Is that code easier to read or understand than
Set<Foo> foo = new HashSet<>();
foo.add(item);
No (and that's the change I made).
I contend that the earlier version was bad code. The dev wasn't thinking about runtime efficiency, they just liked lists and kept using them. This kind of thinking is fairly common (particularly with young devs that aren't familiar with what's available).
And this is where the "No, never ever think about performance while writing code" attitude that you are espousing is, IMO, detrimental to the craft. "Is there a data structure that meets the needs of this code?" is a basic question, and shouting "PREMATURE OPTIMIZATION IS EVIL!!!!!" gets in the way of that sort of consideration.
It's sloppy development work.
The only time you SHOULD use the first solution is when you've found that, in a tight loop where this method is called, the items are limited and cache friendliness ends up trumping algorithmic complexity. That is, data should back choosing an n^2 algorithm over an n algorithm, not the other way around. Especially when the difference between the two solutions is just a different data structure.
But further, the code's intent is clearer in the second solution. So it's not only (generally) faster, it also does a better job of communicating intent with future developers. win-win.
I hate that so many devs cargo cult here.
The full quote is
> Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Knuth wrote that in an era when premature optimization meant breaking out some hand-crafted assembly which obscured the intent of the code. He was not talking about optimizations which improve debugging and maintenance. He was talking about optimizations like the ones in the OP's article.
The optimizations I'm talking about I do not call "premature". Just like it's not premature to use the SDKs sort function vs your own bubble sort implementation because "oh, don't prematurely optimize!". An optimization that makes code easier to maintain and debug is never premature.
> And this is where the "No, never ever think about performance while writing code" that you are espousing is, IMO, detrimental to the craft.
That’s not my position. I think we broadly agree. I believe most software that is fast was designed fast. And the way you get there is by understanding your problem space well, and benchmarking with real world data as you build. There are almost always ways to take advantage of knowledge of the data to make your code faster.
As an example, in some performance sensitive rust code I’m working on, I have a lot of sets of numbers. The sets are tiny - most of them only contain 1 element. 95% have 1 or 2 elements. I could use HashSet, but instead I’m using SmallVec<[usize; 2]> - which inlines up to 2 items in a list. It’s uglier and more complicated, but in this case it is much faster than the alternative. I know to use this approach because I know my input data.
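Java has no SmallVec, but as a purely hypothetical sketch, the same "know your data" idea might look like a tiny set that keeps one or two elements inline and only allocates a real HashSet when it grows past that:

import java.util.HashSet;
import java.util.Set;

// Illustrative only: a set that avoids allocating a HashSet for the
// common case of 1-2 elements, at the cost of uglier code.
class SmallLongSet {
    private long a, b;        // inline slots
    private int inlineCount;  // 0, 1 or 2 elements stored inline
    private Set<Long> spill;  // allocated only when we outgrow the slots

    boolean add(long value) {
        if (spill != null) return spill.add(value);
        if (inlineCount >= 1 && a == value) return false;
        if (inlineCount >= 2 && b == value) return false;
        if (inlineCount == 0) { a = value; inlineCount = 1; return true; }
        if (inlineCount == 1) { b = value; inlineCount = 2; return true; }
        spill = new HashSet<>();
        spill.add(a);
        spill.add(b);
        return spill.add(value);
    }

    boolean contains(long value) {
        if (spill != null) return spill.contains(value);
        return (inlineCount >= 1 && a == value) || (inlineCount >= 2 && b == value);
    }
}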
I agree with other commenters in this thread. Big-O analysis is often misleading. It leads you to pick a good algorithm, but often at the expense of optimizations that could give you 1-2 orders of magnitude better performance if you go looking for them. Things like cache aware memory layouts, reducing allocations via arena allocators and object pools, SIMD, and fast paths in complex functions for common calls.
> To do that the team needs experience with performance, and most of internet programmer culture is just to lecture people about how programmers are cheaper than hardware if anyone asks or talks about performance.
But this is an absolute truth, isn't it?
I mean, what's more important to your business: avoid the need to launch an EC2 instance, or implement that user flow in one day instead of one week? Do you care if it takes 50ms to show your dialog box instead of 40ms? Do you really care if you pass all your objects by value, with a few deep copies, instead of passing everything by reference?
The truth of the matter is that in general all performance bottlenecks are caused by software architecture and algorithms, not "clean code" implementations.
Modern macos is pretty unusably slow on old apple hardware. The same hardware ran earlier versions of the OS just fine. I’m not sure what’s going wrong at apple, but I don’t think they have enough performance engineers working on the Mac.
I run a CPU monitor in my dash (and have for years). The efficiency cores are very often busy doing some rubbish tasks I didn’t ask for while my computer is idle. Things like reindexing files, or re-scanning my photos for faces or something like that. I’ve started to think of them as the CPU cores apple put in my computer to keep their useless internal projects from ruining their hardware.
Linux annoys me regularly, but at least my cores are almost totally idle whenever my hands aren’t on the keyboard.
Window server pegs CPU on older hardware doing any sort of trackpad gesture etc. Wasn't the case in Mojave and adjacent releases. Big Sur on are noticeably worse than the earlier releases. Disabling animations/transparency saves a bit, but it's still remarkably poor.
> your baseline for what performance is possible slows down.
This is a problem that absolutely plagues our field. If your project isn't some wrapper around some incredibly complicated math or handling files well into the GB range, there's no excuse for a desktop application to take noticeable time to do anything. Yet here we are, with it taking about a half second just to open the search bar on windows. What is it doing? Why does it take so long? Who knows?
> The hard part is knowing what "stupidly inefficient things" are.
There are things that are quite obviously non-critical in terms of performance.
One time I had a junior engineer insisting in a PR that we should use a few low-level performance tricks in a piece of code because they were faster, and the code was for opening a dialog box.
Also, in general performance bottlenecks are caused by architectural and algorithmic choices, not readability choices. If we're in a domain where, say, inheritance is an unacceptable performance bottleneck and we need to unroll loops and we can't extract functions because the cost of pushing the call stack is prohibitive... We are in an extremely performance-sensitive part of the code.
Yep, it's entirely possible to get into a scenario where you can't determine where things are inefficient, because the inefficiency is everywhere. Once you're in that spot, you're in for a bad time.
The problem is that if you don't know what is required to write fast code you won't design your program to be performant from the start. Lots of applications can be sped up dramatically by rearchitecting them, but you're not going to hit on the right architecture without thinking really hard about memory layout, data access, network usage, etc.
It's not really true that 80% of the code doesn't matter. 80% of the code might not matter the way it is currently designed, but that doesn't mean a different design isn't dramatically better. Benchmarking and profiling doesn't help with that. Real expertise is required and that is developed over time by people who care about improving performance and make it a priority.
80% does not matter, period. I mean, 20% is a lot. Hotspots are - spots. Like 0.5% of your code. Not UI, for example.
You know, when money is flowing in, you have to hire some good people, let them spend time on fixing hotspots and, on some rare occasions, rearchitecting a part or maybe two. It probably won't pay off, but if you are growing fast enough, it will pay 100x.
Also you’re decreasing all future development speed with all this unreadable shit everywhere, so beware of optimizing unnecessary stuff.
For seriously high performance systems like webservers or exchanges, you need to get the architecture right from the start. You can't rearchitect a part or two or solve design problems by making skilled hires later on. The 80% part is a distraction (it's also factually untrue that 80% of code doesn't matter. Some applications don't have hotspots; the performance impact is basically smeared all across the code. It really depends on the application. SQLite found major speedups by implementing hundreds of little changes that were each almost indistinguishable from noise).
High performance doesn't mean unreadable. Microoptimizations can make code less readable, true, but they also give you the least amount of speedup (in the best case you can get several times speedup if you found some hotspot that can be vectorized, but that's pretty rare). You can get huge performance improvements by optimizing networking and system calls and doing that doesn't make much of an impact on readability.
Aside: this is the deeply unobvious optimisation video "Performance Matters" by Emery Berger: https://www.youtube.com/watch?v=r-TLSBdHe1A - they got a 25% speedup in SQLite in 2019!
Start at 9:30 and watch to 10:40 if you want to jump straight into the entrée. Dessert discusses SQLite at 40:25 to 41:25.
Summary: A: Causal analysis is needed to discover opportunities for improvements - they developed a technique and tool. B: Modern CPUs have so many hidden causes of performance variation, that you require special tools to actually measure small performance gains. C: They found a surprising difference between -O2 and -O3 that implied overfitting (false reporting of performance improvement).
I presume these techniques are used in large software companies, but I am guessing many (most?) developers in smaller companies know little about the topic. I still find it unobvious.
As someone who has done performance work for most of my career, I strongly agree with what you've written here. If you have a program that isn't well optimized, it might be true that once you start profiling you can find some hot spots and easy wins, but that doesn't last forever. Earlier in my career I was working on Python web apps, now I write high performance C++ systems, but you can apply this equally to both domains.
Take for example a big and slow Python/Ruby/PHP/whatever web app, something that I'm sure a lot of people on HN have experience with. If you profile one of these apps you're probably going to find that it is slow because it's using an ORM that generates hundreds of SQL queries, the ORM creates tons of intermediate objects which put a lot of stress on the memory allocator and GC, and there will be an endless amount of code that is munging data to take it from one representation (e.g. whatever the ORM returned) to some slightly different representation (whatever the caller actually wanted). There might be a few queries that are particularly expensive, but once you've fixed those you're still going to be left with something big and slow with no obvious path forward to make the code faster. Furthermore the root cause of these problems may not be obvious when looking at profiles because things like memory allocation that are internal to the runtime of your interpreter are typically either not exposed by profiling tools, or if they are it's not clear what action can be taken to improve them.
Likewise if you have a C++ program that does a lot of unnecessary copying and memory allocation, the program is going to be slow because there's a lot of time spent everywhere and there's no one thing to fix. Look at a bottom up profile of a C++ program and see how much time is spent in string constructors, memcpy, etc. and unless you've been thoughtful about this stuff from the start what you find is probably going to be alarming. In fact, since a lot of copy constructors are inlined (including for STL types), with many profiling tools it might not even be obvious that copies are happening at all, since they won't show up in call stacks.
Not every program needs to be high performance, but if you start by writing a lot of code that is slightly inefficient everywhere then it's probably going to be impossible to make things fast if you change your mind down the line.
And not only does the ORM itself have a big performance impact, your own code will be designed around the assumptions of that particular ORM. E.g. if it is easier to fetch a few row objects from different tables using the ORM and then do some comparison computations in your relatively slow interpreted language rather than writing one fast SQL JOIN, the former will almost always be the solution that ends up in production.
ORMs invite you to write bad code: instead of thinking ahead about what queries you need for a specific operation, you just start writing ORM-based code aimlessly and try to make it work with the tools available in your language of choice.
And the ORM entities will eventually infect every part of your project and it will almost be impossible to remove them after the fact.
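To illustrate the shape of the problem (table and column names here are made up): the N+1 version fetches all orders and then issues one extra query per order for its customer, whereas a single JOIN does it in one round trip. A rough JDBC sketch of the JOIN version:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class OrderReport {
    // One JOIN instead of 1 + N separate lookups. The N+1 version would fetch
    // all orders, then issue one extra query per order to load its customer -
    // easy to write with an ORM, painful at runtime.
    static List<String> buildReport(Connection connection) throws SQLException {
        String sql = "SELECT o.id, c.name FROM orders o "
                   + "JOIN customers c ON c.id = o.customer_id";
        List<String> report = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                report.add(rs.getLong("id") + " for " + rs.getString("name"));
            }
        }
        return report;
    }
}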
> For seriously high performance systems like webservers or exchanges, you need to get the architecture right from the start.
Agreed - I worked at a company that went through a very painful period of not coping with load.
They got there in the end by completely rewriting the core system - lots of people credited the switch from an interpreted language to a compiled language, but the reality was that the rewrite used a completely different architecture.
I often see death by a thousand cuts - everything is slow and there is no single obvious hotspot to fix. We should not assume that performance matters only for a small fraction of the codebase. Sometimes it is the case, sometimes not.
And UI is a bad example IMHO. UI with a high response time which is uncomfortable (or even frustrating) to use is probably the result of thinking that UI performance doesn't matter.
I think sometimes this exact point is lost in the context of a collection of programs that represent a system rather than simply in the context of a single program.
Truly bad UI response time is rarely because of a performance hotspot though. It's almost always unintentionally blocking the UI thread on network requests. I might get mildly annoyed using a sluggish UI, but I get perpetually annoyed whenever I'm doing almost anything on my phone and leave my house's wifi range.
It may not be a performance hotspot in the sense we usually mean (CPU-intensive code) but some other performance problem, which typically happens when performance is ignored. Networks don't have infinite bandwidth, zero latency and 100% reliability, but I see some people create software pretending that all of the above is true. It works sometimes (until it doesn't), but it could work better had they tried to minimize the dependency on remote hosts or make network interactions asynchronous.
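As a small, hedged Java (11+) sketch of the asynchronous option: issue the request with explicit timeouts and return a future, so the thread driving the UI (or serving other work) never parks on the network.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

class AsyncFetch {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))   // don't pretend the network is reliable
            .build();

    // Returns immediately; the caller stays responsive and decides what to do
    // when the response (or failure) eventually arrives.
    static CompletableFuture<String> fetch(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))       // bound the wait explicitly
                .build();
        return CLIENT.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }
}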
My opinion is that performance, like security, is very hard to retrofit if it was not considered from the beginning. And if you have the necessary knowledge, it doesn't take much more effort to create software which is faster (or more efficient) and more secure compared to software created by an ignorant developer.
I have heard many times that one should write a program first without thinking about performance, then profile and optimize a few hotspots. But in my experience the 2nd part (optimization) rarely happens, and people who can profile and optimize usually can also write faster code from the beginning.
It is painful to see how people who don't know and don't want to know typical latency [1] and throughput numbers, who don't know about algorithmic complexity and many other things, design and implement very inefficient software.
> Benchmarking and profiling doesn't help with that.
I've learned that benchmarking and profiling is the _only_ way to write performant code.
I've seen, in code review, a number of examples of a very fancy algorithm being broken out, and asked, "You realize N is bounded to be at most 100 here?". Or, "You realize the thread overhead here for parallel processing is two orders of magnitude slower than serial data access on one thread?"
Humans are bad at intuitively understanding where the slow parts of code are. I've seen processing be improved to the point of being impossible to grok, shaving 10ms off a processing piece that is 50ms long, only to then spend time blocking on network transfers that require 10s.
I'm of the opinion that the biggest performance improvements are usually in design and architecture. What if there were a design that avoided the need to do any network IO? In that case the 10s + 50ms would be a 50ms process, rather than a hard-to-grok "optimized" 10s + 40ms process. Simple code leads to simpler design, which is easier to reason about and makes it easier to spot the places where "this entire network round trip can be cut out", or "we are loading this data multiple times throughout this process, we can load it once", or "we are loading this data and then spending a lot of time on 'n+1' queries; if we stored the data in this format with some pre-processing we'd avoid the 'n+1' query".
To further rant, the emphasis on algorithms in coding interviews, and people enjoying algorithms more than cleaning up crufty architecture - that is the root of a lot of bad software. In sum, it's almost always the design that is slow, rarely the algorithm. Profiling is key, as it lets you know where things are actually slow.

(Recently a colleague was trying to optimize a tight loop that processed 1.5M rows. To "optimize" memory usage, they converted all local variables to static fields to 'save' memory and avoid GC pauses. This in effect did _nothing_: the compiler was going to inline the locals anyway, and the resulting bytecode would not have had any extra variables in it at all. Converting local variables to statics actually made memory usage slightly worse. So this 'optimization' did nothing but make the code worse, and a quick benchmark would have shown it having no effect. To really optimize memory usage, we updated the design to stream results to a file rather than keep everything in memory for a final dump at the very end.

Another example: I once helped a team do performance work for a DB that they had spent a year tuning. They did not keep track of any performance benchmarks or of which changes produced which improvements, and after a year had nothing to show except a DB that would crash after a few minutes. Taking that over, starting everything from scratch, and benchmarking everything, the project was done a month later and was stupid fast.)
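For what it's worth, the streaming fix has roughly this shape (names are hypothetical): write each processed row out as you go instead of accumulating everything for one big dump at the end.

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

class RowExporter {
    // Streams rows straight to disk, so memory stays flat no matter how many
    // rows there are - unlike collecting 1.5M rows into a List and writing
    // them all at the end.
    static void export(Iterator<String> rows, Path output) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(output)) {
            while (rows.hasNext()) {
                writer.write(rows.next());
                writer.newLine();
            }
        }
    }
}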
Disagree; once you know what you're looking for, you can thread the needle pretty easily. I've worked in high-performance areas most of my career and it's pretty wild that when I solve a leetcode problem for fun I can routinely get into the 99th percentile on speed and memory usage just from knowing what to avoid.
I think you are the golf ball balancing perfectly on an upside down bowl. Optimal, but an unstable solution. Most engineers don't yet know because they don't have enough experience (and most engineers haven't worked in high performance for most of their career), so they need the benchmarks and profiling.
Plus, benchmarks are good solely to be able to show your manager that yes, spending 3 weeks on that refactor was indeed useful. Engineers shouldn't need to do that, but it is often useful nonetheless.
it's important to make sure your code base isn't doing anything dumb, like O(n^2) iterations, or 1000 sequential RPCs instead of 1 batch RPC, or etc -- this is i guess your point about architecture
but assuming that bar is cleared, performance optimizations should only be accepted when accompanied by benchmarks/profiles that demonstrate their usefulness at the whole system level
In practice, most of the industry has been writing ridiculously inefficient code for decades, and I wish more people would pay attention to that 20%, or the 3% that Knuth described.
Or better yet, I wish some groups didn't irreversibly sacrifice performance with frameworks and architectures that are impossible to optimize, no matter how the code is written.
> I wish more people would pay attention to that 20%,
I agree with both of you - there's no reason why software should be as inefficient as it is, and there's also no reason why code should be as unreadable as it is, and those goals aren't mutually exclusive.
> I agree with both of you - there's no reason why software should be as inefficient as it is,
Except that there is a reason, and it tends to be development speed.
For example, real-world React apps tend to suffer a lot from performance issues, especially when compared with vanilla HTML+JavaScript or even the old tried-and-true server-side rendered HTML, but React also makes it trivial to implement and rewrite the complex UIs that UX designers love to put together. Would it make sense to argue for performance options which lead to productivity hits?
> 1) Most code, i.e. at least 80% of the code in a codebase, will never be a performance hotspot and does not need specific optimizations (i.e. as long as the code does not do stupidly inefficient things, it's probably good enough).
That's true; however, I've seen enough code where the author used this argument as a justification to do stupidly inefficient things, such as building monstrous mountains of abstraction layers (or re-scanning the entire codebase at runtime, because why not? [1]), all covered by the "premature optimization is the root of all evil" mantra.
You can absolutely write code that is both opaque and inefficient.
> Most code, i.e. at least 80% of the code in a codebase, will never be a performance hotspot
Yes, and no. That's what makes it hard to write high performance systems, you really need people with solid experience in designing for performance.
Sure, always go for the low hanging fruit which are the hotspots identified by profiling and improve those.
But what often happens after that is that all the code is slow everywhere! So there are no big hotspots, profiling shows nothing egregious. Teams without high performance systems experience might conclude the code is about as good as it can get given the absence of hotspots. Seen this happen often. But there might be 10x or 100x improvements remaining.
80% seems very low, in my experience it's 1% or less of the code that is a hotspot.
On the other hand, we should talk about "doing stupid things", or having a bad design/architecture. For example, if you architect your application such that each web request is serviced by 20 micro-services, and those micro-services make 20 requests of their own...no matter how fast your code is, the application will be slow.
I suspect the 1% number is typical only of programs that haven't been optimized. Once people start to care about performance and address these hotspots, which are often actually quite low hanging fruit, the profile starts to become flatter relatively quickly.
This makes assumptions about what the performance requirements are.
In latency sensitive code, _all_ code needs to be optimized. It doesn't matter if your slow function is only called once - it adds N microseconds of latency when it doesn't need to.
If there's a choice of spending X hours on performance tuning, fix the thing which gives the biggest win. There's no sense spending time saving 3us when you could have saved 3ms.
And sure, once that's done, you can keep going to picoseconds, but there's only a limited number of application areas where that makes economic sense.
It's also important to consider the entire system: you can optimize your code to the picosecond level, occupying the time of expensive engineers for months, but perhaps that $500k is better deployed on a PCIe5 bus, or a better Ethernet card, or a kernel-bypass stack, or even shorter network cables (every 300mm is a nanosecond, after all).
optimizations always have to be data-driven, from profiles, starting at the slowest stuff and working your way down, until you satisfy your latency requirements
unless your slow function materially impacts your performance metrics in a way that you can reproducibly demonstrate, it's a mistake to optimize that code to shave off a few microseconds
In this video, David Gross says something along the lines of "if your data is laid out in a memory inefficient manner, it will be the bottleneck everywhere." If you're dependent on a database query for everything, that will be your bottleneck. If your in-memory structures are cache inefficient, that will be your bottleneck. But once you've fixed all of this (if you can), hot paths and whatnot matter
At the same time, one should not automatically assume that "clean code" is actually readable. I've had to deal with architecture astronauts of the Uncle Bob school of "clean code", and their code was anything but readable.
I think that's where the craft of programming comes in. I feel like after 30 years in the field I have a reasonable idea how and when to optimize but it's not something I learned from classes or textbooks.
I had a manager who would write the most hideous, impossible-to-debug code imaginable and always have some sort of microbenchmark as justification. At the time I was young enough in my career to just think he was wiser than I was, but upon reflection (and remembering some of the stuff he did, like FFI-ing C into Haskell all the time), I realized that he just didn't like people criticizing his code - though the first hint should have been when he refused to answer my questions about actual performance profiling of our codebase.
The problem is that so many practices we have gravitated to are actively harmful to performance. Worse, they are borderline harmful for maintenance. Specifically, some practices are better for larger teams than they are smaller teams.
The difficulty with that last point is that the best practice for how to maintain code changes as the project gets older. This is easy to see in code that is a bit of a mangled mess. But it is also visible in code that is a beautiful but stalled-out mess, where too much is needed to get small contributions in.
> blanket justification for writing unreadable code
Well, ironically, the site itself is down (so the code must be beautiful?) but assuming I understand the content from the title, another thing that most of these "code quality doesn't matter" types overlook is that software changes over time, quite a bit. That's why it's called "soft"ware, it's supposed to be soft. Readable is changeable. If all the mattered were performance, we'd engrave the code path onto a circuit board.
It's true for services/applications. In the case where you write a library that others will import into their projects, you should always treat performance seriously, IMO.
The website seems to have gone down under load, if this is a snarky response to Casey's pro performance take previously, then the irony level is high!
I really wish people would stop pushing back on people who are interested or curious about how to make code perform better. Of course you don't want people scattering weird SIMD intrinsics all over mundane parts of your web backend in an unhinged attempt to shave microseconds off a response. But in my career I've never really seen people do this anyway.
What you do want, though, is programmers who have spent some time understanding how to structure code to perform well on modern hardware. Because often that structure is NOT a mess; it may even be easier to work with than the usual idioms in your industry or language. Grouping data into adjacent chunks with arrays or array-backed data structures has more benefits than just leveraging the L1 cache. It is sometimes also convenient.
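As a hedged sketch of what that can look like in Java: parallel primitive arrays (a structure-of-arrays layout) instead of a list of heap objects, which keeps the values you iterate over contiguous and is arguably no harder to read. The class and field names are hypothetical.

// Positions stored as parallel primitive arrays, so iterating over x/y
// touches contiguous memory instead of chasing one pointer per element.
class Points {
    private final double[] xs;
    private final double[] ys;
    private int count;

    Points(int capacity) {
        this.xs = new double[capacity];
        this.ys = new double[capacity];
    }

    void add(double x, double y) {
        xs[count] = x;
        ys[count] = y;
        count++;
    }

    double sumOfX() {
        double sum = 0;
        for (int i = 0; i < count; i++) {
            sum += xs[i];
        }
        return sum;
    }
}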
I'd say it's orthogonal to Casey's take, which was about trading abstraction for (arguable) simplicity + performance. This was an article about aggressively optimizing a simple loop, thus trading simplicity for performance.
Given that it was from a consultancy advertising its optimization services, the irony of it going down is still pretty high, though.
What I didn't like about the "Clean code, horrible performance" video is that he talked about "clean" code as though it was this unjustified, useless way of thinking about programming. This is maybe something you have to think about if you want the most performant code possible, but I don't think it's a good mantra in all situations the way it's presented in the video. Number one priority should always be "understandable code". If you aren't always thinking about the best way to get your point across to future programmers (including yourself) you're going to have a bad time.
> he talked about "clean" code as though it was this unjustified, useless way of thinking about programming.
That was kind of the point. There is no empirical evidence showing that the "clean code" way of programming is any better in terms of cognitive load, but there is empirical evidence that it makes your program perform very poorly.
The video also did not aim to produce "the most performant code possible", which would certainly become a tangled web of arcane bit tricks and CPU minutiae. It was demonstrating what reasonable performance might look like if you wrote simple code that the compiler can do a good job of optimizing.
In addition, he demonstrated very clearly that doing it the "clean" way made it significantly harder to see that the optimizations he did were even possible. So that observation argues that the "clean" principles not only make your code slow as shit, but ALSO harder to reason about.
I program in a similar style to Casey all day, every day, and I typically have no trouble going to code I wrote years ago and figuring out what's going on.
If that doesn't convince you that it's an unjustified and useless way of thinking about programming, I'm not sure what would.
We put a 'does this person understand assembly output' into our interview process.
They don't have to get it 100% right, but not knowing anything is a hard No.
Interesting! What kind of company and technology domain is it? Assuming you don’t want to get too specific.
My first reaction is that I like this idea, but maybe I’m biased because I feel I’d “pass the test” myself (and thus have one additional small way to differentiate myself from pure Leetcoders :)
In most of the environments I’ve worked (except maybe the games companies) I feel this would not have been widely considered an acceptable thing to ask in a generalist interview.
The blog post is effectively implementing memmem() or strstr(), which searches for a short string in a long string. If we are allowed to use GNU extensions, the cleanest solution is to call memmem(). Without memmem(), I would implement Boyer–Moore or Knuth–Morris–Pratt, which will scale better than the O(MN) implementation in the blog post. Time complexity matters.
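For reference, a compact Knuth–Morris–Pratt sketch in Java (glibc's memmem() is more sophisticated still) runs in O(M+N) instead of O(MN):

class Kmp {
    // Returns the index of the first occurrence of `needle` in `haystack`,
    // or -1 if absent. Runs in O(M + N) time with O(M) extra space.
    static int indexOf(String haystack, String needle) {
        int m = needle.length();
        if (m == 0) return 0;

        // failure[i] = length of the longest proper prefix of needle
        // that is also a suffix of needle[0..i].
        int[] failure = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && needle.charAt(i) != needle.charAt(k)) {
                k = failure[k - 1];
            }
            if (needle.charAt(i) == needle.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }

        // Scan the haystack without ever re-reading matched characters.
        for (int i = 0, k = 0; i < haystack.length(); i++) {
            while (k > 0 && haystack.charAt(i) != needle.charAt(k)) {
                k = failure[k - 1];
            }
            if (haystack.charAt(i) == needle.charAt(k)) {
                k++;
            }
            if (k == m) {
                return i - m + 1;
            }
        }
        return -1;
    }
}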
Yes, this was my takeaway - it was a very poor choice of example, because a known faster algorithm will crush the sort of micro-optimizations done here and be more readable.
Or perhaps it was actually an incredibly good example, despite itself: a great illustration of precisely why jumping to micro-optimization isn't always the right solution.
This is also a problem where the algorithmically fastest solutions are also pretty much guaranteed to be faster, since they are equally cache-friendly and will almost always work out to fewer instructions.
That is not always the case, but this is one of the times when it is.
I would say this is not horrible code - or at least, it's "hollywood horrible". [1] Yes, it's not directly intuitive and would probably need some illustrating comments, but it's still close enough to the "naive" algorithm that you could quickly figure out how it works. More importantly, it's still localized, concrete and side-effect free: the code has a clear goal and accomplishes that goal completely inside that one function. The non-intuitive parts also have a clear reason.
I think actual horrible code is more something that uses global mutable state, has logic spread over half a dozen units, or has weird roundabout implementations - not because of a particular reason, but because of unclear goals, a big-ball-of-mud architecture, an evolving codebase, etc.
I have long been in the habit of leaving in the old pre-optimized code, commented out, above the optimized code. I generally don’t think that performant code ends up all that mangled, but it costs ~nothing to leave the old code in a comment.
I picked this practice up from an older engineer who would set
SLOW = false
at the top of the file and then wrap old code in a
if (SLOW) { … }
block for the compiler to do dead code elimination on. (I found linters and compilers complained less about commented-out code, but he preferred his way so it would always have syntax highlighting.)
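In Java terms the pattern would look roughly like this (a sketch of the idea, not his actual code); because the flag is a compile-time constant false, the compiler is allowed to drop the dead branch, while the old code still gets syntax highlighting and type checking:

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class Dedupe {
    private static final boolean SLOW = false; // flip to fall back to the old implementation

    static List<String> unique(List<String> items) {
        if (SLOW) {
            // Old version, kept compilable instead of commented out.
            List<String> result = new ArrayList<>();
            for (String item : items) {
                if (!result.contains(item)) {
                    result.add(item);
                }
            }
            return result;
        }
        // Current version: same order-preserving result, one pass.
        return new ArrayList<>(new LinkedHashSet<>(items));
    }
}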
Although I agree that you must test optimized code for correctness, I think it should be approached very cautiously, as tests can become checkpoints that constrain optimization.
I had a memorable experience of that, actually - for a long time we had a Trello card titled “faster to operate on the bits instead”, and the team looked at that card fondly during development. After we started getting traction, it was finally time to optimize! We conscientiously wrote tests to ensure correctness, and then began an incredibly enjoyable informal hackathon. We blazed through every step, replacing the logic piece by piece with bit operations. This step was 2x faster, that step was 5x faster, and so on. A few steps were slower, which was a bit weird, but we were having so much fun shouting out our gains to each other across the room, racing to improve. By the afternoon we had optimized each step and gotten an order of magnitude improvement.
“Great stuff, team.”
“I thought it would have been faster.”
“Yeah, same.”
We did some evening profiling and noticed a lot of time being spent in a dozen or so functions with names like "toRepr", "reconstitute", stuff like that. Actually, it was most of the time - what!? And then we saw: at each step we converted to raw bits, performed the equivalent logic with fast bit operations, then converted back to the representation for handoff to the next step. Wait, why are we converting to and from bits like thirty times each run? That can't be right. Realization strikes - right before most handoffs there was a commented-out line that ran our tests on the representation to ensure it was still correct.
Someone pointed out we wrote that test before we had done any investigation of performance (I remember it almost word for word actually, “When we wrote the test, all we knew was it would be faster in bits, but not how. And the test doesn’t even look at bits!”). So we ripped out all the conversions and stayed in bits from step to step. Without changing the bit logic it was now almost three orders of magnitude performance gain, basically running in milliseconds instead of seconds. Two OOM that we had earned, but left on the table!
Tests introduced arbitrary checkpoints and imposed irrelevant constraints on the problem, without us noticing until afterwards. We caught the problem pretty easily that time, but I do wonder how many other times we (or others) didn’t because it wasn’t as obvious. The service that accepted the output maybe should have been accepting raw bits instead of the representation. The service that gave us the input definitely should have been staying in the bits.
So, do test your optimizations for correctness, but be really careful. Tests can easily and subtly constrain your optimization space in major and arbitrary ways!
I haven’t used TDD much but when I did I found I had that issue, that I would imagine the code I wanted to write and then write tests to check that I had written that code (obviously not the intent of TDD). I got a little better when I thought about testing requirements. I actually used an old trick I’d learned from the head of a web dev shop - when he was struggling to elicit requirements from a client in a meeting, he would hold up a pencil and say “does this fix your problem? why not?” and then inverting their objections would produce requirements more fluidly. (Obviously you don’t stay on the pencil for long, but it puts the client in the right mental state of “looking for lack of capabilities in proposed solution” rather than the more usual “bullet point shopping list of desired delivery levels, SLAs, etc”).
> but he preferred his way so it would always have syntax highlighting
On that topic, probably one of the top-3 features from Clojure I miss in other languages is having comments be part of the language itself, so everything that works on the code (linters, evaluation, syntax highlighting, etc.) works in the comment as well. The `comment` macro really is a godsend in disguise, as it's awfully simple implementation-wise. Its cousin `#_` is also a great tool. See https://clojuredocs.org/clojure.core/comment for more examples
It's interesting to think about the difference between isolated horrible code that sits in a function and does its own thing, vs architecturally horrible code that tends to spread outward and infect the rest of the program.
In cases like ECS (in gaming) or virtual DOM (in web), the "horrible" architecture comes out the other side and becomes almost clean again.
Unfortunately, there are many reasons to write horrible code.
I work in-house for a non-tech related company. I'm often the sole developer, sometimes I have help. I'm also the defacto DBA and systems administrator for the machines the software runs on. As the defacto DBA, I'm also roped into data analysis as well.
There are also certain events that are non-compromisable. These events take place whether we are ready or not, and having nothing is not really an option. Missing one means shelving the project until next year, because the next event must be prepared for.
So, at the end of the day, "done is best". If I have the time, I can go back and refactor everything into something better. But often, there's "the next thing".
Both writing an algorithm from scratch and copypasta lead to ineffective software engineering and computer science.
I would first look in the 5 book TAOCP for an algorithm recipe.
Then, I would look to other FOSS implementations that solved a particular problem before.
If you need to write something totally new, consider first why you must put dense engineering effort into something that may already exist. Do you really need to be solving this problem?
Because in order to achieve the highest performance, the only way to do it is to draw a direct path from origin to destination. Often the system is built around a certain data structure to gain maximum performance and cannot hot-swap that part. And you don't need to pull in everything from amateur FOSS software and bloat your project.
You sound like ChatGPT with hyperbolic, loaded language.
I wouldn't call the Python dict implementation amateur.
Furthermore, the only way to arrive at optimal implementations for various platforms is to benchmark them. You won't get that just guessing or labeling things you know nothing about "amateur".
clean, adjective
1. Free from dirt, stain, or impurities; unsoiled.
2. Free from foreign matter or pollution; unadulterated.
3. Not infected.
4. Honest or fair, or showing that you have not done anything illegal
Clean isn't a word that can be used to describe software. It just doesn't have any relevance to an observable, objective quality of software. There is no definition of what 'clean' code is; you can't show me code and say objectively "this is not clean", because there's no real definition of "clean code". It's a borrowed word, seemingly used to indicate a lack of desirability or quality, and it implies things that don't have anything to do with software. But there really is no concrete example or evidence of what clean software is. So it's nonsense.
We talk about a computer "science". But in reality it's not a science, it's a cargo cult. We use imprecise language with no concrete definition of what we're actually saying and doing. And we wonder why the end result is so poor.
I always know I'm going to get a coherent and relevant argument when someone starts with a dictionary definition.
Most creative trades share a definition of clean that expands on #2, where 'foreign matter' includes items, not just detritus. If you leave a toothbrush on your desk, the desk is not clean, even if the toothbrush is still new in its wrapper. It can also mean 'cleared and ready to begin a new task', which is a step that would precede any mise-en-place work one might engage in before properly starting the task.
Clean means something beyond that in software. It's fairly well agreed upon. It's not in the dictionary. That doesn't invalidate it.
Then could you share that agreed-upon definition with me? And also, exactly how much is "fairly well" agreed upon? 60%? So would 2/5 of people disagree when you call some code unclean? Are there classes where they teach kids in school how to identify unclean code? Are there instructions on how to clean it up?
And most importantly, is there any kind of scientific research that shows the effects of unclean code? Or are humans just overly concerned with appearances, and make up their own rituals and reasons for whatever becomes habitual?
I had to ask ChatGPT what the meaning of "cargo cult" is. Notwithstanding the fact that most of us don't know how ChatGPT collects its data, and taking the cue from its output:
"Originally a religious movement that emerged in Melanesia during World War II. The movement professed that Western technology and prosperity were result of supernatural powers"
I don't scoff at people using the phrase "clean code". I know what it is conveying.
I do find ChatGPT supernatural.
So if I show you some code, you will be able to immediately point at the unclean code and tell me why it's unclean, and everyone else will agree with you?
The term clean is somewhat esoteric and leaves room for dissent. We also use the term "code smell", which Wikipedia describes as: "Smells are certain structures in the code that indicate violation of fundamental design principles and negatively impact design quality." When people ask me about fundamental design principles, I point them to S.O.L.I.D: https://lostechies.com/wp-content/uploads/2011/03/pablos_sol...
This is a thing to think about in itself. One problem here is languages that have semantics hostile to significant data layout transformations. Practically to get there the language design and the data-optimizing compiler would have to co-evolve I think.
Dunno about "readibility" but I fully expect ChatGPT to produce vast swathes of code that nobody understands.
While I'll cop to being more skeptical than the average HN denizen about ChatGPT, this isn't actually cynicism about ChatGPT, but about human cognition. If we do have something that produced large swathes of mostly correct code, it won't be long before the humans using it aren't even checking it anymore. Same reason that cars can do a little bit of assisting, and if they could do 100% of the driving that might be safe, but 98% of the driving is not a very good idea at all.
In 20 years, someone will ask CodeGPT 8.3 to explain what some code does, and it'll give a perfectly understandable explanation back, except the human still won't understand it because the human doesn't actually understand how computers work anymore. They'll think they understand it, though.
That's why I think the Copilot and ChatGPT hype will end up as one big bubble, and in ten years' time we will have a pile of generated code no one really understands anymore that needs to be fixed. Kind of like after the outsourcing adventure people had in tech in the early 2010s or so.
ChatGPT has nothing to do with it. You can have code that is both easier to read and higher performance; it's just that nobody has made it a goal in life to show the rest of us how to do it. If I ever get fed up with writing code for a living, I might sit down and figure out how to communicate it. There are dribs and drabs in my comment history, usually under topics of performance.
The problem is that they've chosen a micro benchmark for this argument. The toy-est of toy problems. Yes, you have to make tradeoffs in these cases, but toy problems aren't really a good communication medium. If you want to process a thousand records of data in a decent amount of time, there are lots of strategies that move you toward achieving both.
If you mean automatically optimising code, the reliability isn’t quite there (it can’t check its own code) and the limited context length means it might struggle to reason about the codebase as a whole