Ask HN: What are the 20% tips that will get my code to perform 80% faster?
38 points by echo272 on Feb 1, 2015 | hide | past | favorite | 65 comments
I'm looking for resources that can improve the execution of my code dramatically. The 20% tricks that will get my code to perform 80% faster.


I wish I knew whose quote this was: "Use your intuition to ASK questions, not to answer them."

Specific to this topic:

Use a profiler to determine where you're spending your time. Don't guess; measure.

When you find a candidate area to work on, determine whether caching or a fundamental algorithm change could work. Those will usually get you far greater gains than optimizing code as-is.
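For example (a Python sketch just for concreteness; the same workflow applies with gprof or perf for C), the stdlib cProfile module shows where the time actually goes:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # deliberately quadratic work so it shows up as the hotspot
    return sum(sum(range(i)) for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_sum(500)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # slow_sum and its inner calls dominate the report
```

Only after a report like this tells you where the time is spent does it make sense to reach for caching or an algorithm change.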


I generally agree but there are a few caveats:

(1) A profiler won't tell you that there is an algorithm that is O(n) while the one you're using is O(n^2). Only reasoning about your problem will do that.

(2) Most profilers won't tell you that much about cache effects, and those can be huge in some cases. This is more of an issue with tight-loop math-heavy code like codecs and crypto and primitive data structures and the like.

(3) Some code can be "uniformly slow" or "structurally slow everywhere." An example would be C++ code that over-uses inheritance patterns and defines every single method as virtual. In that case you are losing massive cycles to indirect fetch/call instructions, and a profiler will not tell you this. It will only show you the relative hotspots, not the absolute overall slowness of this type of code. Another example would be over-engineered code. A profiler will help you optimize down the problem areas in a pile of over-engineered spaghetti, but it will not help you make the overall code base more elegant to realize large overall gains. This is often a special case of point #1.


1. Yes, but if you use a representative size dataset and the profiler doesn't show that as a hotspot, there's no point in changing algorithms.

2. Agreed, but remember the value of the profiler is telling you, "You're spending 97% of your time in this small subset of your code" and your job is to conclude, "Hmm, I should look there for improvements and not at the whole rest of the program." It can't tell you what changes to make, but it keeps you from spending time optimizing what's in the 3% of runtime.

3. Agreed, with the caveat that most profilers that I've used did tell me the absolute overall slowness in addition to the relative slowness, but that wasn't the major thrust of your comment so I'll just observe that small difference and otherwise agree.


Would truly help to know what kind of code echo272 wants to run faster. Language and environment.


Thank you vardump. It's numerical linear algebra kernels in C, in a multi-core environment.


Code that you don't write consumes zero clock cycles.

Learn your language well, so that you can use data structures and builtin functions as they are intended, and you can expect them to be pre-optimized for you. Learn in principle what is optimal, and write in that fashion.

Know the documentation, and follow best practices that the docs recommend - when the compiler/language gets updated, you'll get free optimizations.

Use semantic names, none of this i, j, x, y stuff (unless you're writing totally abstract math) and don't use nebulous names like "helper," when you can be more precise. Better names help you think more precisely about your code and what you want it to do, so you can avoid unnecessary transformations, materializations, and calculations.

When you've finished the program (all the unit tests and acceptance tests pass), then profile. Focus on the bottlenecks. That way, you'll get Pareto-optimal performance for your effort.


Code that you didn't write but that is being executed still consumes clock cycles. Standard library functions and data structures still have code that executes on the machine, you can't just handwave it away as consuming zero cycles just because you didn't write it.


I believe the intention was more Zen -- don't write code that isn't really necessary -- and the "use standard libraries" code was separate. (If you are using a language where the standard libraries are so poorly implemented that you can easily write better versions, you might want to consider using a different language.)

I would also add the cargo cult suggestion of "precompute and cache everything you can". Whatever can be done in an indexing process instead of a search process should be; whatever doesn't need to happen at serving time shouldn't.
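As one small sketch of the precompute/cache idea (Python here just for brevity), `functools.lru_cache` memoizes a pure function so repeated calls with the same arguments skip the recomputation entirely:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=None)
def expensive_lookup(key):
    calls["count"] += 1   # track how often the real work actually runs
    return key * key      # stand-in for a slow computation or query

results = [expensive_lookup(k) for k in [3, 5, 3, 5, 3]]
assert results == [9, 25, 9, 25, 9]
assert calls["count"] == 2   # the real work ran only twice
```

The same principle scales up to the indexing-vs-searching point: anything you can compute once ahead of time, do ahead of time.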


> If you are using a language where the standard libraries are so poorly implemented that you can easily write better versions, you might want to consider using a different language.

Standard libraries are typically generalized for a wide array of use cases; you can easily write a much faster implementation specialized to your needs if you have any idea what you are doing.

Sometimes standard libraries handle such ridiculous cases (e.g. many ES5 array methods and Function.prototype.bind) that your needs don't even need to be that special.


This is fairly specific optimization advice which applies to only a subset of people's code, but: Write your own SQL queries.

ORMs are notorious for writing inefficient queries once you get more complicated than selecting a single row from a simple table, and trying to optimize them without writing a SQL query will make your code all but unreadable. This isn't to say don't use an ORM, but if you need to optimize, your code speed and code clarity will be better off with raw SQL.

More generalized performance advice: Keep heap memory allocations to an absolute minimum. Heap access is more likely to result in cache misses, and the bookkeeping related to allocating and freeing memory can quickly add up (even if you're not doing it explicitly).

Globals and stack allocations are almost universally faster.

A resource which popped up on HN yesterday might offer a bit more insight on how to design for optimal performance: https://news.ycombinator.com/item?id=8978081


Log and time your SQL queries, don't assume they'll always be worse. Sometimes you get bad SQL generated because ... the tables/objects were not set up properly in the first place.

I'm an ORM-first person, and look for slow queries at certain points in development. Usually about 90+% of the queries from the ORM layer are fine and 10% need to be rerolled by hand for better performance.

The other thing about ORMs I've learned in the last few years is that often you don't really need an object back in the first place, if you're just reading the results vs. modifying/saving. Hibernate projections (IIRC) just give you back a map, and save the overhead of trying to create an object (or object graph) to hand back.


It's a bit unique to each situation, but a few topics that come to mind immediately:

* Profile & measure to narrow down where time is being spent

* Memoize repeated function calls

* Cache data that doesn't frequently change

* Minify web assets

* Gzip compress web output (a la mod_deflate)

* Load balance

* Shard

* Strategic database indexing

* Asynchronous workers when beneficial

* Use the correct data structures

* Minimize Disk I/O

^ I suppose the last few are architecture-related more so than code.

These topics are mostly specific to the web, but here is a "curated list of Web Performance Optimization":

https://github.com/davidsonfellipe/awesome-wpo

Since nobody has quoted Donald Knuth yet (and this quote sometimes gets bent to fit situations that it shouldn't):

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"


Use a profiler.

Make your code do less work.

Use less memory so more of your code/data fits in the CPU cache.

Understand the algorithms that you use, their strengths and weaknesses.

No magic tricks, just hard work.


Trust no one, not even yourself: Profile your code.

That said, if you aren't profiling already you are probably trolling us to get suggestion for your next blog article. ;-)

As others have mentioned, use better algorithms; e.g., it makes no sense to speed up a bubble sort routine instead of switching to a quicksort approach.

A lot of speed improvements come from just not doing stuff in the first place vs. speeding up a naive or brute force approach.

For example, suppose you want to determine when something enters the frame of a camera image that is 640x480 pixels big.

The naive approach would be to compare every pixel of the frame to every pixel of the previous frame. The area of the frame is Width x Height: 307,200 pixels.

A more efficient approach is to process just the edges (perimeter) of the image/frame, since nothing can enter the frame without crossing the perimeter first. The perimeter is 2 x Width + 2 x Height: 2,240 pixels.

The 1st approach is about 137 times slower (307,200 / 2,240) and no amount of optimizing it is going to make it as fast as the 2nd approach.


If your pixels are a byte each and the data is 64-byte aligned in memory, you're still reading 640 * 2 + 128 * (480 - 2) = 62,464 bytes from memory, not 2,240. That's because you can't read less than a cache line, which is almost always 64 bytes. The performance win could be a lot less than you expected.


Respect CPU caches and locality. Keeping your data structures in cache or even packed together can result in a nice performance gain.

I've seen benchmarks where inserting into an array was significantly faster than inserting into a linked list. This can be surprising considering most algorithm texts I've seen say the complexity of the latter is better. From what I understand locality is part of the reason inserting into the array is faster.


Measuring and profiling the code will always be the best thing to do, but for simple 20%-effort tips I think it would be:

* Don't write your own data structures; stick to the standard library or, in some rare cases, a popular and well-known third-party lib.

* Always prefer a Hash or a Set over anything else.

* Beware of any nested loops.

* Have some local / external caches and try memoization.

* Try parallelizing or, better, non-blocking waits for all I/O operations.


- Learn about big-O complexity.

- Learn the big-O complexity of your language's data structures.

- Benchmark places in your code that do lots of things with those data structures, especially loops. Write a couple variations with the structures with the best big-O complexity to find the fastest one for your use-case (it can sometimes be surprising).

- Conditionals in loops can hide all kinds of complexity. If you are computing the same thing each evaluation in a loop, compute it once before entering the loop instead and replace it.

- Try to do as much with the data that's in memory as possible, before replacing it with stuff further down the memory hierarchy. If your language lets you care about L1/L2 cache optimization, learn about that. Nothing will slow code down like having to move data up and down the memory hierarchy.

- If you have to operate on lots of data, look for ways to parallelize work on that data. Modern CPUs have lots of cores, single-threading only uses a fraction of your computational power.

- If you have complex conditionals, try to find ways to optimize and limit the branching: short-circuit booleans, eliminate expensive-to-compute conditionals, etc. Sometimes complex business rules can be rewritten in simpler ways that greatly reduce branch mispredictions.

- Try to squeeze as much performance out of a single machine as possible. As soon as you start sharding your code across multiple systems, your memory hierarchy problem becomes a network bandwidth problem, which is about as bad as you can get. If your problem fits on one machine, it can almost always be made faster than splitting it across n-machines.

- Spend time really thinking about your algorithms. I've seen all kinds of cases where a slow algorithm could be trivially sped up by taking a different approach.
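The conditionals-in-loops point above can be sketched in Python (the numbers are arbitrary):

```python
import math

data = list(range(1000))

# Before: math.sqrt(500000) is recomputed on every single iteration
before = [x for x in data if x < math.sqrt(500000)]

# After: hoist the loop-invariant computation out of the loop
threshold = math.sqrt(500000)
after = [x for x in data if x < threshold]

assert before == after  # same result, one sqrt call instead of a thousand
```

CPython will not hoist this for you; in C a good optimizing compiler often can, but only when it can prove the expression really is invariant.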


I suggest you read a book about algorithms. This one is pretty good: "Introduction to Algorithms" by Cormen, Leiserson, Rivest, and Stein.


There is also a lot of great material on Coursera, MIT OpenCourseWare, etc. on algorithms.


I second that, but would add data-structures to it. They go hand in hand. Being familiar with whats available will let you choose the correct one.

Using the right data structure and algorithm for the job can make your code significantly faster.


This was asked recently in Ask HN: https://news.ycombinator.com/item?id=8951241


Stop worrying about how well your code performs and focus on more important things.

Seriously.

I'm not saying that code performance isn't important; of course it is. It's just that if you're already fairly competent, then code performance is probably not your biggest problem. You're looking for a magic bullet that may not be there.

I'd be more concerned with:

- How easily can another programmer maintain your code? We all tend to believe that our programs will be just right when we finish, but the sad truth is that others will probably change them significantly over the years. Your wish to "squeeze a little more performance" out of your code may actually be counterproductive; no one will understand your clever optimizations and your code will become unmaintainable. Remember, you're writing for the next programmer as much as for the machine.

- How well are your databases designed? Do this well and you won't have to worry about optimizing your code as much. Take care of the microseconds so that you won't have to worry about the nanoseconds.

- How well are your variables named? Again, this doesn't affect performance, but boy does it ever help maintenance.

- DRY (Don't repeat yourself.) For many reasons, including performance.

- How precise is your definition of "what" to build? Building the wrong thing to run very fast is much less important than building the right thing to run fairly fast.

- Are you doing the right things on the client and the right things on the server? For example, if you're validating data on the client and building the UI on the server (under some conditions), then it really doesn't matter how fast it runs. You're probably doing it wrong.

- Are you especially careful about recursions, early exits, sorting, filtering, parsing, and other fairly standard stuff? Do these things right so you won't have to worry about performance. Do them wrong and all the fine tuning in the world won't help.

- When all else fails, find a peer reviewer whose opinion you respect. They'll probably find the stupid things your blind eye misses. You'll remember their feedback and probably never make that mistake again.

Have fun doing the important things well and stop worrying about fine tuning: a philosophy that has worked very well for me for years. Hope this thinking helps you too.


Ed is right that all these things matter as much as or more than making performant code, but at the same time your question suggests that performance is your main issue at the moment. If that's not the case then please take Ed's advice to heart and invest your time in all of the above; only concentrate on performance if you are 100% convinced that is what will give you the best return on time invested.


> Stop worrying about how well your code performs and focus on more important things.

It's very naive to think that that's an acceptable tradeoff in every situation. Would you accept it if chrome/FF/<browser of choice> suddenly consumed an extra 30% of memory and CPU time after a single update, because the developers decided that the HTML rendering code wasn't readable enough? Or if you're writing code for cars that needs a response time in sub millisecond ranges to delay for hundreds of milliseconds before engaging traction control because the code wasn't well labelled, so it was rewritten?

There are plenty of cases where performance is absolutely critical, and it is the most important thing.


Yep, there are also a lot of situations in enterprise where you don't exactly know how your code will be used in the near future (in <5 years it could grow to exponentially more customers using it compared to soft releases to select test customers).

Because of that I'd say.... it's not like I see a lot of people get raked over the coals for it, but there is definitely some organizational disappointment when a silly scaling bug crops up in production[1]. I think as a mature dev it is proper to internalize as many performance-related techniques as possible so that you can write more performant code with each subsequent project. Sometimes it doesn't even affect the amount of initial coding hours, it is just one of the freebie gains that comes with experience.

[1] Many more such bugs would probably occur if the company didn't occasionally get around to load-testing, so that should be part of CI or dev cycle if possible.


I like this list much better than much of this thread. A lot of programming doesn't NEED to be that efficient, and it's much better to learn the "soft" stuff and how to properly maintain projects


Have a test suite that is 95% unit tests. And structure it in a way that tests can run in parallel, across all cores.

FROM THE BEGINNING. Because it's nearly impossible to make this change after the fact.

Also, use a code profiler and commit the profile data to the repo! This will allow you to plot performance issues over time and watch for big upticks.

This alone will make your code orders of magnitude better (and likely faster). You will catch concurrency issues before they become a huge problem, your test suite runtime won't be as much of an impediment to your development process, and you will simply write better, more modular, more maintainable code by having to unit-test every component in isolation.


When writing if-statements that have an AND, always put the quickest executing condition first.

Don't use ORM's naively.

Be careful about what you call in a loop.

Those three alone account for most of the execution speed issues I've encountered. And they're all low-hanging fruit.
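A small Python sketch of the first tip (function names are made up): with short-circuit `and`, putting the cheap test first skips the expensive one most of the time:

```python
calls = {"expensive": 0}

def expensive_check(x):
    calls["expensive"] += 1   # pretend this hits a database or does heavy math
    return x > 10

xs = range(100)

# Cheap condition first: expensive_check only runs for the 50 even values
hits = [x for x in xs if x % 2 == 0 and expensive_check(x)]

assert calls["expensive"] == 50
assert hits[0] == 12   # smallest even value greater than 10
```

With the operands reversed, the expensive check would run all 100 times.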


>When writing if-statements that have an AND, always put the quickest executing condition first.

This would matter a lot only if those if statements were in a bottleneck of the program, e.g. in a loop or nested loop that ran a high number of times. (Unless you are trying to squeeze out the last few cycles of performance). Or am I getting you wrong?


>always put the quickest executing condition first.

isn't it put the most probably true condition first?


When choosing between

  A and B

and

  B and A

you want to compare the expected execution times, which are

  E(A) + p(A) E(B)

and

  E(B) + p(B) E(A)

where E(X) is the expected execution time of X and p(X) is the probability that X evaluates to true. The first is larger if

  (1 - p(B)) E(A) > (1 - p(A)) E(B)

The end result is that it may be better to put the expensive call first, if the probability of the cheap call being true is much higher than that of the expensive call.

[Feel free to extend this model by adding the time needed to make or not make the branch around the second part]

In practice, if one is a function call and the other is not, put the function call last, then, if performance is inadequate, measure.


This is also one of those things that an optimizing compiler with profiler (or, equivalently, a JITter) can do. (I don't know if any compilers currently actually do so.)


You probably mean to put the probably false condition first, since if the first condition is false, the others are not tested anymore.


If the first condition is the slowest you can't get faster than that. But if the first is faster it might be the only one executed.


Reduce round trips.

That might be round trips to a database, round trips to a caching layer, or round trips from client to server.

Generally speaking, grab as much data as you'll need (but not more).
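Sketched with Python's stdlib sqlite3 module (the table and ids are invented for illustration), one `IN (...)` query replaces N separate round trips:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])

ids = [3, 7, 42]

# N+1 pattern: one round trip per id
slow = [conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
        for i in ids]

# One round trip: grab everything needed in a single query
marks = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, name FROM users WHERE id IN ({marks})", ids).fetchall()
fast = {i: n for i, n in rows}

assert slow == [fast[i] for i in ids]
```

In-memory SQLite hides the latency, but against a networked database each eliminated round trip saves a full network round-trip time.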


Use hash tables, hash maps, dictionaries, JavaScript objects, etc. (whatever it's called in your language of choice) for indexing.

This gives you constant time access to data by whatever properties you want to index on (usually by id).

Also, use libraries and reuse your own code/routines. If you find a faster way to do something, all the code that used those routines will benefit immediately.
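As a small Python sketch (the record shape here is made up), building the index once turns repeated linear scans into constant-time lookups:

```python
records = [{"id": i, "name": f"item{i}"} for i in range(10000)]

# O(n) every time: scan the whole list for a matching id
found = next(r for r in records if r["id"] == 9000)

# Build the index once; every subsequent lookup is O(1)
by_id = {r["id"]: r for r in records}

assert by_id[9000] == found
assert by_id[123]["name"] == "item123"
```

The one-time O(n) cost of building the dict pays for itself as soon as you do more than a handful of lookups.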


Corollary - sometimes hash tables aren't the fastest data structure. You can often replace a hash with a trie to see a considerable improvement (caveat emptor...)


Use the correct data structure!

Don't use an array when you really need a set. Store your data in hash maps if possible, etc.


I/O latency is usually a major "hidden" cost in programs that deal with large sets of data, not only when writing the solution to disk or loading the data, but with memory access in particular. Bottlenecks in the memory hierarchy are something to pay close attention to.


Avoid sqrt (you can always use "x^2 < 2^2", instead of "sqrt(x^2) < 2")
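Generalized to distance checks, here is a small Python sketch of the same trick, comparing squared values instead of calling sqrt (valid as long as the radius is non-negative):

```python
import math

def within_radius_sqrt(dx, dy, r):
    return math.sqrt(dx * dx + dy * dy) < r

def within_radius_squared(dx, dy, r):
    # compare squared magnitudes; no sqrt call needed
    return dx * dx + dy * dy < r * r

# both versions agree on inside, boundary, and degenerate cases
for dx, dy, r in [(3, 4, 6), (3, 4, 5), (0, 0, 1)]:
    assert within_radius_sqrt(dx, dy, r) == within_radius_squared(dx, dy, r)
```

In a tight inner loop over millions of points, dropping the sqrt call is a cheap, safe win.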


When interacting with a (well-designed and indexed) database, stick as much filtering and sorting as you can in the database query, rather than using a less specific database query and then crunching the result about some more in your own code.


Use the correct algorithms. O(log n) beats O(n) beats O(n log n) beats O(n^2) almost every time. A common mistake is e.g. using the `in` operator with lists in Python, instead of converting the list to a set first.


Converting to a set in Python is extremely beneficial when you will be doing many `in` queries on the collection. But it isn't necessarily good in all cases, because converting the list to a set is itself O(n) and slower than `in`.

Example:

  In [1]: xs = range(10000)

  In [2]: %timeit 5000 in xs 
  100000 loops, best of 3: 77.8 µs per loop

  In [3]: %timeit 5000 in set(xs)
  1000 loops, best of 3: 235 µs per loop

  In [4]: xs_set = set(xs)

  In [5]: %timeit 5000 in xs_set
  10000000 loops, best of 3: 70.7 ns per loop


> A common mistake is e.g. using the `in` operator with lists in Python, instead of converting the list to a set first.

Is this true even for small lists without many duplicates?


Not really

You may be able to compromise and use an ordered list and a binary search.
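That compromise is available off the shelf in Python's stdlib `bisect` module:

```python
import bisect

xs = sorted([9, 2, 7, 4, 1])   # [1, 2, 4, 7, 9]

def contains(sorted_xs, x):
    # O(log n) membership test via binary search on a sorted list
    i = bisect.bisect_left(sorted_xs, x)
    return i < len(sorted_xs) and sorted_xs[i] == x

assert contains(xs, 7)
assert not contains(xs, 5)
```

For small collections this can beat a set thanks to the array's memory locality, but as always, measure for your own data sizes.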


there's a constant factor hidden in all of those, and it's not free.

a tree can often be slower than a simple array for lookups, even if the tree is O(log n) and the array is O(n).


Especially since the array will probably be linear in memory, so will work well with the cache, and the tree probably will have jumps all over the RAM unless the implementation uses a backing array.


The book Code Complete, by Steve McConnell, has a chapter with many great tuning techniques and general good advice on this topic. http://cc2e.com/


Offload heavy task to run in the background. Use caching. Make cpu intensive processes parallelisable. (That's not a word, is it?)


Depends on the platform. In general: use the right data structures, optimize hot spots, and maintain basic hardware awareness.


Profile and identify bottleneck?


Nah, applying these cargo cult recommendations is obviously a better idea.


Write in assembly!! That's faster!


Especially the writing bit...


My first advice is always:

Check your loops, it's where your mistakes get amplified.


The slowest processor is the wetware reading the code. Write straightforward code, using common design patterns, with semantic comments (don't describe what it's doing; say why it's doing it).


#pragma omp parallel


caching and database indexing


Cache.


Write simpler code.


Just use dictionaries and arrays for almost everything.


A book by Jon Bentley [1], "Writing Efficient Programs", is very good for this area. It is probably out of print and was written many years ago, but IMO it is still likely to be very useful for a _wide_ range (not all) of performance issues.

This is because he approaches the problem somewhat scientifically, and gives many rules of optimization that can be applied in various common situations. There are lots of code examples in a language similar to (or actually) Pascal (forget which, read it some years ago), but that should not be a problem for any competent programmer, and there are many real-life war stories of performance tuning throughout the book.

In some of his examples he manages to improve some initial code (e.g. the travelling salesman problem) by one or more orders of magnitude (1 order being 10X), via repeated application of various performance rules to succeeding versions of the same code, optimized by one rule at a time. One of his war stories is about a team that optimized a quicksort (?) algorithm on a supercomputer by either 100,000 times or 1 million times (for speed) over the initial version, by working at several levels of the stack, from algorithms through code down to hardware. Great story.

At the end of the book he also gives all the rules and techniques again in summary form (each rule is illustrated with one or more case studies in the body of the book), along with guidelines on when and when not to apply each. You might be able to get a used copy of the book on Amazon even though it is out of print.

Code hoisting, common sub-expression elimination, loop unrolling (with a good binary search example from Knuth), trading space for time and vice versa, using interpreters (to save space - he gives an example of someone who wrote an interpreter for an interpreter in a space-constrained old IBM machine environment, resulting in huge saving of space) - those are some of his techniques / war stories that I remember off the top of my head. Really worth reading, IMO.

He is also the author of Programming Pearls and More Programming Pearls - also pretty good books.

I just looked up his Wikipedia page again:

http://en.wikipedia.org/wiki/Jon_Bentley

and saw these excerpts (among other stuff):

Jon Louis Bentley (born February 20, 1953 in Long Beach, California)[1] is a researcher in the field of computer science. He is credited with the invention of the k-d tree. ... After receiving his Ph.D., he joined the faculty at Carnegie Mellon University as an assistant professor of computer science and mathematics. ... At CMU, his students included Brian Reid, John Ousterhout (inventor of Tcl/Tk), Jeff Eppinger, Joshua Bloch, and James Gosling (inventor of Java), and he was one of Charles Leiserson's advisors. Later, Bentley moved to Bell Laboratories. ... He wrote the Programming Pearls column for the Communications of the ACM magazine, and later collected the articles into two books of the same name. ... Bentley received the Dr. Dobb's Excellence in Programming award in 2004.


This post is related and interesting too:

http://googleresearch.blogspot.in/2006/06/extra-extra-read-a...

In the book Writing Efficient Programs, Bentley talks about teaching a class of experienced programmers (the class was about algorithms), in which he asked all of them to write a binary search algorithm. Then he showed that almost all of them had bugs. The above post is by Joshua Bloch (of Sun, author of Effective Java, who was one of the students in that class), and the post says that Bentley's own program for the search also had a bug, found later.

According to Bloch, Bentley's bug escaped detection for two decades, and further, Bloch's code for the same binary search, written for the JDK (for java.util.Arrays), also had a bug that was detected nine years later.


P.S.: What I call the Bentley-Knuth problem (for lack of a better term) is also interesting. Bentley posed it to Knuth. I read about it in a blog post and wrote solutions for it (unoptimized) in both Python and shell:

http://jugad2.blogspot.in/2012/07/the-bentley-knuth-problem-...

The original blog post that I read is here:

More shell, less egg: http://www.leancrew.com/all-this/2011/12/more-shell-less-egg...

It has many comments which are interesting.



