
I found the reply to Robert Tarjan interesting, and I wonder how many people who conduct programmer job interviews would have guessed that an algorithm's Big-O complexity could be such a poor predictor of its actual performance!

"Another issue, when we come down to earth, is the efficiency of algorithms on real computers. As part of the Stanford GraphBase project I implemented four algorithms to compute minimum spanning trees of graphs, one of which was the very pretty method that you developed with Cheriton and Karp. Although I was expecting your method to be the winner, because it examines much of the data only half as often as the others, it actually came out two to three times worse than Kruskal's venerable method. Part of the reason was poor cache interaction, but the main cause was a large constant factor hidden by O notation."

Question: in what ways can a large constant factor in an algorithm cause such a dramatic performance difference on modern hardware? Can anyone give examples? I can only assume that the "large" constant factor in this example was large enough to always overflow, requiring bignum functions to work.



I think the situation may have been like this:

Algorithm A runs in roughly A·n steps and algorithm B in roughly B·n^2 (made-up examples). So one would say that A is better. But if the constant A happens to be very large and B much smaller, then B is actually faster until n becomes very large. Conventional analysis says A is better in general, but its poor practical performance is "hiding" in the huge constant A, and it loses to B in real situations, where n never gets big enough for A's asymptotic advantage to become relevant.
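A quick sketch of that crossover, using made-up cost functions (the constants 1000 and 1 are arbitrary, just to make the point):

```python
# Hypothetical step counts: algorithm A is O(n) with a large constant,
# algorithm B is O(n^2) with a small one (made-up numbers).
def cost_a(n):
    return 1000 * n      # constant A = 1000

def cost_b(n):
    return n * n         # constant B = 1

# Despite its worse Big-O, B is cheaper until n exceeds A/B = 1000.
crossover = next(n for n in range(1, 10**6) if cost_a(n) < cost_b(n))
print(crossover)  # 1001: the first n where A finally wins
```

If your real inputs never get past a thousand elements, the "worse" algorithm is the right choice here.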


Jon Bentley (I think) once described a situation where the problem was swapping two substrings within a string (again, I think). One of the algorithms used the double-reversal trick[1] and the other didn't; both were O(n). He implemented both and ran them on a large string on his (small) machine. One program finished in seconds, while the other he terminated after several minutes.

The difference was that the string was large enough to force the machine to swap memory out to disk, and the slow version was exactly pessimal with respect to the machine's paging algorithm. It would touch a location on a page, forcing that page to be brought back into memory, and then wouldn't touch the page again until it had touched all of the other pages of the string. The fast program did a lot of work on one page before moving to the next and finished relatively quickly; the slow program thrashed. The same effect can be seen with modern machines' memory caching.
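For reference, the double-reversal trick itself is simple. This is a sketch of the index logic only (Python slicing copies the data; the version that matters for locality works in place in C, sweeping each block sequentially, which is exactly the page-friendly access pattern described above):

```python
# Double-reversal trick for swapping two adjacent blocks AB -> BA:
# reverse each block, then reverse the whole thing.
def swap_blocks(s, k):
    """Turn s[:k] + s[k:] into s[k:] + s[:k] using three reversals."""
    a = s[:k][::-1]        # reverse the first block
    b = s[k:][::-1]        # reverse the second block
    return (a + b)[::-1]   # reverse the concatenation

print(swap_blocks("abcdef", 2))  # "cdefab"
```

Each of the three reversals scans memory sequentially, so every page (or cache line) is used completely before the next one is touched.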

However, this kind of complexity analysis is intended to abstract over machine-specific details like cache line sizes and paging algorithms, to let algorithms be compared in the abstract. Usually, if you don't trip over weird edge cases and if the problem is large enough (an assumption that is part of the analysis), an algorithm with a better Big-O bound will kinda, sorta, maybe always be faster than one with a worse bound.

(I think you have a misapprehension: the "large constant factor" is part of the simplification process from the hairy formula that actually describes an algorithm's performance (see Knuth's discussion of recurrence relations) to the abstract Big-O summary: f(n) = O(n), g(n) = O(n^2), .... It is not any literal factor in the program itself. For example, f(x) = x and f(x) = 2x are both O(n), but the second one will be twice as slow.)

[1] https://tecnocode.co.uk/2008/11/19/string-swap-algorithms/


Take radix sort: its runtime is O(d * n), where d is the number of digits and n is the count of numbers being sorted. For practical purposes d is a constant in a given implementation. Compare that with efficient comparison-based algorithms (like merge sort) that are O(n log(n)): below a certain input size, the constant d can be larger than log(n), making radix sort the slower choice despite its "better" bound.
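A minimal LSD radix sort sketch (base 10, nonnegative integers) that makes the O(d * n) shape visible: one full pass over all n numbers per digit, so d passes total.

```python
# LSD radix sort sketch for nonnegative integers, base 10.
# Runtime is O(d * n): the outer loop runs d times, the inner
# bucketing pass touches all n numbers each time.
def radix_sort(nums, base=10):
    if not nums:
        return nums
    digits = len(str(max(nums)))              # d in O(d * n)
    for pos in range(digits):                 # one pass per digit
        buckets = [[] for _ in range(base)]
        for x in nums:
            buckets[(x // base**pos) % base].append(x)
        nums = [x for b in buckets for x in b]
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

With d = 3 digits here, each element is handled 3 times; merge sort on 8 elements does log2(8) = 3 levels of merging, so at this size the two are in the same ballpark and the constants decide the winner.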



