Big O Notation: Using not-boring math to measure code's efficiency

TheNewAndy · on Sept 20, 2014

I don't think the attitude of being allowed to be hand-wavey is a good one for this sort of thing. Big O notation has very precise semantics, and it is very easily misused.

Many of the examples are wrong. A python "print" statement is (for the built-in types at least) O(n), so printing the first item from a list must be at least O(n).

Fundamentally, looking at big O notation as a way of measuring runtime is misguided. It is a way of describing a class of functions. But there is typically a level of indirection between the function that is being talked about, and the one which the analysis is applied to. For the "return the first item in the list" example, the function which is being analysed is a function from Int to Real. That function iterates through all values with size corresponding to the input to function, for each of these it runs the "return the first item in the list" function and measures the time it takes. The maximum time taken is then the Real output.

That intermediate function is important, and needs to be specified properly. If it isn't, then you get confusion The section "N could be the actual input, or the size of the input" is an example of this confusion. Here, the intermediate function which is being analysed is much simpler (but it is still present).

I should probably also add, that I am not by any means an expert here, and I wouldn't be surprised if I've made sloppy mistakes here. I'll just make the pre-claim that this only further proves my point that being hand-wavey is not a good thing :-)

scott_s · on Sept 20, 2014

When doing complexity analysis, one should state what the base operation is. For example, when looking at sorting algorithms, we typically use the operation "is x less than y?", or something similar to it. What the analysis then tells us is how many comparisons we make.

The author does not state what his operations are. So, with regards to the examples, he implies that the operation he's counting is "print". So his examples are correct.

You are right, though, that we can get in trouble if it turns out that our base operation - which we assume is constant time - is actually not constant time. When looking at algorithms that deal with very large numbers, for example, we can't consider multiplication to be constant time.

So, with regards to the complexity of print, you are correct that it is linear - O(n) - in the item being printed. But I believe you are confusing the quantity "size of the list" and the quantity "size of an item in the list". If we have a list of size n, and that list contains items whose largest size is m, then we would say the cost of that first loop is O(n * m). If we can just assume that the size of an item is not large enough to matter - that is, m is actually a constant - then we can safely say that the cost is O(n).

gameguy43 · on Sept 20, 2014

Agreed. I'm trying to strike a balance here between painting the full picture and being approachable for people who are scared of math. These notes are super helpful though--we can probably nod briefly to this complexity without getting /too/ stuck in the weeds. Really appreciate the thoughts!

ggchappell · on Sept 20, 2014

> Many of the examples are wrong. A python "print" statement is (for the built-in types at least) O(n), so printing the first item from a list must be at least O(n).

Yes & no. Big-O is about counting something and seeing how that amount changes when our independent variable (input size, maybe) gets large. We can count whatever we want.

It is correct to say that

    print(x)

executes O(1) print statements. OTOH, it does not execute a constant number of single-character operations.

> I don't think the attitude of being allowed to be hand-wavey is a good one for this sort of thing. Big O notation has very precise semantics, and it is very easily misused.

That's a good point. Perhaps the best way to present big-O for "the masses" is (1) give people enough guidance to use it correctly, (2) make sure they know that this is not hand-wavey math, but rather one with precise definitions, but (3) not worry so much whether they know what those definitions are, or how to use them directly.

kazinator · on Sept 19, 2014

An O claim can always be pessimistic. A linear time algorithm in N is not only O(N) but also O(N*N) an O(2^N): it is at least as good as all of these. If we are sure that the algorithm is faster or slower than N, then can say Theta(N), the tight bound.

http://stackoverflow.com/questions/471199/what-is-the-differ...

(Often, we are sure but we still use O.)

dllthomas · on Sept 21, 2014

A start-up I applied for once asked me to take a test, and one of the problems was "Implement a sorting algorithm that is not O(n^2)." I thought about snarkily adding extra walks through the data, but decided instead to explain the problem with the question and answer the question they were trying to ask.

nkurz · on Sept 20, 2014

While these are probably the answers that an interviewer wants, I cringe every time I see someone say that Big O tells you "how long an algorithm takes to run". It's clear the author understands the limitations, but I worry that encouraging people to equate algorithmic complexity with run time in the real world does more harm than good. Big O is fine for theoretical analysis, and at one point it was a reasonable approximation for performance, but with current superscalar SIMD processors with several layers of cache, the gap between theory and practice is getting pretty wide.

Big O tells you (to within some smaller order terms) how many more operations an algorithm needs to perform if you double the size of the input from really big to really, really big. This corresponds to increase in run time at small scale only if all else remains equal, which usually it doesn't. And even then, it only tells you the increase for that same algorithm, rather than how to compare the actual run time to other algorithms with different constants.

My hope is that someday Big O will be reserved for theory and programmers will preach something more like the roofline model: http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-13...

ggchappell · on Sept 20, 2014

This is very nicely done.

I do think it has one major omission -- something that is unfortunately omitted all too often. This is that, just as we get to decide what "n" means, we also get to decide what operations we are counting. Furthermore, our choices matter.

For example, if we are dealing with data accessed via a slow connection, or just about any kind of distributed system, operations that use the slow connection will be much slower than those that do not. So often we will count only those. O(n^2) doesn't give the full picture, if we only use O(n) slow operations.

For another example, if we are doing computations using large integers, then it usually does not make sense to count arithmetic operations as single operations. In such circumstances, a large-integer multiply can take a lot more time than a pointer dereference.

shloub · on Sept 19, 2014

Says it studies the complexity wrt the size of the input but doesn't say the size of n is O(log n) when it comes to the example :(

gameguy43 · on Sept 20, 2014

Hm, sounds like a bug. Which example are you referring to?

jackpirate · on Sept 20, 2014

The size of an integer n is O(log n).

gameguy43 · on Sept 20, 2014

Ah, gotcha. I'll patch that up. Thanks!

shloub · on Sept 20, 2014

I was referring to the print s*n example (O(n) being exponential wrt the size of the input). You're welcome ;)

cortesoft · on Sept 19, 2014

Not-boring math? Isn't all math not boring?

marcosdumay · on Sept 20, 2014

That depends entirely on how deep you look.

If you only look at the surface, and never care to think about what it's good for, mat can be quite boring.

joshvm · on Sept 20, 2014

I think a log n example would be useful, e.g. a basic divide and conquer. Given that many algorithms strive for logarithmic complexity, it seems a shame to miss this out.

Also how about a list of complexities in order of growth time?

gameguy43 · on Sept 20, 2014

Agreed. I'm working on an "advanced" follow-up that digs into log n and a couple other things!

Bahamut · on Sept 20, 2014

"Big O notation is like math except it's an awesome, not-boring kind of math"

Except big O notation is math and the use of it for complexity analysis is one application of it - this is a completely off-base statement.

mdxn · on Sept 20, 2014

I'll ignore discussion as to what counts as a step, what "n" is, or how it doesn't necessarily reflect on the real world. I still feel like this article focuses on the wrong thing. It's equivocating Landau notation (Big O) with complexity/efficiency, but never explains what complexity is. It then proceeds to link this to the efficiency of code under certain circumstances. While this work gives a couple of examples as to how to determine the "big O" of simple algorithms, it ignores its meaning and context. This is a huge mistake.

First off, big O notation wasn't created with complexity theory or algorithms in mind. If I recall correctly, it has its roots in complex analysis. The notation was later borrowed as computational complexity was formalized. What happened is that people needed a way to describe the hardness or "logical sophistication" of problems and tasks. Looking into the asymptotic behavior of a complexity measure was just one means that they used. There are plenty of other ways to do this. The article reads like big O exists for the purpose of complexity analysis of algorithms. I doubt the author intended this, but they should be a bit more careful.

Time and space aren't the only resources of interest. Sometimes you need to quantify how much information you have or how much entropy there is. If you want to understand "what is basically happening", this is something you should pay attention to. Are we handling more information than we have to? What do we need and not need? Circuit sizes, sophistication of proofs, setups of databases, and the complexity of communications/protocols also play a role.

Even ignoring all this, the article is missing an elementary half of the picture: the structure of the input. It's not just about code and its design. Data structures matter, too. How information is organized, how it is encoded, and how it can be manipulated (computed on) makes an absolute huge difference. Are things sorted or unsorted? What kind of list or tree structure is your code working with? From what I understand, this is the whole point of a significant chunk of the interview questions candidates are presented with. It's ridiculous to omit this when introducing algorithmic complexity.

The author claims that Big O notation is an "awesome, not-boring kind of math where you get to wave your hands through the details and just focus on what's basically happening". They are hand waving not just the math, but also hand waving (and ignoring) most of the context as well. It doesn't even directly explain why big O notation models code efficiency in a fruitful way. It just mentions run time and memory usage superficially. In my opinion, this needs to be significantly revised or rewritten.

scott_s · on Sept 20, 2014

The article reads like big O exists for the purpose of complexity analysis of algorithms. I doubt the author intended this, but they should be a bit more careful.

I see no problem with this. My first taste of algorithm analysis was in a freshmen year course, where we were taught the phrase "big oh" and how to analysis algorithms in quite a similar fashion as this article. The idea was to give us the briefest of introductions of the topic, just enough so we could talk intelligently about the elementary data structures we were being introduced to (vectors, lists, trees). We didn't get the real deal until senior year.

"Education is a series of small lies", is what my intro to computer engineering professor said when he said that, actually, some circuits are ternary, not binary.

Basically, the author is trying to introduce a concept to the reader. I don't fault him for not heaping the entirety of an area into an introduction.

mdxn · on Sept 20, 2014

I get the gist of what you are saying and agree with some of it. I still think the article isn't sufficient, at all, for even an elementary introduction to anything. It's incredibly incomplete. An introductory cs course is going to present asymptotic complexity analysis (written using Landau notation) as one of the TOOLS for thinking about the efficiency of code. It will hopefully not just hop to big O without setting up the context. The author can fix this by just reworking the preface. It needs to answer the question: "why are we doing this and how will this accomplish anything?"

If you fail to frame the context and the semantics well enough, you'll end up with people running around saying that an algorithm is "big O of blah." AGH

The introduction to the concept is articulated by the following comparison:

'We want to measure code efficiency. Big O notation describes this. Here are some examples.'

vs.

'We want to measure code efficiency. We can do this by thinking about its usage of resources (complexity). for example, total run time and memory usage. One of the ways to compare the efficiencies of algorithms is to compare their asymptotic behaviors. We can express this using big O notation. Here are some examples.'

-----

The author talks about how big O notation is awesome because you can drop the constants. It is not explained why this is awesome or useful. Interestingly, it later says that sometimes it isn't good because the constants sometimes matter. What is the reader supposed to understand here?

There's no followup as to what to think about after you determine the "big O" of an algorithm. Either he forgot to write this or figured that it was obvious at that point. Considering that he didn't sufficiently explain the concept's motivation either, I don't see how an uneducated reader will get a clear picture.

triplenineteen · on Sept 20, 2014

> But to be more specific we could say this is worst case O(n) and best case O(1) runtime.

Those should really say θ(n) and θ(1).

marcosdumay · on Sept 20, 2014

That first one should be a big O.

triplenineteen · on Sept 20, 2014

Hmm. Big O already encompasses "at worst".

Theta is the notation for a single case, be it the worst case, the best case, one run of a randomized algo, etc.

At least that is my understanding.

ephemeralgomi · on Sept 20, 2014

Big O is an upper bound, little o is a lower bound. Big theta means both an upper and lower bound.