Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Use the correct algorithms. O(log n) beats O(n) beats O(n log n) beats O(n^2) almost every time. A common mistake is e.g. using the `in` operator with lists in Python, instead of converting the list to a set first.


Converting to a set in Python is extremely beneficial when you will be doing many `in` queries on the collection. But it isn't necessarily good in all cases, because converting the list to a set is itself O(n) and slower than `in`.

Example:

  In [1]: xs = range(10000)

  In [2]: %timeit 5000 in xs 
  100000 loops, best of 3: 77.8 µs per loop

  In [3]: %timeit 5000 in set(xs)
  1000 loops, best of 3: 235 µs per loop

  In [4]: xs_set = set(xs)

  In [5]: %timeit 5000 in xs_set
  10000000 loops, best of 3: 70.7 ns per loop


> A common mistake is e.g. using the `in` operator with lists in Python, instead of converting the list to a set first.

Is this true even for small lists without many duplicates?


Not really

You may be able to compromise and use an ordered list and a binary search.


there's a constant factor hidden in all of those, and it's not free.

a tree can often be slower than a simple array for lookups, even if the tree is O(log n) and the array is O(n).


Especially since the array will probably be linear in memory, so will work well with the cache, and the tree probably will have jumps all over the RAM unless the implementation uses a backing array.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: