Use the correct algorithms. O(log n) beats O(n) beats O(n log n) beats O(n^2) al...

canjobear · on Feb 1, 2015

Converting to a set in Python is extremely beneficial when you will be doing many `in` queries on the collection. But it isn't necessarily good in all cases, because converting the list to a set is itself O(n) and slower than `in`.

Example:

  In [1]: xs = range(10000)

  In [2]: %timeit 5000 in xs 
  100000 loops, best of 3: 77.8 µs per loop

  In [3]: %timeit 5000 in set(xs)
  1000 loops, best of 3: 235 µs per loop

  In [4]: xs_set = set(xs)

  In [5]: %timeit 5000 in xs_set
  10000000 loops, best of 3: 70.7 ns per loop

Alex3917 · on Feb 1, 2015

> A common mistake is e.g. using the `in` operator with lists in Python, instead of converting the list to a set first.

Is this true even for small lists without many duplicates?

raverbashing · on Feb 1, 2015

Not really

You may be able to compromise and use an ordered list and a binary search.

blibble · on Feb 1, 2015

there's a constant factor hidden in all of those, and it's not free.

a tree can often be slower than a simple array for lookups, even if the tree is O(log n) and the array is O(n).

FreeFull · on Feb 1, 2015

Especially since the array will probably be linear in memory, so will work well with the cache, and the tree probably will have jumps all over the RAM unless the implementation uses a backing array.