In most cases I'd probably use nearly this. I note that it contains a bug due to...

cluckindan · 2025-01-24T16:53:17 1737737597

I use that math textbook algorithm in production to produce a median from a list which has a bounded size and is already sorted by the db, though that bound could technically grow to INT_MAX if someone managed to make that many requests in five minutes. Not very likely. :-)

gpm · 2025-01-24T17:29:09 1737739749

> and is already sorted by the db,

Right, if it's already sorted, just taking the midpoint is the obviously correct algorithm (and O(1) time/space). It's only in the unsorted case, with giant lists, that you should start thinking about alternatives.
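For concreteness, here's a minimal sketch of the sorted case in Python. The function name `median_sorted` is made up for illustration, and it assumes the input is a non-empty, already-sorted sequence with random access (a list, not a stream):

```python
def median_sorted(values):
    """Median of an already-sorted sequence, with no sorting or copying."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the single middle element is the median.
        return values[mid]
    # Even length: average the two middle elements.
    return (values[mid - 1] + values[mid]) / 2
```

Python integers don't overflow, so the even-length average is safe here; in a fixed-width language the sum of the two middle elements would need more care.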

If I'm working with gigabytes of photon counts (each element representing the number of photons detected in a time interval), I don't want to sort my gigabyte-long list before getting the median. Sorting would destroy the very important structure of the data, so I'd have to sort a copy and then throw that copy away afterwards. This is referencing some code I worked on a long time ago. I'm not sure I had to calculate a median specifically, but similar enough statistics. It's a simple function, but not a one-size-fits-all algorithm.
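One possible alternative for data like that, sketched in Python: build a histogram in a single pass and walk it to find the middle rank. This is not the code I actually used, just an illustration. The name `median_from_counts` is hypothetical, and the approach assumes the values are integers drawn from a small range (plausible for per-interval photon counts, but an assumption), so the histogram stays tiny compared to the data.

```python
from collections import Counter


def median_from_counts(samples):
    """Median of integer samples without sorting or copying the input.

    Assumes the number of distinct values is small, so the histogram
    fits comfortably in memory even when the input does not.
    """
    hist = Counter()
    n = 0
    for s in samples:  # single pass; the input order is left untouched
        hist[s] += 1
        n += 1
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")

    # Zero-based ranks of the lower and upper middle elements.
    # They coincide when n is odd.
    lo_rank = (n - 1) // 2
    hi_rank = n // 2

    seen = 0
    lo = hi = None
    # Only the distinct values are sorted here, not the data itself.
    for value in sorted(hist):
        seen += hist[value]
        if lo is None and seen > lo_rank:
            lo = value
        if seen > hi_rank:
            hi = value
            break
    # Always returns a float, including for odd-length input.
    return (lo + hi) / 2
```

Because it only iterates once, `samples` can be a generator reading from disk, so the gigabyte-long list never needs to be in memory at all. If the values had a huge range, the histogram would lose its advantage and something like a selection algorithm on a copy would be the more likely fit.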