Hacker News new | past | comments | ask | show | jobs | submit login

In most cases I'd probably use nearly this. I note that it contains a bug due to integer overflow if naively translated to most languages.

But if I have a big enough list that I care about space usage (I don't want to make a copy that I sort and then throw away), or speed (I care about O(n) vs O(n log(n))) I'd be looking for a library before implementing my own.

Here are the relevant algorithms if you really want to implement your own fast median code though: https://cs.stackexchange.com/questions/1914/find-median-of-u...




I use that math textbook algorithm in production to produce a median from a list which has a bounded size and is already sorted by the db, though that bound could technically grow to INT_MAX if someone managed to make that many requests in five minutes. Not very likely. :-)


> and is already sorted by the db,

Right, if it's already sorted just taking the midpoint is the obviously correct algorithm (and O(1) time/space). It's only in the unsorted cases where with giant lists you should start thinking about alternatives.

If I'm working with gigabytes of photon counts (each element representing the number photons detected in a time interval) I don't want to sort my gigabyte long list before getting the median - sorting would destroy the very important structure of the data so I'd just have to throw away the copy afterwards. This is referencing some code I worked on a long time ago. I'm not sure I had to calculate a median specifically, but similar enough statistics. It's a simple function, but not a one size fits all algorithm.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: