It's too bad because it could take one more step and do something slightly cooler.
I don't just do counting in MapReduce; my use case was Expectation-Maximization on mixtures of Gaussians to cluster a data set of 16 million documents. Although computing the Gaussians isn't entirely trivial, all you do at the end is sum up and normalize.
It's very cool and isn't much more than counting (you just replace the map step). In fact, the paper I took it from (and the whole Mahout-on-Hadoop effort) is a bunch of machine learning algorithms that are just big summations at the reduce step.
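To make the "just big summations" point concrete, here is a minimal sketch of one EM iteration for a 1-D Gaussian mixture written in map/reduce style. This is plain Python with made-up function names for illustration, not Mahout's implementation or the paper's code: the map step computes per-point responsibilities, and the reduce step only sums the partial statistics and normalizes.

```python
# Sketch of one EM iteration for a Gaussian mixture in map/reduce style.
# 1-D data, hypothetical names -- illustrative only, not the Mahout code.
import math
from collections import defaultdict

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def map_step(x, params):
    """E-step per point: emit (component, partial sums) weighted by responsibility."""
    weights = [w * gaussian_pdf(x, m, v) for (w, m, v) in params]
    total = sum(weights)
    for k, wk in enumerate(weights):
        r = wk / total                      # responsibility of component k for x
        yield k, (r, r * x, r * x * x)      # everything the reducer needs is a sum

def reduce_step(key, values, n_points):
    """M-step per component: sum up the partial statistics, then normalize."""
    n_k = sum(v[0] for v in values)
    sum_x = sum(v[1] for v in values)
    sum_xx = sum(v[2] for v in values)
    weight = n_k / n_points
    mean = sum_x / n_k
    var = sum_xx / n_k - mean ** 2
    return key, (weight, mean, var)

def em_iteration(data, params):
    """Shuffle/sort simulated with a dict; one full EM pass over the data."""
    grouped = defaultdict(list)
    for x in data:
        for key, value in map_step(x, params):
            grouped[key].append(value)
    new_params = [None] * len(params)
    for key, values in grouped.items():
        _, stats = reduce_step(key, values, len(data))
        new_params[key] = stats
    return new_params

# Toy usage: two clusters, two initial components as (weight, mean, variance).
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
params = [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)]
for _ in range(10):
    params = em_iteration(data, params)
print(params)
```

On a real cluster the grouping dict is the framework's shuffle, and the only thing crossing the wire per point is a handful of partial sums, which is why this scales to millions of documents the same way word counting does.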