> you don't meaningfully capture people who only write reviews to flag serious problems
Yes - reading the comments here, I'm realizing that a major problem with any reviewer-centered system is that people decide whether to review at all based on hugely varied criteria.
An always-five-stars reviewer might just be easily impressed (or a fraud), but they might equally subscribe to "if you don't have anything nice to say...". And quite a lot of people write exclusively bad reviews, yet it's not obvious how to distinguish grumpy reviewers from people who only speak up about major issues.
A partial fix might be to analyze how far a given review sits from that user's average deviation from the product average, which could distinguish a five-star bot from a person who only reviews great products. But even that doesn't solve the XKCD problem where a product has median-case appeal but a high rate of critical failures. In true "what can't meta-analysis fix?" style, this could be improved by looking at a user's average distance outside 1 SD of the mean review, or perhaps by special-casing products with multimodal rating distributions.
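For what it's worth, the bias part of this is only a few lines of arithmetic. A minimal sketch, assuming a plain table of (reviewer, product, stars) ratings; the names and toy data are entirely made up, not any real platform's schema:

```python
# Hypothetical sketch: estimate each reviewer's habitual bias as their mean
# offset from the consensus on the products they rated. Toy data only.
from collections import defaultdict
from statistics import mean

ratings = [  # (reviewer, product, stars)
    ("alice", "kettle", 5), ("alice", "headphones", 5),                    # only reviews great stuff
    ("bot001", "lamp", 5), ("bot001", "kettle", 5), ("bot001", "mug", 5),  # five-stars everything
    ("carol", "lamp", 2), ("carol", "mug", 3), ("carol", "headphones", 5),
    ("dave", "lamp", 3), ("dave", "mug", 4),
]

# Consensus rating per product.
by_product = defaultdict(list)
for reviewer, product, stars in ratings:
    by_product[product].append(stars)
product_mean = {p: mean(s) for p, s in by_product.items()}

# Reviewer bias: average gap between their ratings and the consensus.
# The five-star bot lands around +0.9; the person who only reviews great
# products lands near 0, because the consensus already agrees with them.
by_reviewer = defaultdict(list)
for reviewer, product, stars in ratings:
    by_reviewer[reviewer].append(stars - product_mean[product])
reviewer_bias = {r: mean(d) for r, d in by_reviewer.items()}

# A single review's "anomaly" is then its gap from consensus minus the
# reviewer's usual bias -- the quantity described above.
```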
Of course, it's deeply unclear how to turn any of this into an output. Scaling reviews based on other reviews sounds like a nightmare; reviews shouldn't be differential equations, and no one wants to wade through 4+ layers of statistics to buy a new lamp. Perhaps all of the indirect work could be done on raw ratings behind the scenes to produce a single "adjusted average" for display?
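If one did go the behind-the-scenes route, the display step could be as simple as subtracting each reviewer's estimated bias before averaging. Another purely hypothetical sketch, reusing the reviewer-bias idea from the snippet above:

```python
from statistics import mean

def adjusted_average(product_ratings, reviewer_bias, lo=1.0, hi=5.0):
    """Remove each reviewer's habitual offset, average, and clamp to the star scale."""
    debiased = [stars - reviewer_bias.get(reviewer, 0.0)
                for reviewer, stars in product_ratings]
    return max(lo, min(hi, mean(debiased)))

# Toy usage: one habitual five-star reviewer (+1.2 bias) and one habitual
# grump (-0.8 bias) end up contributing roughly the same signal.
print(adjusted_average([("bot001", 5), ("carol", 3)],
                       {"bot001": 1.2, "carol": -0.8}))  # 3.8
```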
(More realistically, the serious-problem case only seems solvable by reading text reviews, and just devaluing outright fraud and always-angry cranks would be a massive improvement over existing systems.)