Goodhart's Law (wikipedia.org)
178 points by rfreytag on Sept 17, 2021 | 83 comments


A closely-related effect I often cite is the McNamara fallacy[0], which is essentially about the tendency to focus on aspects of a system that are easily measurable, often at the expense of aspects that are not. I see it as one of the weaknesses of the data-driven decision-making movement, since many interpret "data" to mean "numbers". I think this fallacy can partly explain why Goodhart's Law holds: it's the non-measurable (or difficult-to-measure) aspects that suffer most when a metric becomes a target, since measurable aspects could be (and often are) integrated into the target metric.

[0]: https://en.wikipedia.org/wiki/McNamara_fallacy


As Mandelbrot tells it, the January effect was identified and then, rather quickly, exploited to the point where it no longer existed: in the process of exploiting it, the market exerted pressures that counteracted it.

"Consider three cases. First, suppose a clever chart-reader thinks he has spotted a pattern in the old price records—say, every January, stock prices tend to rise. Can he get rich on that information, by buying in December and selling in January? Answer: No. If the market is big and efficient then others will spot the trend, too, or at least spot his trading on it. Soon, as more traders anticipate the January rally, more people are buying in December—and then, to beat the trend for a December rally, in November. Eventually, the whole phenomenon is spread out over so many months that it ceases to be noticeable. The trend has vanished, killed by its very discovery. In fact, in 1976 some economists spotted just such a pattern of regular January rallies in the stocks of small companies. Many investors close their losing positions towards the end of the year so they can book the loss as a tax deduction—and the market rebounds when they reinvest early in the new tax year. The effect is most pronounced on small stocks, which are more sensitive to small money movements. Alas, before you rush out to trade on this trend, you should know that its discovery seems to have killed it. After all the academic hoopla over it, it no longer shows up as clearly in price charts."

-Benoit Mandelbrot, The Misbehavior of Markets

https://en.wikipedia.org/wiki/January_effect


This reminds me so much of a Christmas present hack my sister and I invented when we were kids. We used to open all our presents on Christmas morning.

Then we begged "Can't we open a couple of presents on Christmas Eve?" So we got to open a few that night.

Next year was "Well, how about Christmas Eve Morning? Maybe just one or two?"

And the next year was "The 23rd is practically Christmas Eve, isn't it? It's just a few hours apart. Can't we open all our presents on the evening of the 23rd?" And we did!

We didn't push it past that: we were already so happy that we got our presents a day and a half before all our friends!


As far as I'm aware, this is why markets are considered a second-order chaotic system: one in which measurements of how the system performs can influence what happens next. This is in contrast to first-order systems, e.g. the weather, which are hard to simulate, but where publishing the results of a simulation doesn't affect its accuracy.


"First get rich; then, publish".

If that's not a law, it should at least be a rule-of-thumb.


In the context of manufacturing, W. E. Deming said something similar: "that which gets measured, gets improved". His conclusion from this was a little different from McNamara's, though. Since you will inevitably want to track your progress, make sure you track as many things as possible, because anything which is not tracked will get sacrificed to that which is. Up to a point, it's true.

One issue is that some things, like vulnerability to supply chain disruptions, are intrinsically harder to track because they are based on rare occurrences. Thus, they will tend to get sacrificed in favor of measures which are more frequent, leading to an emphasis on short-term strategies.


Actually, Deming said much the opposite:

  It is wrong to suppose that if you can’t measure it, you can’t manage it – a costly myth.
p. 26 of The New Economics for Industry, Government, Education

It wasn't Drucker either.

https://medium.com/centre-for-public-impact/what-gets-measur...

This whole mindless "must measure, must measure" mentality has been criticized since the '50s. Measurement is a tool. There are many tools.


I would argue that it's worthwhile to measure as much as you can, insofar as it facilitates orderly decision-making processes.

The problem is that people tend to think that all measurement is necessarily quantitative. I think that this might be a version of the streetlight effect? Quantitative measurements tend to be much easier to collect and analyze than qualitative measurements. Oftentimes you can let it all run on autopilot, whereas doing good qualitative work always requires concentration, effort, and expertise.


> I would argue that it's worthwhile to measure as much as you can, insofar as it facilitates orderly decision-making processes.

That would be true if measurement were free. It never is, and it is often quite costly.


What is qualitative measurement? As soon as it is quantitative, it isn't qualitative anymore, is it?



In my experience, if it isn't measured, then it is assumed that the policy (whatever it is) is working. If you don't measure, you can't be surprised by "wow, it didn't work like we thought", so mistakes don't get recognized or corrected. There are many tools, but measurement is one of the few that brings unexpected bad news to the user, and that is invaluable.


> One issue is that some things, like vulnerability to supply chain disruptions, are intrinsically harder to track because they are based on rare occurrences. Thus, they will tend to get sacrificed in favor of measures which are more frequent, leading to an emphasis on short-term strategies.

I suppose this is part of why "chaos engineering" has gained popularity: introducing artificial disruptions at a known rate makes it easier to quantify the impact of otherwise-unusual events.
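
A minimal sketch of the idea (the names and the rate are illustrative, not from any particular chaos tool):

  import random

  FAULT_RATE = 0.01  # inject failures on ~1% of calls: a known, tunable rate

  def with_chaos(fn):
      # Wrap a dependency call so failures occur at a known base rate;
      # recovery behavior can then be measured routinely instead of
      # waiting for rare real outages.
      def wrapped(*args, **kwargs):
          if random.random() < FAULT_RATE:
              raise TimeoutError("injected fault (chaos experiment)")
          return fn(*args, **kwargs)
      return wrapped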


Ooh, good point! Another example is auditing, where you substitute regular, frequent disruptions (being audited is disruptive to normal operations) for infrequent, less predictable, bigger disruptions.


> from this was a little different from McNamara's, though. Since you will inevitably want to track your progress, make sure you track as many things as possible, because anything which is not tracked will get sacrificed to that which is.

That wasn't Deming's position; Deming was pretty strong on the idea that there are measurable things that it is not cost-effective to measure.

A big contribution of Deming's here is the importance of understanding the uncertainty in measurements, and in the relations between them, in order to understand the degree of control.


Frederick Taylor was the one who developed so-called management science and obsessive measurement, paving the way for Deming to popularize a more balanced, holistic approach, i.e. one akin to actual science.


It's a genuinely hard problem, because the quantifiable output metrics are easy to measure and an individual often does have some level of direct control over them. So we convince ourselves that they're a reasonable proxy for something we care about but aren't sure how to measure, and that we have control systems in place, whether management or individual responsibility, that largely prevent e.g. quality being thrown away in pursuit of quantity.

And we're often not entirely wrong if we do pick reasonable proxies and have reasonable control systems in place. Because throwing up our hands and saying metrics are useless is usually not the answer either.


A flavor of this problem is what could be called "diffusion of responsibility", for lack of a better term. An individual who defined the measure and then optimizes for it will quickly figure out when their measure stops being a good proxy. But in organizations there usually isn't a single person who both understands what the measures are proxying and has the power to remove a metric once it is used up, or to get people to stop overfitting to it.


Another thing I've observed is what happens when you have two measures that operate at different time scales. Both may even be valid measures, but the one measured (and responded to) more often has a stronger impact, and can negatively affect the less frequently measured metric when the two conflict.

A particular instance of this has been (to keep it simple) quantity (speed) versus quality in production environments (factories and the like): daily throughput measures paired with less frequent quality measures. The desire is to keep throughput high, and quality ends up suffering as a result. By integrating quality measures into the process you make the two measures compete on more equal footing, forcing a balance.

At least one factory I worked in (well, adjacent to; I was in the software portion, not the assembly line) massively reduced its quality problems by integrating quality checks between each station. This contrasted with the prior years, where throughput, being measured and reacted to daily, drove them to make things so fast that they had piles of rework at the end. Integrating the quality measures between stations slowed them down, but their rework numbers turned into a rounding error (this was over a decade ago so I've forgotten the exact numbers, but they went from having items needing rework nearly every day to maybe one or two a month). As a result their real (deliverable-to-customers) production increased and their cost per unit dropped.
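
For what it's worth, even a toy simulation (my numbers, not the factory's) shows why catching defects at the station beats end-of-line inspection:

  import random

  random.seed(1)
  STATIONS, DEFECT_P, UNITS = 5, 0.03, 10_000

  def wasted_steps(check_each_station: bool) -> int:
      # Count station-steps spent on units that turn out defective.
      wasted = 0
      for _ in range(UNITS):
          for s in range(STATIONS):
              if random.random() < DEFECT_P:
                  # A per-station check catches the defect now; an end-of-line
                  # check wastes all the remaining stations' work on the unit.
                  wasted += 1 if check_each_station else STATIONS - s
                  break
      return wasted

  print("end-of-line inspection:", wasted_steps(False))
  print("per-station checks:   ", wasted_steps(True))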


I see this as the root cause of the recently announced class action suit against LADWP over their implementation of tiered electricity pricing.

The tiers (kWh rates) are set in chunks of the day, measured in hours.

But the reporting is only available to the consumer in the form of a monthly bill, so by the time you discover you were eating pixies in the Peak Cost hours, the heat wave is over and your bill is already through the roof.

(Any local SoCal residents please feel free to pick my analysis apart, but that was my first take when I heard about the legal action.)


The better option is to put someone in charge with a vision. However that is risky.


To make matters worse, it's also a vicious cycle. For example, if everyone is focused only on specific kinds of data and ignores observable reality, the trend in the data will become self-fulfilling until a point when the perceived inconsistency between the data and observable reality has grown so large that it becomes impossible to ignore.

To understand what the big problems are today, just think about the kinds of data that people in government (and the public) haven't been thinking about or aiming for. For example: happiness, honesty, altruism, sanity... These are not measured and not targeted, so they get completely crushed.

In the past, large, powerful religious groups would target these characteristics but nowadays, society is more secular so these aspects of our lives have suffered significantly.


That's one of the things Andrew Yang talks about. GDP isn't the only thing that matters, nor is it the best metric.


I recall a presentation by a feature team: "And making the share button bright blue yielded a 15% increase in share rate."

Okay, yeah I'm sure it does. Because it looks terrible and clashes with literally everything on the screen.


Another problem is that overly focusing on metrics means that people with good instincts are brought down to the same level as those with poor instincts.

As long as metrics are increasing someone can write shit code and design a bloated product.


"Instinct" can't be quantified, and it's such a shame that people with good track records can't quite justify their choices quantitatively. This happens constantly where I work. The managers who get most of their projects delivered on time - I suspect because they have good intuition about effort levels - are constantly struggling to justify their higher initial estimates, because the data-driven people try to reduce development down to a formula.


I believe this effect greatly contributes to why government IT systems are so bad.

When you're writing a procurement contract, it's relatively easy to describe what the requested system must do, but almost impossible to enforce a great UI design, as great UI design isn't objectively measurable.

As a contractor, you're optimizing for minimum money spent, so if good design is not required, good design gets sacrificed first.

One solution to this specific problem would be to conduct user surveys on how pleasant the system is to use, requiring a specific score before the contract is deemed completed.

This trend manifests more generally in bigger organizations. Smaller orgs let people judge things subjectively, so all possible aspects are taken into account, making those things relatively good; this is why startups succeed. In a bigger org, there are often objective judgement measures to prevent the influence of personal biases, politics or even bribes. However, those measures poorly reflect how good the thing in question actually is. This is why a big corp might produce worse software, even when competing against a small and underfunded startup.

As an example, Apple exempted the first iPhone crew from most internal company procedures, creating a quasi-startup inside Apple. Steve Jobs always had the final say, and his opinions were based on what he thought personally, not on how many points in a requirements specification were satisfied. I believe this was one of the reasons for the iPhone's success.


> government IT systems

This is not limited to governments. Although it's a common naive bias to assert that governments are worse and less efficient than private industry, what is really happening is that government budgets and projects are open to the public, done in the open. For every failed Healthcare.gov, there are dozens of private industry failures that don't make the news because the operations are not subject to the public disclosure rules.


I've been looking for the name of this for years. It's everywhere. People focusing only on things they can measure instead of what is important.



I once mentioned Goodhart's Law to a data scientist at a company and they immediately rejected it based on the unironic assertion that

"that would mean that KPIs shouldn't be the sole measure of our performance that that doesn't make sense!"

My experience in the field has been that an astounding number of products have been destroyed and users harmed by failing to heed Goodhart's Law.


Are you sure it was unironic? Because I sure can't imagine anyone saying that unironically.


I've heard plenty of "what else do I have to work with?", which has about the same meaning.


The politician's syllogism comes to mind: "Something must be done, this is something, so we must do this."

https://en.wikipedia.org/wiki/Politician%27s_syllogism


That at least acknowledges that it's an unsatisfactory situation, OP's conversation didn't even have that level of awareness.


Presumably someone, somewhere, thinks KPIs are a good idea.

Edit: but that person is unlikely to be subject to KPIs themselves.


Re your edit: Not necessarily. They may be a winner under the KPI regime, and may feel that they are less likely to be so under a saner regime.

For example, if I'm a manager that can make my people hit their KPIs, and my KPIs are about getting my people to hit theirs, then I'm subject to KPIs, and I like it. It's easier than making my people succeed at what really matters, and it makes me look good.


To push against this point: I prefer KPIs, or something objective that I can be measured against. That doesn't mean I like bad KPIs, but the fact of the matter is that there are always going to be KPIs; the only question is how explicit or implicit they are.

When KPIs are explicit, everyone knows what they are and can modify their behavior to optimize for them. When all measurement goes away, the new KPI is the arbitrary one held in the decision maker's head. Instead of an explicit bar that can be objectively used to make decisions, the entire system falls apart into politics and emphasizing appearances over work, because the only thing that matters with implicit KPIs is what everyone else thinks of you, which is much easier to manipulate than the amount of cash you brought in.


I regularly find myself saying "the only thing worse than metrics is not having metrics."


Is Goodhart's Law a sort of upper bound on the usefulness of data for decision-making in general?


Rather than implying a limit to the usefulness of data, I find it speaks more to the folly of substituting data for critical thinking.

You can reduce a fever by treating the underlying infection, soaking in ice water, or taking acetaminophen. No good doctor would judge a patient's health solely on whether a single metric, namely body temperature, could be moved into the acceptable range. That doesn't mean temperature isn't extremely valuable data, essential to decision making, but it cannot be a substitute for understanding and solving the real problem.

I once knew of a SaaS company that had perpetually growing MRR (Monthly Recurring Revenue). Great, right? Except churn was also growing: the increase in MRR was achieved by upselling a perpetually shrinking group of core customers. The core KPI of this company was MRR, and, unsurprisingly, this company does not exist anymore. Again, here is a case where all that other data (churn, upselling) is very useful, as is the KPI. But the key to success or failure is whether you really want to expend the effort to understand the problem, or just chase a KPI.

KPIs are seductive because they make managing a team's performance seem much easier: just get this number higher and you're doing well, get it lower and you're doing badly. But that's like playing a game of chess where each piece is controlled by a different person, and each person is judged solely on how many times they can get the king in check.


> No good doctor would judge a patient's health solely on whether a single metric, namely body temperature

That's a good example, because as Strathern's formulation notes, the problem lies in making the metric the target. It would be folly to think that reducing a patient's body temperature to the normal range is sufficient for curing illness. GE's Jack Welch famously focused solely on stock performance as a measure of success. It worked: by that measure, GE was wildly successful. By almost any other measure, Welch destroyed GE. https://www.bnnbloomberg.ca/jack-welch-inflicted-great-damag...


I wonder if measuring managers' understanding of Goodhart's Law would result in better management.

/s


That's horrifying. My litmus test for data scientists is whether they acknowledge that KPIs are imperfect measures. In my experience, data scientists who come from hard-science backgrounds (e.g. physics and biostats) tend to be much more open to the idea that there isn't a "perfect" statistical measure.


Relative to what background exactly?

Like, I'd agree that maths/CS people are a little naive about error, but I can't imagine any data scientist from a social science background thinking that any measure is perfect.


Apologies, I rarely see data scientists from social science backgrounds.


Really?

Wow, so when I worked at a FAANG, I would say that soc sci people (broadly defined) made up at least a third of the data science org. Data science is a broad church though, and varies a bunch across companies, so it's normal I suppose (though strange to me).


Past related threads. In this case a few of the 1-or-2 comment threads have particularly good posts:

Goodhart's Law - https://news.ycombinator.com/item?id=26839177 - April 2021 (2 comments)

Goodhart’s Law Rules the Modern World. Here Are Nine Examples - https://news.ycombinator.com/item?id=26604130 - March 2021 (3 comments)

Goodhart's Law and how systems are shaped by the metrics you chase - https://news.ycombinator.com/item?id=23762526 - July 2020 (58 comments)

When Goodharting Is Optimal - https://news.ycombinator.com/item?id=22054359 - Jan 2020 (3 comments)

Goodhart’s Law: Are Academic Metrics Being Gamed? - https://news.ycombinator.com/item?id=21065507 - Sept 2019 (27 comments)

Goodhart’s Law: Are Academic Metrics Being Gamed? - https://news.ycombinator.com/item?id=20076485 - June 2019 (2 comments)

When targets and metrics are bad for business - https://news.ycombinator.com/item?id=19135694 - Feb 2019 (6 comments)

Goodhart's Law: When a measure becomes a target, it ceases to be a good measure - https://news.ycombinator.com/item?id=17320640 - June 2018 (134 comments)

Goodhart's Law - https://news.ycombinator.com/item?id=10075780 - Aug 2015 (1 comment)

Goodhart's law - https://news.ycombinator.com/item?id=1368745 - May 2010 (1 comment)


My first job was in the anti-fraud department at a telecom company in the early 00s. Our job was basically to determine whether a new landline or mobile contract was fraudulent or not. Some requests would be flagged by an automated piece of software that was a black box to most employees, myself included. We would look at the documents sent by the clients and, sometimes, ask a few questions over the phone.

I was very young at the time, but I remember basically deriving Goodhart's law after a few months in the job. I don't remember clearly most of the things that led me to that conclusion, but I do remember the most extreme: at some point, management started requiring us to block clearly non-fraudulent phones because the directors decided to increase the blocked installations target. It would include even old contracts by good paying customers that happened to be flagged.

I remember trying to talk to people about this, but the idea that trying to reach a target by any means necessary is usually not a good idea was incomprehensible to most people. Years later, I realized that others knew exactly what was going on; they just didn't care, and I was naive for not seeing that.

It was a few years later when I learned about perverse incentives, Goodhart's law, the cobra effect, etc., and it allowed me to have more productive conversations with people about targets and incentives.


Here's an interesting paper I found that attempts to categorize the mechanisms by which Goodhart's Law operates in the real world. The variants are separated into causal and non-causal mechanisms.

https://arxiv.org/abs/1803.04585


Thanks for posting that. I was going to say -- it's interesting to think about the reasons why Goodhart's Law might hold if it does.

I've always assumed the problem is that the metric is always influenced by other, nontarget variables that become more causally important when the metric becomes a proxy target. So, for example, "gaming the metric" becomes important (in a percent variance sense) after the metric becomes a target. I think the paper's adversarial scenario is closest to this maybe.

They discuss some other factors that seem more relevant to individual cases at a moment in time than to explaining why a metric's utility might decline over time. In that sense the paper seems to be about Goodhart-like phenomena in general.

It would be interesting to demonstrate Goodhart's law conclusively with real data in some domains.
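
Even without real data, you can watch the "regressional" variant appear in a toy simulation: make a proxy equal to the true goal plus noise, then select harder and harder on the proxy (a sketch of my reading, not code from the paper):

  import random

  random.seed(0)
  N = 100_000
  true_value = [random.gauss(0, 1) for _ in range(N)]
  proxy = [v + random.gauss(0, 1) for v in true_value]  # metric = goal + noise

  # The harder you select on the proxy, the bigger the gap between the proxy
  # and the true value among the selected: the measure degrades exactly where
  # optimization pressure is strongest.
  for top_frac in (0.5, 0.1, 0.01):
      k = int(N * top_frac)
      chosen = sorted(range(N), key=lambda i: proxy[i], reverse=True)[:k]
      print(f"top {top_frac:>4}: "
            f"proxy={sum(proxy[i] for i in chosen) / k:.2f}  "
            f"true={sum(true_value[i] for i in chosen) / k:.2f}")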


Thanks for highlighting our paper!


This is why efforts at raising school test scores have not improved actual achievement.


This is why efforts at alleviating poverty have not improved actual poverty...


Poverty is a special case though because it's relative. You could be a millionaire with a yacht on earth but be below the poverty line if the middle and upper classes live in space stations or on other planets with higher standards of living.

An impoverished person in the US is rich compared to an impoverished person in India or Africa.


That doesn't make my statement any less true.


Source? According to the book Factfulness by Hans Rosling, poverty has decreased significantly over the past decades.

https://www.amazon.com/Factfulness-Reasons-World-Things-Bett...


And why sending everyone to college hasn’t achieved anything good.


The ignoring of this law in software development is referred to as Scrum.


The only systems that escape Goodhart's law (or similar laws) are those that have either A) people who genuinely care about quality and have the judgment for it, or B) mechanics where the effect of the control variable on the system is well understood (i.e. no black boxes).


I think the only way to apply a metric while avoiding the consequences of the Law is to keep the metric secret. As soon as the subjects are aware of the metric, they will try to optimize it, rather than the real performance that the metric is supposed to indicate.

There is a close connection with security, say screening at airports. If you fail to keep your screening criteria secret, the terrorists will simply ensure that they do not match the criteria. One way to thwart this is through randomness. There is a long public debate between Sam Harris and Bruce Schneier where the latter tried in vain to explain this to the former, who insisted that it was a waste of resources to search little old ladies. If one of your criteria is "don't search little old ladies", the terrorists will discover this over time by observation. The next bomb will be carried by a little old lady.
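
To make the randomness point concrete, here's a toy sketch (the rate is made up; this isn't any real screening policy). As long as some fraction of screening is uniformly random, no amount of profile avoidance gets an attacker below that floor:

  import random

  RANDOM_RATE = 0.05  # fraction screened uniformly at random

  def selected_for_screening(matches_profile: bool) -> bool:
      # The profile component can be learned and gamed by observation;
      # the random component cannot, so even a perfectly profile-avoiding
      # adversary still faces a 5% chance of being screened.
      return random.random() < RANDOM_RATE or matches_profile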


Earlier posts here (134 comments): https://news.ycombinator.com/item?id=17320640

and here (58 comments): https://news.ycombinator.com/item?id=23762526

NPR's Planet Money (audio) also covered this, interviewing Goodhart himself: https://www.npr.org/sections/money/2018/11/19/669395064/epis...


Inspired by a scandalous fraud case in scientific publishing, I wrote a paper applying Goodhart's Law to scientific publishing ("A Much-needed Security Perspective on Publication Metrics", published at the Security Protocols Workshop 2017). That was a really fun paper to write! Basically: how can you systematically start cheating at publishing - and how could you catch that?

The most fun was challenging the audience - security researchers all - to think even more outside the box than usual for them.

I'm still (slowly) forging ahead on ideas spawned by this paper. Bringing the ideas of catching crooks to reality was not as straightforward as hoped. Then again, when has any project ever gone as planned?


I do investment stuff for a living, and this was one of the most important things I ever learned.


It is possible to turn this effect to your advantage. I wrote a spam filter that takes advantage of signals that would be easy for spammers to spoof (like the list-id header), but no one spoofs them because no one but me uses this approach to spam filtering. So, ironically, if more people used my spam filter, it would probably stop working as well as it does now.
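
Roughly, the core signal looks something like this (a minimal sketch with hypothetical list names, using Python's standard email module; not the actual code):

  from email import message_from_string

  # Mailing lists I actually subscribe to (hypothetical examples).
  MY_LISTS = {"python-dev.python.org", "hn-digest.example.com"}

  def looks_like_ham(raw_message: str) -> bool:
      # Spammers could forge List-Id trivially, but almost none bother,
      # because almost no filter relies on it.
      list_id = message_from_string(raw_message).get("List-Id", "")
      return any(lid in list_id for lid in MY_LISTS)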


What's the takeaway? How can statistics inform action, if such action invalidates those statistics?


That we can't stop thinking. It becomes too easy to stop using our own judgement when we have numbers to back up our decisions. We can point to the analysis and say, "Look, we've improved <metric>!" That itself becomes the justification for the actions taken, regardless of their actual sensibility.

This is an area where discipline remains essential, and maintaining discipline is a constant battle.


That statistics should inform action, or measure its efficacy, but not drive it.

A dev manager running on only numbers will inevitably get empty, meaningless values such as tickets resolved or lines of code written, while the polar-opposite manager will run on intuition alone. I imagine most would agree that neither type is generally effective and that a balance should be struck. Goodhart's Law means you should know what's important and pay attention to it, but not make it your sole focus. And for God's sake, don't make a public dashboard for it.


You need to measure the KPIs.

The reason we use metrics is because things have scaled out of control, and using a "real" judgment system is no longer possible. Not for everyone, anyway.

Using hiring as an example (though this should work for nearly all metrics-driven workloads): hire a small subset of your staff using a real judgment system, hire most of your staff using metrics, then take a sample of those hired using metrics and compare them against those hired using real judgment.

If the two groups are reasonably close, the KPIs are working. If they are alien to each other, you need to either stop using KPIs or alter them significantly to fit with what the more effective people are doing.
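
A sketch of that calibration check, assuming you can compute a comparable performance score for both groups (names and the tolerance are illustrative):

  from statistics import mean

  def kpi_still_calibrated(metric_group, judgment_group, tolerance=0.1):
      # Compare downstream performance scores of people selected by metrics
      # vs. by direct judgment; a large relative gap suggests the KPI has
      # drifted from what the judgment-based process actually values.
      gap = abs(mean(metric_group) - mean(judgment_group))
      return gap <= tolerance * abs(mean(judgment_group))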


>You need to measure the KPIs.

What KPIs do you use when the statistics are being used to measure whether the KPIs are the right ones?


The take away is that governments and large institutions which impact large numbers of people should never attempt to set quantifiable targets and should never attempt to meet quantifiable targets.


This also reminded me of the Taguchi loss function. To paraphrase: quality is optimized when you manage the variation of a measurement relative to its target value, instead of emphasizing the acceptable limits. It was seen as an improvement over the "goalpost" approach. And while you could say it really applies in a manufacturing context, I think it should be considered more broadly for goal-oriented measurement and engineering in general.

https://en.wikipedia.org/wiki/Taguchi_loss_function
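
Concretely, the loss is just a quadratic penalty around the target value m, in contrast to the pass/fail spec window (the standard form, sketched in Python):

  def taguchi_loss(y, m, k=1.0):
      # Loss grows with the squared deviation from the target m,
      # even for parts still inside the spec limits.
      return k * (y - m) ** 2

  def goalpost_loss(y, lower, upper, cost=1.0):
      # The "goalpost" view: zero loss anywhere in spec, full cost outside.
      return 0.0 if lower <= y <= upper else cost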


One of the most important lessons to promote.


I could not help but relate it to the Heisenberg uncertainty principle.


me too :)


I wrote a blog post a few months ago about Goodhart's Law in my life, titled "Gamification, life, and the pursuit of a gold badge".

https://web.eecs.utk.edu/~azh/blog/gamification.html

The Tyranny of Metrics is a good book that covers real-world cases of metrics gone wrong.


Good that this topic gets some attention. I see Goodhart's Law again and again in metrics. I wrote about this a while back, including why we use the misleading name. https://unintendedconsequenc.es/new-morality-of-attainment-g...


When I explain the concept to people I usually call it the Goodhart-Strathern principle, to recognize the generalization she contributed and acknowledge the author of the most-commonly quoted form of the law: "When a measure becomes a target, it ceases to be a good measure"


Here's another interesting one that's somewhat related:

https://en.wikipedia.org/wiki/Decline_effect


Could be...yes.

"Decline effect" could also be mostly due to the https://en.wikipedia.org/wiki/Replication_crisis

A Goodhart's-law measure starts out effective, and then the social system it purports to measure adapts (some might say 'distorts'), until the measure no longer serves its original purpose.


The same phenomenon has many different names. From the OP:

> See also:

> Campbell's law – "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures" https://en.wikipedia.org/wiki/Campbell%27s_law

> Cobra effect – when incentives designed to solve a problem end up rewarding people for making it worse https://en.wikipedia.org/wiki/Cobra_effect

> Gaming the system https://en.wikipedia.org/wiki/Gaming_the_system

> Lucas critique – it is naive to try to predict the effects of a change in economic policy entirely on the basis of relationships observed in historical data https://en.wikipedia.org/wiki/Lucas_critique

> McNamara fallacy – involves making a decision based solely on quantitative observations (or metrics) and ignoring all others https://en.wikipedia.org/wiki/McNamara_fallacy

> Overfitting https://en.wikipedia.org/wiki/Overfitting

> Reflexivity (social theory) https://en.wikipedia.org/wiki/Reflexivity_(social_theory)

> Reification (fallacy) https://en.wikipedia.org/wiki/Reification_(fallacy)

> San Francisco Declaration on Research Assessment – 2012 manifesto against using the journal impact factor to assess a scientist's work https://en.wikipedia.org/wiki/San_Francisco_Declaration_on_R...

> Volkswagen emissions scandal – 2010s diesel emissions scandal involving Volkswagen https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal

Source: https://en.wikipedia.org/wiki/Goodhart%27s_law#See_also


It is like quantum mechanics: the measurement process produces a perturbation of the physical system.


sounds like SOTA ml research



