I think that this metric correlates well with the browser being usable on real w...

fabrice_d · 2025-10-06T18:30:27 1759775427

The wpt score is not that well balanced. Look at https://staging.wpt.fyi/results/?product=servo&product=ladyb... : out of about 2 million tests, more than half are for the "encoding" category. Good encoding support is needed for sure, but likely not at that level of prevalence.

jerf · 2025-10-06T19:16:28 1759778188

It seems my communication did not adequately convey the fact that I have no problem with the Ladybird team doing this. It makes perfect sense and is the right thing to do.

However, a jump like that means precisely and exactly what I said it means; very suddenly, that metric became much more important to the team. It is written straight into the graph.

A large number of encoding-related tests that were probably relatively easy to fix in bulk is certainly a plausible explanation.

A lot of people are imputing to me assumptions that they are bringing to my post, such as assuming that such improvements must be fake or bad or somehow otherwise cheating. Nope. Moreover, if you are thinking that, don't take it up with me, go take it up with the graph directly. It's not my graph. I'm just amused at the spectacular demonstration of Goodhart's Law.

Are the commentators who think I'm being critical of the Ladybird project going to defend their implicit proposition that the browser got twice as good in whatever period that graph is in, a week or a month or whatever? Of course that's not the case.

trflynn89 · 2025-10-06T20:10:11 1759781411

> However, a jump like that means precisely and exactly what I said it means; very suddenly, that metric became much more important to the team. It is written straight into the graph.

Not really, though. The latest jump was from implementing some CSS Typed OM features, which has been in-progress work for a while now. The 6k increase in the test score was a bit of a happy surprise. It's also not that much of a jump when you zoom out and see it's "just" a continuation of a steady increase in score over a long period.

fabrice_d · 2025-10-06T20:06:01 1759781161

fwiw, I'm not imputing you any assumptions. I'm just pointing out that using wpt score as a criteria is not necessarily a good proxy for browser readiness. So I'm not sure why Apple uses that, other than... there's no other objective measure? Of course it's fair game for browser engines to improve their score!

tolerance · 2025-10-06T20:09:43 1759781383

Dude at this point just raise your knickers up and criticize the thing. You’ve got the most valuable observation about this topic on your side. The graph is jarring and for someone only recently made familiar with Goodhart’s Law this is a great example of it in practice. You must be further well-informed enough to defend any issues you actually have with the project outright instead of this small war of attrition playing out waaay down here.

Too much useful insight is withheld or misappropriated these days.