We do indeed measure improvements against more than one website. We wouldn’t want to improve website X while regressing the rest of the internet. See https://v8.dev/blog/real-world-performance (from 2016):
> We now monitor changes against a test suite of approximately 25
> websites in order to guide V8 optimization. In addition to the
> aforementioned websites and others from the Alexa Top 100, we selected
> sites which were implemented using common frameworks (React, Polymer,
> Angular, Ember, and more), sites from a variety of different geographic
> locales, and sites or libraries whose development teams have
> collaborated with us, such as Wikipedia, Reddit, Twitter, and webpack.
> We believe these 25 sites are representative of the web at large and
> that performance improvements to these sites will be directly reflected
> in similar speedups for sites being written today by JavaScript
> developers.
I thought they had something like 10,000 websites they measure.