Although the Moscow demo[0] looks better (in my eyes) in JXL than in WebP.
Note also that JXL is still being worked on. We also have no information about which encoder was used (I am assuming the reference encoder, since it is the only one that I know of right now) or which version.
Edit: the Citroën demo is also a clear win for JXL.
The singer’s left hand has wrinkles in the original image that disappear in WebP2.
Overall, WebP2 and especially AVIF are really good at very low bitrates (<1 bit per pixel), but unlike video, images on the Web will always be shown at the smallest bitrate necessary to be indistinguishable from the original; there, JXL tends to show all the details at a lower bitrate.
JXL's design operating point is "no visible compression artefacts".
Most people do not want to have visible compression artefacts on the images they put on their web pages.
JXL starts from this premise and tries to answer the question: "How small can we then make the image?"
Care must be taken when comparing the performance of image codecs by increasing compression density until there are very visible compression artefacts and then evaluating whether A's or B's artefacts look worse: if both A's and B's artefacts are so bad that one would not want to put such an image on one's website, such an experiment gives no insight into what one would pick for images that one would actually put on one's website.
Figuratively speaking, if I buy a shirt, my main criterion is that it looks good in good condition, and not that it still looks good if I put a coffee stain on it.
So, before comparing codec quality at compression levels where artefacts show, always ask yourself: at that level of visual quality, would I actually want to put either of the two options on my website? Now, it is of course tempting to compare "away from the actual operating point", because it is just so much easier to do comparisons if there are very visible differences. Comparing near-identical images for quality is hard. Doing this over and over again in a human rater experiment is exhausting. But that is what answers the performance questions that actually need to be answered.
Comparing artifacts at 0.2 bpp is tempting because the artifacts are big there. But it's like buying a car based on how it performs when you are using only the first gear.
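To make figures like 0.2 bpp concrete, here is a minimal sketch of the arithmetic, assuming bpp means total file bits divided by pixel count. The function names and the 1920x1080 example size are my own illustration, not taken from the comparison site.

```python
def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Average number of bits spent per pixel for an encoded image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return file_size_bytes * 8 / (width * height)


def bytes_for_bpp(target_bpp: float, width: int, height: int) -> float:
    """File size in bytes that corresponds to a target bits-per-pixel budget."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return target_bpp * width * height / 8


if __name__ == "__main__":
    # Hypothetical 1920x1080 image: how many bytes does a 0.2 bpp budget allow?
    print(bytes_for_bpp(0.2, 1920, 1080))
    # And the reverse: what bpp does a file of that size correspond to?
    print(bits_per_pixel(51840, 1920, 1080))
```

So a full-HD image at 0.2 bpp has a budget of roughly 50 KB, which gives a feel for how aggressive that setting is.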
This blog, linked from the original article, goes into this idea in detail, describing how JXL focused on improving compression of high-fidelity images and has not focused as much on the appeal of highly compressed images.
E.g. https://eclipseo.github.io/image-comparison-web/#us-open&WEB... The player is missing the corner of her mouth in the JXL version.
JXL Medium (32 KB) is about the same quality as WebP Small (19 KB).