"After running the hashes against 100 million non-CSAM images, Apple found three false positives"
So closer to 1/10M. The reporting threshold is made artificially higher by requiring more than one positive.
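A quick sketch of why the multi-match threshold matters (with purely illustrative numbers, not Apple's actual parameters: a per-image false-positive rate of 3e-8, i.e. the 3-in-100M figure, and a 10,000-photo library). Note it assumes matches are independent across photos, which is exactly the assumption the clustering concern challenges:

```python
from math import comb

# Illustrative numbers only, not Apple's actual parameters:
p = 3e-8      # hypothetical per-image false-positive rate (3 hits in 100M)
n = 10_000    # hypothetical number of photos in one user's library

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

p1 = prob_at_least(1, n, p)  # report on any single match
p2 = prob_at_least(2, n, p)  # report only after two matches

print(f"P(>=1 match):   {p1:.2e}")
print(f"P(>=2 matches): {p2:.2e}")
```

Under independence, requiring even one extra match drops the per-user reporting rate by several orders of magnitude; with correlated hashes that drop is much smaller.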
But anyway, that's beside the point.
A perceptual hash is not uniformly distributed; it's not a random number. Likewise for photos taken in a specific setting; they do not approach the randomness of a set of random images.
So someone snapping photos in a setting that shares features with a set of photos in the CSAM database may face a massively higher false positive rate. It's no longer a million-sided die; it could be a thousand-sided die when your hash outputs happen to cluster around similar values because of the similar setting.
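A toy difference-hash (dHash) sketch illustrates the clustering point; this is not Apple's NeuralHash, just the simplest perceptual hash. Two shots of the same smooth scene land on the exact same hash, while an unrelated image sits far away in Hamming distance:

```python
import random

def dhash_bits(img):
    """Difference hash on a 9x8 grid: one bit per adjacent-pixel comparison."""
    return [int(row[c] < row[c + 1]) for row in img for c in range(8)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# A smooth synthetic "scene": brightness rises left to right.
scene = [[r * 10 + c * 3 for c in range(9)] for r in range(8)]
# The "same scene" re-shot with slight pixel-level noise.
reshoot = [[scene[r][c] + (r + c) % 2 for c in range(9)] for r in range(8)]
# An unrelated image: random pixel values.
random.seed(0)
unrelated = [[random.randrange(256) for _ in range(9)] for _ in range(8)]

print(hamming(dhash_bits(scene), dhash_bits(reshoot)))    # 0: identical hash
print(hamming(dhash_bits(scene), dhash_bits(unrelated)))  # roughly half the bits differ
```

The outputs for similar scenes are not spread uniformly over the hash space; they pile up near each other, which is the whole point of a perceptual hash and also why per-image false-positive rates aren't independent draws.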
But I can't say I care about false positives. To me the system is bad either way.
"After running the hashes against 100 million non-CSAM images"
They don't say what kind/distribution of non-CSAM images. Landscapes? Parent pix of kids in the bathtub? Cat memes? Porn of young adults? Photos from real estate listings?
I suspect some pools of image types would have a much higher hit rate.
Edit: And, well "hot dog / not hot dog" is impressive on a set of random landscapes too.
Well the same article also claims zero false positives for "a collection of adult pornography." I don't know if the size of that collection is mentioned anywhere.
Anyway, I suspect that the algo is more likely to pick up defining features of the scene and overall composition (furniture, horizon, lighting, position & shape of the subject and other objects) than the subject matter itself.