"After running the hashes against 100 million non-CSAM images, Apple found three false positives"
So closer to 1/10M. The reporting threshold is made artificially higher by requiring more than one positive.
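A quick sketch of why the multi-match threshold matters (with purely illustrative numbers, not Apple's actual parameters: a per-image false-positive rate of 3e-8, i.e. the 3-in-100M figure, and a 10,000-photo library). Note it assumes matches are independent across photos, which is exactly the assumption the clustering concern challenges:

```python
from math import comb

# Illustrative numbers only, not Apple's actual parameters:
p = 3e-8      # hypothetical per-image false-positive rate (3 hits in 100M)
n = 10_000    # hypothetical number of photos in one user's library

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

p1 = prob_at_least(1, n, p)  # report on any single match
p2 = prob_at_least(2, n, p)  # report only after two matches

print(f"P(>=1 match):   {p1:.2e}")
print(f"P(>=2 matches): {p2:.2e}")
```

Under independence, requiring even one extra match drops the per-user reporting rate by several orders of magnitude; with correlated hashes that drop is much smaller.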
But anyway, that's beside the point.
A perceptual hash is not uniformly distributed; it's not a random number. Likewise for photos taken in a specific setting; they do not approach the randomness of a set of random images.
So someone snapping photos in a setting that shares features with a set of photos in the CSAM database may face a massively higher false positive rate. It's no longer a million-sided die; it could be a thousand-sided die when your hash outputs happen to cluster around similar values because of the similar setting.
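A toy difference-hash (dHash) sketch illustrates the clustering point; this is not Apple's NeuralHash, just the simplest perceptual hash. Two shots of the same smooth scene land on the exact same hash, while an unrelated image sits far away in Hamming distance:

```python
import random

def dhash_bits(img):
    """Difference hash on a 9x8 grid: one bit per adjacent-pixel comparison."""
    return [int(row[c] < row[c + 1]) for row in img for c in range(8)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# A smooth synthetic "scene": brightness rises left to right.
scene = [[r * 10 + c * 3 for c in range(9)] for r in range(8)]
# The "same scene" re-shot with slight pixel-level noise.
reshoot = [[scene[r][c] + (r + c) % 2 for c in range(9)] for r in range(8)]
# An unrelated image: random pixel values.
random.seed(0)
unrelated = [[random.randrange(256) for _ in range(9)] for _ in range(8)]

print(hamming(dhash_bits(scene), dhash_bits(reshoot)))    # 0: identical hash
print(hamming(dhash_bits(scene), dhash_bits(unrelated)))  # roughly half the bits differ
```

The outputs for similar scenes are not spread uniformly over the hash space; they pile up near each other, which is the whole point of a perceptual hash and also why per-image false-positive rates aren't independent draws.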
But I can't say I care about false positives. To me the system is bad either way.
"After running the hashes against 100 million non-CSAM images"
They don't say what kind/distribution of non-CSAM images. Landscapes? Parent pix of kids in the bathtub? Cat memes? Porn of young adults? Photos from real estate listings?
I suspect some pools of image types would have a much higher hit rate.
Edit: And, well "hot dog / not hot dog" is impressive on a set of random landscapes too.
Well the same article also claims zero false positives for "a collection of adult pornography." I don't know if the size of that collection is mentioned anywhere.
Anyway, I suspect that the algo is more likely to pick up defining features of the scene and overall composition (furniture, horizon, lighting, position & shape of the subject and other objects) than the subject matter itself.