
"After running the hashes against 100 million non-CSAM images, Apple found three false positives"

So roughly 1 in 33 million per image. The reporting threshold is made artificially higher by requiring more than one positive.
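
Back-of-the-envelope in Python (the 20,000-photo library and the thresholds are made-up numbers; only the 3-in-100M per-image rate comes from the article):

    import math

    def p_flagged(n, k, p):
        # P(at least k false matches among n photos), treating each
        # photo as an independent Bernoulli(p) trial: 1 - binomial CDF.
        return 1 - math.fsum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                             for i in range(k))

    p = 3 / 100_000_000               # the article's measured per-image rate
    print(p_flagged(20_000, 1, p))    # ~6e-4: one stray hit isn't that rare
    print(p_flagged(20_000, 5, p))    # 0.0: true tail ~1e-18, underflows a double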

But anyway, that's beside the point.

A perceptual hash is not uniformly distributed; it's not a random number. Likewise for photos taken in a specific setting: they don't have anything like the diversity of a random sample of images.

So someone snapping photos in a setting that shares features with a set of photos in the CSAM database may face a massively higher false-positive rate. It's no longer a million-sided die; it could be a thousand-sided die when your hashes happen to cluster around similar values because of the similar setting.
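
Here's a toy simulation of that clustering effect (64-bit hashes, fuzzy match within Hamming distance 8; every parameter is invented):

    import random

    BITS = 64

    def uniform_hash():
        # Idealized model: every bit is an independent coin flip.
        return random.getrandbits(BITS)

    def clustered_hash(proto, flip_prob=0.2):
        # Similar-setting photos: perturb a shared prototype hash.
        h = proto
        for bit in range(BITS):
            if random.random() < flip_prob:
                h ^= 1 << bit
        return h

    def hit_rate(make_hash, db, radius=8, trials=20_000):
        # Fraction of queries landing within Hamming distance
        # `radius` of any database hash, i.e. a (false) fuzzy match.
        hits = 0
        for _ in range(trials):
            q = make_hash()
            if any(bin(q ^ d).count("1") <= radius for d in db):
                hits += 1
        return hits / trials

    random.seed(0)
    proto = random.getrandbits(BITS)
    db = [clustered_hash(proto) for _ in range(100)]      # DB shot in similar settings

    print(hit_rate(uniform_hash, db))                     # ~0.0
    print(hit_rate(lambda: clustered_hash(proto), db))    # a few percent: the die shrank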

But I can't say I care about false positives. To me the system is bad either way.




"After running the hashes against 100 million non-CSAM images"

They don't say what kind/distribution of non-CSAM images. Landscapes? Parents' pics of kids in the bathtub? Cat memes? Porn of young adults? Photos from real estate listings?

I suspect some pools of image types would have a much higher hit rate.

Edit: And, well, "hot dog / not hot dog" is impressive on a set of random landscapes too.
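
The evaluation I'd want to see is per-pool, not one aggregate number. Something like this (categories and counts invented):

    from collections import Counter

    def fp_rate_by_pool(results):
        # results: (pool_name, matched) pairs from running the hash
        # over labeled pools of non-CSAM test images.
        totals, hits = Counter(), Counter()
        for pool, matched in results:
            totals[pool] += 1
            hits[pool] += matched
        return {pool: hits[pool] / totals[pool] for pool in totals}

    # Invented data: a clean aggregate rate can hide one pool that is
    # orders of magnitude worse.
    results = ([("landscapes", False)] * 100_000
               + [("bathtub_pics", True)] * 3
               + [("bathtub_pics", False)] * 997)
    print(fp_rate_by_pool(results))   # {'landscapes': 0.0, 'bathtub_pics': 0.003}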


Well, the same article also claims zero false positives for "a collection of adult pornography." I don't know if the size of that collection is mentioned anywhere.

Anyway, I suspect the algo is more likely to pick up defining features of the scene and overall composition (furniture, horizon, lighting, position & shape of the subject and other objects) than the subject matter itself.
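
You can see why with a toy average hash (aHash). NeuralHash is a learned embedding, not an aHash, but the flavor of the failure is the same: the bits are driven by coarse luminance layout, so composition dominates subject matter.

    def ahash(pixels):
        # Toy aHash on an 8x8 grayscale grid: one bit per pixel,
        # set if that pixel is brighter than the image mean.
        flat = [v for row in pixels for v in row]
        mean = sum(flat) / len(flat)
        bits = 0
        for v in flat:
            bits = (bits << 1) | (v > mean)
        return bits

    def hamming(a, b):
        return bin(a ^ b).count("1")

    # A "room": bright upper half (wall/window), dark lower half (floor).
    scene_a = [[200] * 8] * 4 + [[50] * 8] * 4
    # Same composition, different subject: tweak a few mid-frame pixels.
    scene_b = [row[:] for row in scene_a]
    scene_b[3] = [200, 200, 180, 170, 180, 200, 200, 200]
    # Inverted composition: dark top, bright bottom.
    scene_c = [[50] * 8] * 4 + [[200] * 8] * 4

    print(hamming(ahash(scene_a), ahash(scene_b)))   # 0: subject change is invisible
    print(hamming(ahash(scene_a), ahash(scene_c)))   # 64: composition flips every bit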


That's why I included "Photos from real estate listings?" in my list.



