
> Simple nudity does not count inherently.

That’s not entirely true. If a police officer finds you in possession of a quantity of CP, especially of multiple different children, you’ll at least be brought in for questioning if not arrested/tried/convicted, whether the images were sexualized or not.

> nor would it ever end up in a database

That’s a bold blanket statement coming from someone who correctly argued that NCMEC’s database has issues (as in, I know your previous claim is true because I’ve seen false positives for images that were completely innocent, both legally and morally). That said, with the number of photos accidentally shared online (or hacked), saying that GP’s scenario can never end up in a database seems a bit off the mark. It’s very unlikely, as the sibling commenter said, but still possible.




>That’s not entirely true

That's why I said it's not inherently illegal. Of course, if you have a folder called "porn" that is full of naked children it modifies the context and therefore the classification. But, if it's in a folder called "Beach Holiday 2019", it's not illegal nor really morally a problem. I'm dramatically over-simplifying of course. "It depends" all the way down.

>That’s a bold blanket statement

You're right, I shouldn't have been so broad. It's possible but unlikely, especially if it's not shared on social media.

It reinforces my original point, however, because I can easily see a case where a totally voluntary nudist family that posts to social media gets caught up in a damaging investigation because of this. If their pictures end up in the possession of unsavory people and get lumped into NCMEC's database, then it's entirely possible they get flagged dozens or hundreds of times and get referred to the police. It's an edge case, but a family is still destroyed over it. Some wrongfully accused people have their names tarnished permanently.

This kind of policy will lead to innocent people getting dragged through the mud. For that reason alone, this is a bad idea.


> But, if it's in a folder called "Beach Holiday 2019", it's not illegal nor really morally a problem.

With all due respect, please please stop making broad blanket statements like this. I'm far from a LEO/lawyer, yet I can think of at least a dozen ways a folder named that could be illegal and/or immoral.

> This kind of policy will lead to innocent people getting dragged through the mud. For that reason alone, this is a bad idea.

Completely agree.


I believe both you and the other poster, but I still haven't seen anyone give an example of a false positive match they've observed. Was it an actual image of a person? Were they clothed? etc.

It's very concerning if the fuzzy hash is too fuzzy, but I'm curious to know just how fuzzy it is.


> Was it an actual image of a person? Were they clothed?

Some of the false positives were of people, others weren’t. It’s not that the hashing function itself was problematic, but that the database contained hashes which weren’t of CP content, as the chance of a hash collision was way lower than the false positive rate (my guess is it was “data entry” type mistakes by NCMEC, but I have no proof to back up that theory). I made it a point to never personally view any content which matched against NCMEC’s database until it was deemed “safe”, as I didn’t want anything to do with it (both from a disgust perspective and from a legal-risk perspective), but I had coworkers who had to investigate every match, and I felt so bad for them.

In the case of PhotoDNA, the hash is conceptually similar to an MD5 or SHA1 hash of the file. The difference between PhotoDNA and your normal hash functions is that it’s not an exact hash of the raw bytes, but more like a hash of the “visual representation” of the image. When we were doing the initial implementation/rollout (I think late 2013ish), I did a bunch of testing to see how much I could vary a test image and still have the hash match, since I was curious. Resizes or crops (unless drastic) would almost always come back within the fuzziness window we were using. Overlaying some text or a basic shape (like a frame) would also often match. I then used Photoshop to tweak color/contrast/white balance/brightness/etc., and that’s where it started getting hit or miss.
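For intuition only, here's a minimal sketch of a generic perceptual hash (a dHash) with a Hamming-distance "fuzziness window", in Python with Pillow. PhotoDNA is proprietary and its real algorithm, hash size, and thresholds aren't public, so this is not its implementation; the function names and the threshold value below are made up purely to illustrate the "hash of the visual representation plus a tolerance" idea described above.

    # Sketch of a generic perceptual hash (dHash), NOT PhotoDNA.
    from PIL import Image

    def dhash(path, hash_size=8):
        # Shrink to a tiny grayscale image so only coarse visual
        # structure survives; the exact bytes/pixels no longer matter.
        img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
        px = list(img.getdata())
        bits = 0
        for row in range(hash_size):
            for col in range(hash_size):
                left = px[row * (hash_size + 1) + col]
                right = px[row * (hash_size + 1) + col + 1]
                bits = (bits << 1) | (1 if left > right else 0)
        return bits  # 64-bit integer for hash_size=8

    def hamming(a, b):
        return bin(a ^ b).count("1")

    # "Fuzziness window": call it a match if the hashes differ in at most
    # N bits. Resizes and mild crops tend to stay inside the window, while
    # heavy color/contrast edits drift out of it.
    THRESHOLD = 10  # illustrative value only, not PhotoDNA's
    # is_match = hamming(dhash("original.jpg"), dhash("resized.jpg")) <= THRESHOLD

The design point is the same one described above: matching is a distance comparison against a threshold rather than an exact equality check, which is what makes "how fuzzy is it" a meaningful question.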


There are examples (from the OP, but in reply tweets) in the submission.


Unless I'm missing something, those are just theoretical examples of how one could deliberately try to find hash collisions, using a different, simpler perceptual hash function: https://twitter.com/matthew_d_green/status/14230842449522892...

So, it's theoretical, it's a different algorithm, and it's a case where someone is specifically trying to find collisions via machine learning. (Perhaps by "reversing" the hash back to something similar to the original content.)

The two above posters claim that they saw cases where there was a false positive match from the actual official CSAM hash algorithm on some benign files that happened to be on a hard drive; not something deliberately crafted to collide with any hashes.


You're not missing something, but you're not likely to get real examples: as I understand it, the algorithm and database are private, and the posters above are just guardedly commenting with (claimed) insider knowledge, so they're not likely to want to leak examples. (And it's not just that it's private; given the supposed contents, would you really want to be the one saying 'but it isn't, look'? Would you trust someone who did, and follow such a link to see for yourself?)


To be clear, I definitely didn't want examples in terms of links to the actual content. Just a general description. Like, was a beach ball misclassified as a heinous crime, was it perfectly legal consensual porn with adults that was misclassified, or was it something that even a human could potentially mistake for CSAM? Or something else entirely?

I understand it seems like they don't want to give examples, perhaps due to professional or legal reasons, and I can respect that. But I also think that information is very important if they're trying to argue a side of the debate.


> Just a general description.

I gave that above in a sibling thread.

> I understand it seems like they don't want to give examples, perhaps due to professional or legal reasons, and I can respect that.

In my case, it’s been 7 years, so I’m not confident enough in my memory to give a detailed description of each false positive. All I can say is that the false-positive photos either showed people who were very obviously fully clothed and doing something normal, or were of something completely innocuous altogether (I seem to remember an example of the latter was the Windows XP green field stock desktop wallpaper, but I’m not positive on that).



