My take on this is that the system is by and large useless.
It won't catch anything but the dumbest of dumb criminals, because those who care about CSAM can surely figure out a better way to share images, or find a way to obfuscate their images enough to bypass the system (the lower the false positive rate, the easier it must be to trick the system).
So what's left when all the criminals this is supposed to catch have figured it out?
False positives. Only false positives.
Is it really worth turning personal devices into snitches that don't even do a good job of protecting children?
Also, numbers about false positives must be taken with a grain of salt because of the non-uniform distribution of perceptual hashes. It might be that your random vacation photos and kitty pics have a 1-in-a-million chance of a false positive, but someone who happens to (say) live in an apartment laid out very similarly to a scene in pictures appearing in the CSAM database may have a massively higher chance of false positives for photos taken in their home.
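To make the non-uniformity point concrete, here is a toy simulation with made-up numbers and a deliberately short 16-bit hash, so collisions are actually visible at small sample sizes (the real NeuralHash is 96 bits):

  import numpy as np

  rng = np.random.default_rng(0)
  BITS, N = 16, 200_000  # toy hash length and sample size (the real NeuralHash is 96 bits)

  # Uniform case: every bit is an independent coin flip.
  uniform = rng.integers(0, 2**BITS, size=N)

  # Clustered case: hashes concentrate around a few "popular" values, standing in
  # for photos that share scene features with images already in the database.
  centers = rng.integers(0, 2**BITS, size=8)
  sparse_flips = rng.integers(0, 2**BITS, size=N) & rng.integers(0, 2**BITS, size=N)
  clustered = centers[rng.integers(0, len(centers), size=N)] ^ sparse_flips

  target = centers[0]  # pretend this value is in the database
  print("uniform collision rate:  ", np.mean(uniform == target))    # ~2**-16, about 1.5e-5
  print("clustered collision rate:", np.mean(clustered == target))  # far higher

The absolute numbers are meaningless; the point is only that clustering can push the collision rate orders of magnitude above the uniform baseline that headline figures assume.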
> It won't catch anything but the dumbest of dumb criminals
Dumb is a pretty accurate description of a large fraction of criminals. For the most part you only get smart criminals when you are talking about crimes where you have to be smart to even plan and carry out the crime.
Can you give a source for that number?
Regardless, a browser is like any other app. If that many people don't know how to install apps on their computers, then we either have really dumb people (or just a lack of motivation) or a massive UX design failure in general.
It's a daunting realisation that sinks in once you have to do support for a web site or app catering to the general population instead of a niche.
They don't read, don't know the diff between an app and a web site, don't know right click or drag and drop, think Google or Chrome is the internet, and overall their strategy for solving any problem is as follows:
- look for something obvious that seems like the answer but is not scary
- click
- wait for it
- repeat 3 times until ok or give up and call someone or get angry or both
Working on a streaming video site really opened my eyes on this one. Most tickets we received were insults, some were incomprehensible garbage, a few were actionable requests from someone who understood nothing about their computer.
This is nothing like your GitHub ticket. The numbers in your parent comment are generous IMO.
Apple seems to have completely botched this PR stunt/feature.
Reading your comment, I realize how these… ‘criminals’ could use phone number networks to share illegal sexual content peer to peer.
In other words, Apple doesn’t need to analyze your images to find these criminals. They only need to analyze the frequency or quantity of flagged images.
In other words not one image correctly/falsely tagged, but individuals and networks of individuals who are *collecting* and *storing* mass quantities of these images. And, they’re using Apple privacy and security to hide from law enforcement.
Yes, but when you admit that the target is just the dumb criminals, then why adopt a scheme that has false positives?
Decompress and downsample. Drop the least significant bit or two, maybe do it in the dct domain instead. SHA256. It'll preserve matching for at least some cases of recompression and downsampling. But finding an unrelated image that matches is as hard as attacking SHA256, the only false positives that could be found would be from erroneous database entries.
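A minimal sketch of that idea, assuming Pillow for decoding; the 64x64 size and two dropped bits are arbitrary choices for illustration, not anything from a real system:

  import hashlib
  from PIL import Image

  def normalized_sha256(path, size=(64, 64), drop_bits=2):
      """Decode, downsample, and quantize, then take a cryptographic hash.

      Mild recompression or resizing is absorbed by the normalization; anything
      that survives it identically yields the same digest, while finding an
      unrelated image with the same digest is as hard as attacking SHA-256.
      """
      img = Image.open(path).convert("L").resize(size, Image.BILINEAR)
      quantized = bytes((p >> drop_bits) << drop_bits for p in img.getdata())
      return hashlib.sha256(quantized).hexdigest()

Aggressive recompression would still change the digest, so this trades false positives for false negatives, which is exactly the trade-off described above.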
> Is it really worth turning personal devices into snitches that don't even do a good job of protecting children?
Yes, because the point is not to protect children. It's to get everyone used to the idea that their content is being monitored. Once that is accomplished, other forms of monitoring can and will be added.
Exactly. It's a Trojan Horse (https://en.wikipedia.org/wiki/Trojan_Horse) to make more pervasive individual control the new normality. The current motivations are just a pretext.
Perceptual hashes are only used to reduce the search space for human review. Apple doesn’t have images in the CSAM database to do a comparison, but if it’s just a picture of a door they're going to reject it. Also, because human review is an expense, Apple’s incentives are to minimize the number of times it happens, thus the requirement for multiple collisions.
I don't really want my family photos reviewed by strangers. "Reducing the search space" of photos on my phone isn't an outcome I want to live with. At the time someone is looking at photos of me, my wife/husband/girlfriend/boyfriend, and my kids, they'd better have a darned good reason (e.g. a search warrant).
I'd also appreciate it if Apple let me know whether my false positives were reviewed and found not to be CSAM.
I saw a story on here yesterday about iPhones resetting to default settings after restarting. So people were turning off backups to the cloud, and then finding that their device turned the feature back on after some time.
The system as described only submits its safety vouchers when photos are uploaded to iCloud.
Not saying it will stay that way, but there are three distinct realms of objection to this system, and it's probably useful to separate them:
1. Objections that in the future, something different will happen with the technology, system, or companies; so that even if the system is unobjectionable now, we should object because of what it might be used for in the future; or how it might change.
2. Objections that Apple can't be trusted to do what they say they are doing, so that even if they say they will only refer cases after careful manual review, or that they will submit images for review that were not uploaded to iCloud, we can't believe them, so we should object.
3. Objections that hold for the system as designed and promised; in other words, even if all the actors do what they say they are doing in good faith and this monitoring never expands, it's still bad.
People who have the third kind of objection need to deal with the fact that Apple is basically putting in a system with more careful safeguards than are already in place in many Internet services, even for their "private" media storage or exchange. You likely don't know how the services you use are scanning for CSAM but if the service is at all sizeable (chat, mail, cloud storage) it's likely using PhotoDNA or something similar.
I think there are valid objections on all three bases. But there's a difference in saying "this is bad because of something that might happen" and "this is bad because of what is actually happening".
> Apple’s incentives are to minimize the number of times it happens, thus the requirement for multiple collisions.
How can we be sure they won’t cut costs by increasing worker load? I could see them giving each reviewer less time to review individual pictures before passing it on to law enforcement.
This is the threat model I am looking at. It is number one with a bullet. We have already had a court case where an adult actress had to show up in court and prove that she was adult when experts testified that the images were of a non-adult woman.
Baby in the sink? No. But a bunch of the aforementioned? Yeah.
> Perceptual hashes are only used to reduce the search space for human review.
False. Apple's proposed system leaks the cryptographic keys needed to decode the images conditional on a match (a threshold of matches) of the faulty NeuralHash perceptual hash.
Matching these hashes results in otherwise encrypted, highly confidential data being decodable by Apple, accessible on their servers to the relevant staff along with anyone who compromises them or coerces them.
Apple can decode the data either way. They're the ones doing the encryption on their servers.
There are two basic reasons for this: first, it's a backup service, which makes end-to-end encryption risky; second, they also let users share access to their backed-up photos. iCloud > Photos > Shared Album.
“ Only when the threshold is exceeded does the cryptographic technology allow Apple to interpret the contents of the safety vouchers associated with the matching CSAM images. Apple then manually reviews each report to confirm there is a match”
The design goal was no human review for individual matches.
I knew a probation officer for sex offenders. They told me that most of them were quite dumb. What the repeat offenders were, though, is dedicated. They had all day to try to avoid getting caught, and the PO had a few minutes per week per offender.
It's true that in any arms race, a given advance gets adapted to. This will surely catch a bunch of people up front and then a pretty small number thereafter as the remainder learn to avoid iPhones. But that's how arms races work. You could say that about almost any advance in fighting CSAM.
Probably some of both. One point of the criminal justice system is to shift incentives such that people with their acts together satisfy their desires without criming. There are plenty of smart, greedy people who just go get an MBA and siphon off value in ways that are technically legal. The risk-adjusted ROI is better.
Yeah, Facebook's blog post makes me wonder what all the stuff they report actually is. When people say CSAM, I think "kids getting raped" but apparently there's stuff that people find humorous or outrageous and spread it like a meme (and not like pornography).
"We found that more than 90% of this content was the same as or visually similar to previously reported content. And copies of just six videos were responsible for more than half of the child exploitative content we reported in that time period."
"we evaluated 150 accounts that we reported to NCMEC for uploading child exploitative content in July and August of 2020 and January 2021, and we estimate that more than 75% of these people did not exhibit malicious intent (i.e. did not intend to harm a child). Instead, they appeared to share for other reasons, such as outrage or in poor humor (i.e. a child’s genitals being bitten by an animal)."
Based on this, I wouldn't conclude that FB is the platform where pedos go to share their stash of child porn.
Their numbers also include Instagram, which I believe is quite popular among teenagers? I wonder how likely it is for teens' own selfies and group pics to get flagged and reported to NCMEC.
> It won't catch anything but the dumbest of dumb criminals, because those who care about CSAM can surely figure out a better way to share images, or find a way to obfuscate their images enough to bypass the system (the lower the false positive rate, the easier it must be to trick the system).
Given the reported numbers of illegal images detected by similar systems within Facebook and Google, I think it is very clear that this will catch a lot of illegal content.
The false positive rate reported in the blogpost for imagenet was 1 in a trillion, and the author concludes that this algorithm is better than they expected.
"After running the hashes against 100 million non-CSAM images, Apple found three false positives"
So closer to 1 in 33 million. The reporting threshold is made artificially higher by requiring more than one positive (a rough sketch of what that threshold buys is below).
But anyway, that's beside the point.
A perceptual hash is not uniformly distributed; it's not a random number. Likewise for photos taken in a specific setting; they do not approach the randomness of a set of random images.
So someone snapping photos in a setting that has features similar to a set of photos in the CSAM database may risk a massively higher false positive rate. It's no longer a million-sided die; it could be a thousand-sided die when your outputs happen to be clustered around similar values due to a similar setting.
But I can't say I care about false positives. To me the system is bad either way.
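To see what the multiple-match threshold mentioned above buys under the rosiest assumption (every image an independent draw), here is a back-of-the-envelope calculation using the 3-in-100M per-image rate quoted above, a hypothetical 10,000-photo library, and the 30-match threshold mentioned elsewhere in the thread:

  from math import comb

  p = 3 / 100_000_000   # assumed per-image false positive rate (the 3-in-100M figure above)
  n = 10_000            # hypothetical photo library size
  threshold = 30        # match threshold mentioned elsewhere in the thread

  # P(X >= threshold) for X ~ Binomial(n, p); terms shrink so fast that summing
  # a handful of them past the threshold is enough.
  tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(threshold, threshold + 10))
  print(f"chance an innocent account crosses the threshold: ~{tail:.1e}")

That number only holds if matches are independent across photos, which is exactly what the non-uniform-distribution objection above disputes.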
"After running the hashes against 100 million non-CSAM images"
They don't say what kind/distribution of non-CSAM images. Landscapes? Parent pix of kids in the bathtub? Cat memes? Porn of young adults? Photos from real estate listings?
I suspect some pools of image types would have a much higher hit rate.
Edit: And, well "hot dog / not hot dog" is impressive on a set of random landscapes too.
Well the same article also claims zero false positives for "a collection of adult pornography." I don't know if the size of that collection is mentioned anywhere.
Anyway, I suspect that the algo is more likely to pick defining features of the scene and overall composition (furniture, horizon, lighting, position & shape of subject and other objects) more than the subject matter itself.
Sometimes the best way to catch the really smart or sophisticated criminals is to exploit their less smart and less sophisticated accomplices, co-conspirators, peers, acquaintances, or even their victims.
The point of these innovations is never the stated purposes. To catch criminals is an excuse. I would bet a great deal that this system is by and large pressured by state actors for the purpose of creating a new political surveillance tool.
I really doubt this. In the long term, a few people Apple wants to frame will surely slip into the mix. If Apple didn't want Trump to win, a CSAM flag a week before the election might do it.
There is an interesting constitutional quirk which arises from the scanning being done client side, specifically for US citizens. If the US Government forced Apple to add other entries to the hash table, this would constitute a warrantless Government search of the private physical property of US citizens. This is a clear-cut, unambiguous breach of the 4th Amendment.
Whereas if the CSAM scanning was performed exclusively in the cloud, protection under the 4th Amendment does not exist as it would likely fall under the third party doctrine.
Now I'm not saying the US Government would let mere unconstitutionality get in the way of any surveillance program. But Apple would. Don't you think Apple would be itching for another opportunity to flex in public? Especially now, with their reputation on the line? Apple would love nothing more than to have more opportunities like they got with the San Bernardino iPhone.
Apple could also encrypt every upload to iCloud, and not have any scanning on the client, and still be able to say to the government "sure, you can have the files; we can't read them and neither can you". Apple wants to reduce your privacy from the government above and beyond what the law requires. The question is: why?
1. Leaked documents show Pegasus software exploits all iphones using an exploit in iMessage
2. Apple releases security update (doesn't patch the iMessage exploit)
3. Apple announces CSAM client scanning coming soon
4. Apple releases another security update (still leaves iMessage exploit unpatched and used by Pegasus)
….
Perhaps Apple is under pressure to provide a back door prior to patching a tool that may be widely used by governments around the world.
Some, but not all. And if you still have access to a Mac or iOS device which remains associated to this iCloud account, the amount that is lost can be even less.
That's not technically possible, though. Most tech companies don't take on impossible tasks, because it would take an infinite amount of money to realize them, and the shareholders would be bankrupt before the product is delivered.
Every piece of data is CSAM encrypted with a one-time pad. It's just that nobody knows the one-time pad.
I think you're implying that scanning of private personal property by a corporation without a warrant protects users from searches of their content in the cloud that is authorized by a warrant or national security letter. I don't understand the mechanism if there isn't end to end encryption, and I don't understand the mechanism if there is end to end encryption.
Scanning makes phones a greater threat, and also erodes the expectation of privacy that is a legal barrier to surveillance.
I did not intentionally imply anything here, so if anything I wrote appears to include an implied component, that was not intended and may not represent my opinion.
All I'm saying is that the implementation Apple has described would be constitutionally blocked from being co-opted by US law enforcement. Obviously if there's no end-to-end encryption, any cloud operator could still be coerced into searching for material server side, as that falls under the so-called third party doctrine.
What kind of non-CSAM crime could be detected with just a couple of hashes? Wouldn’t Apple need to reduce the similarity score in order to even get something close?
Also Fifth Amendment. People are being compelled to testify against themselves by running this CSAM scanner. Apple's end-to-end encryption was sold to users as exactly that.
To all of a sudden introduce this scanner doesn't negate the expectation of privacy as that is how it was sold and marketed. There is an implied warranty of merchantability of how this service functions.
Are you a lawyer or legal scholar, or just guessing?
The government can't compel warrantless searches of Apple. 3rd party doctrine means Apple can search your iCloud, and can give it away if they choose. Same as how Apple can search your phone if you run their software, and can give away whatever they find if they choose.
> If the US Government forced Apple to add other entries to the hash table, this would constitute a warrantless Government search of the private physical property of US citizens. This is a clear-cut, unambiguous breach of the 4th Amendment.
Given that it's closed source and proprietary, there's no reason to assume this isn't already happening. The question to ask is, what are we going to do about it?
If you take that line of argument, you must also accept that you have no reason to assume that binary distributions of Android and Windows haven't been doing similar things for the past decade.
I agree with you in that I don't think the problem is the closed-source aspect. Closed source software can still be audited (with difficulty). The problem is that the source material for the hashes can't be audited, even when we know exactly how the system works.
This is already a warrantless search that’s effectively controlled by the government. Obviously there’s enough chaff in the air to prevent that from being legally useful in any way.
So far the courts have determined that since providers invade their customers' privacy of their own free will, with no incentive or coercion by any government agency, it is not a search by the government.
It's just your friendly trillion dollar tech company putting on a mask and cape and engaging in a bit of vigilante fun. You know? Like batman! ( https://www.youtube.com/watch?v=Kr7AONv3FSg )
Since the govt. supplies the hashes in an unauditable way, it absolutely is controlled by the government. What's to stop them from using hashes of non CSAM material?
The government doesn’t supply the hashes in an unauditable way, that is a totally false statement.
The hashes are supplied by NCMEC, a non-profit which is auditable, not a secret government agency.
In any case, even if a non-CSAM hash were somehow in the database, Apple reviews the images before making reports, and those reports are used in normal criminal prosecutions.
Courts have determined that for this purpose the NCMEC is an agent of the government. NCMEC is 99% funded by the government and its ability to handle child porn is directly derived from an explicit legislative carveout for it by name. What they do would be a felony for you or me to do. The fact that they are technically a non-profit rather than an agency makes them significantly less accountable to the public. We cannot FOIA their communications, their composition isn't subject to public review, we cannot vote them out. And we have no way to tell what their database contains, nor is there any avenue for redress should we somehow learn of an inappropriate listing.
To the extent that you can say that they're not exactly a government agency, they absolutely have been deputized by the government.
> And we have no way to tell what their database contains, nor is there any avenue for redress should we somehow learn of an inappropriate listing.
Yes there is. It’s called legal liability. They are not immune to being held accountable for their actions just like any other non-profit. They may be immune from prosecution for possessing CSAM, but they don’t have any kind of immunity for damages they cause through their own actions.
So you would agree that Apple's use of private set intersection serves to shield themselves and their data sources from legal liability for the harms created by false listings, by concealing the content of the databases?
I don't think it's entirely false. NCMEC is not a regular non-profit. They have special clearance to do things that regular citizens and non-profits are not allowed to do.
Presumably, it’s done this way so they can say computers other than your personal device do not scan photos and “look” at decrypted and potentially innocent photos. And technically the original image is never decrypted in iCloud by Apple - if 30 images are flagged, they are then able to decrypt the CSAM scan metadata, which contains resized thumbnails, for confirmation.
In summary, I’m guessing they tried to invent a way where their server software never has to decrypt and analyze original photos, so they stay encrypted at rest.
Apple frequently decrypts iCloud data, including photos, based on a valid warrant. This new local scanning method does not stop Apple from complying and decrypting images like they have for years.
(Note: I have worked with law enforcement in the past specifically on a case involving Apple and two iCloud accounts. You submit a PDF of the valid warrant to Apple. Apple sends two emails one with the iCloud data encrypted. A second email with the decryption key.)
Of course, but it's a kind of last resort thing to support a valid legal process they cannot (and probably don't want to) skirt around. They also publish data on warrant requests.
To me it's pretty clear they are doing the absolute minimum possible to keep congress from regulating them into a corner, where they lose decision making control around their own privacy standards. The system they came up with is their answer for doing it in the most privacy conscious way (e.g. not decrypting user data in icloud) while balancing a lot of other threat model details, like what if CSAM-hash-providing organizations provide img hashes for a burning American flag, and lots of other scenarios outlined in the white paper.
Yes I agree, bit of a stretch. Based on their whitepaper, it's a smaller version of the original image, I guess just large enough to support the human verification step.
But I'm unsure that the thumbnail is included with every CSAM "voucher" -- it's likely only included when you pass the 30 image limit. Need to read that section more clearly.
A thumbnail is included with every safety voucher.
However, it is encrypted with a key that resides on your hardware and is unknown to Apple. So Apple doesn't have enough information to decrypt your thumbnails at will.
A secret sharing scheme is used to drip-feed Apple the key: each time a positive match occurs, Apple learns a bit more about your key. Once the threshold is reached, Apple will have learned enough to recover your encryption key, and will be able to use it to decrypt all your matching thumbnails at once.
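For anyone unfamiliar with threshold secret sharing, here is a bare-bones Shamir-style sketch of that "drip-feed" mechanism. Apple's actual construction is a more elaborate threshold scheme layered under private set intersection (with synthetic vouchers added as noise), so treat this only as an illustration of the core property: enough shares recover the key, fewer reveal nothing.

  import random

  PRIME = 2**127 - 1  # a Mersenne prime, comfortably larger than the toy secret

  def make_shares(secret, threshold, num_shares):
      """Split `secret` so any `threshold` shares recover it and fewer reveal nothing."""
      coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
      def f(x):
          return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
      return [(x, f(x)) for x in range(1, num_shares + 1)]

  def recover(shares):
      """Lagrange interpolation at x = 0 recovers the secret."""
      secret = 0
      for xi, yi in shares:
          num = den = 1
          for xj, _ in shares:
              if xj != xi:
                  num = (num * -xj) % PRIME
                  den = (den * (xi - xj)) % PRIME
          secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
      return secret

  # Toy run: one potential share per matching photo; below the threshold the
  # collected shares are information-theoretically useless, at the threshold
  # the account key pops out and the matching thumbnails become decryptable.
  key = 123456789
  shares = make_shares(key, threshold=30, num_shares=1000)
  assert recover(random.sample(shares, 30)) == key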
Most people feel that things that happen on your device are safer than things in the cloud; you have probably noticed how Apple constantly stresses that this or that happens "on device".
And for the suspicious, it's of course much easier to notice if Apple changes their algorithms when those algorithms run on device.
Apple basically never announces things before they are ready to be released, so them not announcing this means very little. They may be working on it, and their usual secrecy is biting them in the ass very hard.
If it’s going to really save a significant amount of data center resources, then it’s also probably going to reduce the battery lifespan of all these devices significantly. That may be good for Apple’s bottom line temporarily, but it will hurt in the long run. I’d imagine it’d be a lot easier to optimize the data center compute resources than to optimize the scanning on individual devices without trashing battery lifespan.
> I’d imagine it’d be a lot easier to optimize the data center compute resources than optimizing the scanning on individual devices and not trashing battery lifespan.
No amount of data center optimization will beat running computations on hundreds of millions of devices other people have to pay for.
If that’s a valid reason to steal electricity and compute resources from your customers, then why not go the whole way and use all the Macs as storage and compute for iCloud?
> If this is the case there is zero reason to scan locally and you can just scan the uploaded image once it is on the server.
You’re having a house party. Because of the pandemic, you’d rather people who have COVID not attend. You can’t trust everyone to get vaccinated or get tested beforehand. So, you decided to set up a rapid-test system, just to be sure.
Would you rather test in your kitchen or your driveway?
I don't see how this scanning reduces the likelihood of a government searching their servers. Seems to me like this can only result in more court orders than they had before scanning.
If you’re Apple, and you’re throwing your weight around in the US bread and butter market to convince the FBI to not scan your servers for CSAM, which is more compelling: we check for it when it gets here, or we keep it off our servers using crypto hash magic?
Sigh, for the last time, it doesn't actually matter if the NeuralHash is identical. You need multiple images matching, and then the images are compared by another system on Apple's end, which you don't know anything about.
The system is specifically designed so that colliding images does not pose a threat to the user.
NeuralHash and the CSAM scanning is grotesque, but please, criticize it for what it is, not some bullshit that is easily dismissed as technical ignorance.
Then let's get rid of the NeuralHash entirely, if it doesn't matter, right?
If it's a critical part of the system, then it should be inspected thoroughly. If Apple claims a minuscule chance of a hash collision, and the reality is that collisions are relatively common, that significantly changes the requirements for the backend system, which Apple keeps secret. We have every right to believe, based on public info, that Apple was expecting NeuralHash to be almost fool-proof, leaving the backend system to be a rubber stamp. This would be tragic.
The point in the NeuralHash and PSI system is to preserve user privacy as far as possible. From a technical standpoint, it is not essential - a NeuralHash function that returns 0x0000… for everything would still catch CSAM. It's just that it would upload every single image on the user's device.
Now, how well this NeuralHash does preserve privacy is a different question, and /not/ one that is being answered by the original post here. In fact, I've not seen anybody look at the hash distribution over natural images, which would be an actual argument against the system.
Let's not forget what the alternative is: this is about images that are uploaded to iCloud anyway. The alternative is to upload the image in the clear (or with an encryption key that Apple controls), and let Apple run the CSAM filter on their servers.
Apple now has the ability to encrypt the images before sending them to iCloud, with a private key you own. Except that some percentage of images that match the CSAM fingerprint with their neural feature extractor will be sent to a CSAM filter on the server side (whose workings we don't have many details about).
This whole thing backfired on Apple entirely due to psychological effects, not because they are really doing anything more "panopticon" than they would already be able to do now on their iCloud storage (after all, people are already sending their photos to Apple).
They already have the decryption keys for iCloud. Undoubtedly they've already been running a similar CSAM filter server-side for ages. The only thing doing this stuff client side has done is reduce privacy and erode trust.
What should they do if the NeuralHash matches CSAM? Should they trust that the NeuralHash actually matches real CSAM and report you to the police? That's clearly wrong, since there are going to be false positives, by design.
The whole point for this is to be a probabilistic filter, so that they need to run the real CSAM scanner on a subset of files.
You can fall into two camps:
a) apple should never ever scan my private images I upload on their cloud.
b) apple can scan the images once they reach their servers.
If you pick (a), then clearly NeuralHash shouldn't exist and you can argue against it on the grounds that you want utter privacy. But you have to be consistent: every other cloud service that scans images server side should receive the same critique.
If you pick (b), then you must recognize that this additional machinery doesn't increase their reach into your private data, but quite the opposite: it allows them to implement e2e encryption for 99.9% of your content. You may argue that it's unnecessary and confusing and spooky, and be afraid of the slippery-slope precedent for other uses.
But if Apple really cares about children, why didn't they do these scans in iCloud like all the others? Did they not care as much as Google or Facebook? It seems to me like Apple doesn't care at all, and more like a dev with a big ego wanted to add neural hashes to his CV. But if you can explain how Apple cared about children all these years yet is only now doing something about it, I really want to see the explanation.
Because Apple is the one company that actively tries not to know anything about you.
FB and Google will exhaustively analyse every single facet of your online presence and use your pictures to train their ML models for face detection and object detection.
Apple, on the other hand, even explicitly splits Map directions to segments so that they can't know where you left from and where you are going to.
Apple was sending what application you started unencrypted over the network; if they "actively try not to know anything about you", they could have implemented that better.
Anyway, how does your assumption make sense? Apple cares about children and about your privacy, so scanning your images in iCloud was wrong until 2021, when something changed. What changed? Does Apple care more about children starting now, or less about privacy? Or are they being forced to do it?
Apple's claims are based on the statistical likelihood that there would be 30 collisions with CSAM hashes within one user account.
Just because someone has found an image of a nearly featureless diagonal thing which collides with another image of a nearly featureless diagonal thing, that doesn't disprove Apple's claims.
I find this an unconvincing argument as well: by that logic, even a false claim from Apple would have to stand as valid. What they did, albeit likely knowingly, was to calculate the hash collision probability /if each bit is a coin flip/, which comes out to pow(2, -k) for k bits. It's tiny. Of course, each bit is /not/ an independent coin flip under the NeuralHash function.
So again the actual argument becomes: what is that distribution like?
> what they did was to, albeit likely knowingly, calculate the hash collision probability /if each bit is a coin flip/, which comes out to pow(2, -k) for k bits. It's tiny.
I doubt that's what they did. I think they ran tests on huge numbers of pictures, got an estimate, put in a safety factor, and determined the threshold to hit their target (and put in another safety buffer then).
Naturally occurring collisions are not going to be an issue, and adversarial ones neither, I predict. Just as with current cloud providers.
Apple never made a false claim. They have never anywhere stated that NeuralHash produces false positives at 1 in a trillion, only that that is the rate for the system as a whole to flag accounts for review. They explicitly mention that they will vary the number of matches needed to maintain this if it turns out to be higher or lower based on images in the wild.
There are good arguments against this system but most of the technical debate seems to have devolved into amplifying lies now.
Fine, let’s say they didn’t lie, they made misleading claims.
Still. Why trust them after that?
If a company can make my own smartphone report me to the police, and they want my business, they better prove I can trust them. Apple has plainly done the opposite.
Sure, but a NeuralHash system with a collision chance of 100% would obviously not be abused to say NeuralHash collision implies CSAM - the secondary validation system will definitely be the last bastion.
In contrast, a NH system believed to have a collision chance of 1 in a trillion trillion may well be considered infallible, and any detection be directly reported as CSAM, with the 'backend verification' amounting to nothing more than a rubber stamp.
Of course, if you implicitly trust Apple not to do the second, then you're right, the NH collision rate doesn't matter too much.
The conclusion section of the article associated with the GitHub repo linked here is that collisions are not common and Apple’s published collision probability matches their findings. Furthermore the thresholding scheme requires 30+ independent collisions which is astronomically improbable.
You (or at least Apple's customers) trust in and rely on Apple's proprietary software to do its job all the time. How is this different? I find this argument very weak.
There is a big difference. If, say, Apple's tracking of what you run is problematic, you get a bad user experience, like apps taking a minute to start; if the Apple App Store contains malware, you'll probably get some annoying issue while Apple tries to silently clean up its mess. BUT with this system there is a difference: it is designed not to serve you or protect you, it is designed to run some checks and then, if some specific rules match, send some guys after you to destroy your life. Today it's the FBI, tomorrow other authoritarians.
Issues in any other Apple software will not send the police after you. Why would you install software on your desktop/laptop that is designed to snitch on you? You would need to get some advantage or be forced by some law.
For now I see only disadvantages, but please let me know of any real advantage, not speculation.
Disadvantages:
- closed software with a hidden DB can't be trusted, so as a user you will always have a doubt that some non-CP images are in the DB (Apple always collaborates with governments)
- bugs in this stuff will cause you big problems (we've seen in the past how false accusations destroyed people's lives) and we've also seen bad actors abusing this kind of stuff.
- this is also clearly just a beginning: now that Apple has the capability, even if they were saints, a judge could force them to add new hashes, change the configs, etc.
1. You can at least somewhat audit the software running on an iPhone, for example by means of reverse engineering. You can’t audit the server side.
2. It’s one thing to rely on proprietary services like Find My or Siri.
It’s another thing to rely on a secret server-side app that has the power to destroy your life.
What I somehow fail to grasp in the first argument is that this whole system is designed specifically so that it runs client-side. AFAIK all the alternatives (as in "cloud photo services") have been doing the exact same thing on the server side for over a decade. If you upload your photos to the cloud, a lot of services actually already have the power to destroy your life.
Tell this to the victims of Pegasus. If anyone were able to get their hands on the "secret backend system" we wouldn't be talking about spy games, we'd be talking about people's lives being ruined.
Apple still has not patched the security exploit in iMessage used by Pegasus. Apple has released two iOS security updates since the Pegasus revelations but still has not patched its most widely used exploit… hmmmm.
Now apple is getting a local client side scanning tool ready. Interesting timing.
AFAIK none of Apple's other "proprietary software" is designed to pass my personal images to a human for visual inspection if it mistakenly outputs 2 high-enough numbers after a handful of convolution operations and matrix multiplications.
They pass a "visual derivative" to "a human", but only after some matrix multiplications etc. that result in extremely low probability false positives.
It could also happen that you lose your phone and "a human" finds it and randomly puts in the correct passcode on the first try and visually inspects your personal images. In fact, that seems vastly more likely [1].
[1] About 4% of smartphones are lost or stolen every year [https://www.mcafee.com/blogs/consumer/family-safety/almost-5... ], but make it just 1/1000, so 1e-3. Then a 6 digit passcode, 1e-6, so we're at 1e-9 per year, or 1000x as likely as being falsely flagged, assuming Apple's numbers (which can easily be achieved by calibrating the threshold).
Discussing the preimage attack on NeuralHash is not technical ignorance. Dismissing the preimage attack as irrelevant is.
0. Most importantly: the existence of a preimage attack makes Apple's system completely useless for its original purpose. The NeuralHash collider allows the producers and distributors of CSAM material to ensure that nearly all of the next generation of CSAM will suffer from hash collisions with perfectly innocent images. Two weeks after it was deployed, Apple's CSAM scanning is now _only_ an attack vector and a privacy risk. Thanks to the preimage attack, it's now completely useless for its nominal function! Apple put a lot of effort into a system that reduced the privacy and security of all their customers, and made the company itself more exposed to the whims of governments. And for no gain whatsoever.
1. There are no known perceptual hash functions on which preimage attacks are difficult. Barring a major "secret cryptographic breakthrough", Apple's second hash function is not resistant to preimage attacks either. In fact, the second algorithm is almost certainly easier to attack than NeuralHash itself, since it has to work on the "visual derivative", a fixed-size low-resolution thumbnail of the original image.
2. But isn't Apple's second algorithm kept secret, making it difficult to perform preimage attacks against it? No.
First of all, the second algorithm cannot be kept secret. Apple doesn't have its own CSAM database (the whole point is that they don't want to deal with CSAM on their servers!), so the algorithm has to be shared with multiple organizations which do have such databases, so that they can pre-compute the hashes that Apple will match against. Due to Apple's policy, some of these organizations will be located outside the US [1]. Chances are, the hash function will leak: Apple won't know if and when that happens.
Secondly, this _is_ security by obscurity. Some people argue that keeping the hash algorithm secret is similar to keeping a cryptographic key secret. This is not the case. Of course, any security system relies on keeping _something_ secret, but these secret somethings are not created equal. The secret keys of cryptographic algorithms are designed to satisfy Kerckhoffs's assumption. This means that the key, as long as it remains secret, should be sufficient to protect the confidentiality and integrity of your system, even if your adversary knows everything else apart from the key, including the details of the algorithm you use, the hardware you have, and even all your previous plaintexts and ciphertexts (inputs and outputs).
The second hash does not have this property at all. Keeping the algorithm secret does not ensure the confidentiality or integrity of Apple's system. E.g. if somebody gets access to a reasonable number of inputs-output examples, that allows them to train their own model which behaves similarly enough to let them find perceptual hash collisions, even if they don't know the exact details of the original algorithm. This is incredibly hard for cryptographic hashes, but very easy for perceptual hashes, since a small change in the input should cause only a small change in the output of the perceptual hash algorithm. So, to maintain security, Apple doesn't have to keep just the hash algorithm (or its configuration parameters) secret, but all the inputs and outputs as well. This is bad: the fewer and simpler the secrets that one must keep to ensure system security, the easier it is to maintain system security.
Finally, the second hash algorithm is unlikely to be original (NeuralHash was original, and by all accounts it was a massive effort). If an attacker successfully guesses that Apple's secret algorithm H is closely related to a known algorithm, say PhotoDNA, they will probably be able to make a transfer attack against it. By engineering a PhotoDNA collision on the resized thumbnail (e.g. via a resizing attack, extensively discussed in a previous thread [3]), they have a reasonable chance of generating an H-collision as well. How good does that chance need to be? Well, something like 5% is more than enough! The attacker needs to produce a certain number of NeuralHash collisions (say 30 images) to get through the first threshold of Apple's algorithm. But after that, Apple will decode all the thumbnails in the user's safety vouchers: the attacker only needs one of those 30 to get through the second hash. Given a sufficiently high probability of hash collisions, this can be achieved "blindly".
3. It's incredibly easy to come up with these kinds of attacks. Even the HN audience could come up with several reasonable plans, and could point out several reasonable issues, in two weeks. People who do malice for a living will have a much easier time with it. Even if somehow all the plans presented on HN turned out to be unviable, it will not take long for someone to stumble upon something practical. Any reassurance that Apple could provide at this point is fake. Cf. the timelines for real security: it took 17 years to come up with an analogous attack against SHA-1 [4], and two years after that to turn it into something that can be exploited in practice [5]. The existence of a preimage attack made Apple's system completely useless for its original purpose in two weeks. It's now just a security and privacy hole, with no other function. Keeping it around would be a travesty, even if it was difficult to exploit. But it's not.
That's a lot of words to say that you could sufficiently mangle an image that it could pass through all of Apple's algorithmic hurdles while not actually being CSAM. Of that I have no doubt. You could definitely generate a mangled image that fools multiple perceptual hash algorithms.
Let's set aside the questions of where you got all these hashes to generate collisions with, how you got 30 of these mangled images into your victim's camera roll without them noticing. And let's also set aside whether your victim's device is an iPhone with iCloud Photo Library enabled (and has sufficient storage). I still don't get what these mangled images have achieved, other than giving the manual review team something other than child porn to look at.
Seems to me like it'd be easier to just find actual child porn, print it out, place it somewhere in the victim's house and then report it to the police.
I think you missed the point of the first paragraph. The point is that you can now hide child porn by making its hash collide with innocent images. They won't ever make it to manual review. Ergo, NeuralHash is now useless.
Why would anyone go to the effort of technical concealment[1] of CSAM when they could just resist the urge to import child porn in their phone's photo library in the first place? I've managed to resist the urge to import regular porn into my photo library, and being caught with regular porn is (at most) embarrassing. It's not potentially life-destroying.
It's inconceivable that anyone could desire possession of NCMEC-catalogued CSAM images without being aware that they're risking serious consequences if they're caught. Who wants their deepest, darkest, potentially life-ruining secrets just milling about with photos of the dog and last night's dinner?
[1] ...which is all but impossible for an average user to prove was effective; it's not like the Photos app has a "Not Child Porn!" checkmark.
I agree with you in that I do not understand why anybody doing something illegal would upload related data to a cloud storage.
But if nobody would import CSAM into their iCloud library, why do all the pictures need to be scanned in the first place? I would imagine anybody doing major illegal stuff is informed about important measures to avoid being caught.
I agree, why Apple is doing this is an interesting and pertinent question. I don't think it's actually because they think this will put a big dent into CSAM. So the question is: what is motivating Apple?
Perhaps it's a prerequisite for deploying end-to-end encryption of iCloud Photo Library and/or iCloud Backups. The latter in particular has remained decryptable by Apple supposedly due to pressure from the FBI. Perhaps CSAM is what the FBI are using to justify their pressure.
Perhaps it's because Apple's team of lobbyists are seeing ahead to future anti-privacy, anti-encryption legislation being justified under the guise of CSAM. If Apple can show that the CSAM problem is already "solved" then such justifications disintegrate.
So, the presumed attack (not against individuals, but to defeat the system) is
1. Identify some innocuous pictures that many many people have (memes, Beyoncé, whatever).
2. Produce CSAM.
3. Mangle it such that it is still CSAM visually, but NeuralHash-collides with the innocuous pictures from step 1.
4. Distribute.
5. Wait until they are (via some other mechanism) a) identified as CSAM, b) added to the NCMEC database, c) added to the Apple on-device database of blinded hashes in some iOS update.
6. Millions of people are suddenly incorrectly flagged for exceeding the threshold by NeuralHash (since they have the innocuous pictures in their library), and the review teams are flooded and can't pick out the small number of actual CSAM holders.
That is not without a certain elegance. However, it seems to me that
A) it is predicated on the assumption that you can easily mangle pictures to NeuralHash-collide with a desired target picture (out of a set of widely circulating innocuous pictures) without deteriorating the visual content too much.
B) it would be quickly defeated by amending the 2nd tier algorithm (between NeuralHash and human review), though, as you highlight, that might be tricky given that the team working on this presumably only has access to the innocuous false positive collision image, not the (purposefully mangled) CSAM.
> A) it is predicated on the assumption that you can easily mangle pictures to NeuralHash-collide with a desired target picture (out of a set of widely circulating innocuous pictures) without deteriorating the visual content too much.
Note that this requires no single "desired" target picture. There are millions of popular, innocuous pictures. As long as you can make your CSAM match any one of them without significant mangling, you're good to go. Not having to choose one specific target makes this much easier to accomplish.
> it is predicated on the assumption that you can easily mangle pictures to NeuralHash-collide with a desired target picture (out of a set of widely circulating innocuous pictures) without deteriorating the visual content too much.
I'm so tired of people suggesting that you can't. Please explain to me why you posted suggesting otherwise.
I've contemplated making some that are also photodna matches, I expect that it's possible. But access to photodna is only through some awful windows tools, and AFAICT people would just keep posting denials even after an example was posted-- so it's not worth the effort at least not worth it just to further the public discussion.
> Please explain to me why you posted suggesting otherwise.
a) I didn't suggest otherwise, I said that it is predicated on that assumption, about which I was undecided, largely because b) I didn't know better.
I read that thread 7 days ago, when the collisions were a gray blob or a clearly modified dog (to Lena) or clearly modified Lena (to dog). I hadn't re-read the thread in the last 4 to 5 days, when you demonstrated the natural looking collisions (second-preimage images).
> allows the producers and distributors of CSAM material to ensure that nearly all of the next generation of CSAM will suffer from hash collisions with perfectly innocent images
That’s a really interesting attack vector I hadn’t seen mentioned previously.
Most people are talking about the potential for adversarial images to be sent to users. If they were instead injected into the database itself (either by poisoning real CSAM or social engineering) that would have far wider ramifications.
I wonder what the most widely-saved pornographic images are across iCloud users.
If actual CSAM were perturbed to match the hash of, say, images from the celebrity nude leak a few years back and added to the database then thousands of users could be sent to “human review”. Since the images are actually explicit how would the human reviewers know not to flag them to authorities?
People don't seem to grasp what kind of images end up in the CSAM databases. They are most definitely not "leaked celebrity nude selfie" level stuff.
Think of the most vile sexual thing you could do to a child and then times that by two and halve the child's age in your mind. That's the shit that gets in there.
It's not something even 4chan weebaboos share. It's stuff that makes Liveleak regulars go "ewwwww, gross".
The issue is not that a celebrity photo might get into a CSAM database. It's that a true CSAM photo, which has been modified to have the same hash as a "leaked celebrity nude selfie", will probably get into a CSAM database.
(A threshold of) matches results in the private keys for the images being leaked to Apple, where they're vulnerable to:
(1) Review by Apple staff
(2) Access and leaking by other Apple staff
(3) Access by hackers who have compromised their systems
(4) Access by parties coercing Apple/staff, including via national security letters.
All of which compromise the privacy of the user. This matters or the neuralhash comparison wouldn't exist in the first place.
Totally agree that the whole system is grotesque-- but that doesn't stop it also being grotesque in every detail as well. The fact that there are false positives when they easily could have designed a system that had none (at the expense of increased false negatives) shows that Apple doesn't especially value customer privacy even if you accept their vigilante privacy invasion. The fact that it's possible to construct adversarial false positives and that their reports didn't disclose this fact shows they either don't know what they're doing or they're not being honest about the risks (or both).
Note: Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.
The hash is 96 bits long. When hashing 1 billion pictures, that gives a collision probability of 6e-12 (if it were uniformly distributed). There's no way people have hashed billions of images already. It just shows that it's pretty probable there will be collisions, and on visual inspection, it looks as if the collisions will happen on visually similar images. So if there's a naked baby pic in the CSAM database, quite a few of your hundreds of child pictures can be flagged.
Clearly this is not a cryptographic hash, and hence it's known that its hashes are not uniformly distributed.
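For reference, the birthday-bound arithmetic behind that 6e-12 figure, under the idealized uniform assumption being questioned here:

  n, bits = 1_000_000_000, 96
  p_collision = n * (n - 1) / 2 / 2**bits   # birthday bound: about n^2 / (2 * 2^96)
  print(f"{p_collision:.1e}")               # ~6.3e-12, but only if hashes were uniform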
Apple explained in their technical summary [0] that they'll only treat this as an offence if a certain number of hashes match. They estimated that the likelihood of false positives there (they don't explain which dataset was used, but it was non-CSAM naturally) is 1 in a trillion [1].
In the very unlikely event that that 1-in-a-trillion occurrence happens, they have manual operators check each of these photos. They also have a private model (unavailable to the public) to double-check these perceptual hashes, which is also used before alerting authorities.
> they have manual operators to check each of these photos
For now. But what will happen when there are thousands of false positives per day? Will they increase the staff? Or will they add another algorithmic layer? Or just up the threshold a bit? There's no guarantee. The only thing that's certain is that the NeuralHash doesn't inspire confidence.
Regardless of what Apple does, law enforcement must always manually review suspected CSAM before requesting a warrant based on it. So the idea that you could SWAT someone with some hash collisions on innocent images is just not possible.
At most you could maybe temporarily lock someone’s iCloud account. But again, the collisions would need to be multiple and all look like CSAM at reduced resolution.
In general, it seems not correct to think about NeuralHash like SHA or RSA. It’s not a cryptographic system and collisions are not a one-step endgame.
> Regardless of what Apple does, law enforcement must always manually review suspected CSAM before requesting a warrant based on it. So the idea that you could SWAT someone with some hash collisions on innocent images is just not possible.
A quick search found four clear cases where law enforcement has favored technological false positives over evidence:
The one on Ousmane Bah really frustrates me- not only was he on a date at prom during the theft, he was in another state! "Nothing to hide" does not mean "nothing to fear" and allegations (even false ones) of possessing CSAM will ruin lives.
At the end of the day, what it really comes down to is trust; personally, I do not have enough faith in due process to not ruin innocent people. But I'm just some guy online, I will readily admit I don't know anything about anything.
AFAIU, Apple's NeuralHash uses exact matches when they do their Private Set Intersection.
The main advantage of using exact matching is that you can then blind the perceptual hash with a cryptographic hash and avoid any leak of information. (Taking, for example, the SHA-256 of the perceptual hash won't allow any attacker to get any information about the features from the hash, but if the perceptual hashes are the same then the input of the SHA-256 is the same and therefore the output of the SHA-256 is the same.)
This is important because it alleviates the risk of an eventual leak of the database, as Apple never touches or compares sensitive content, only cryptographic hashes of the perceptual hashes.
Some other systems, like PhotoDNA, rely on a Euclidean distance between features being less than a threshold to register a match, which allows one to quantify how far an image is from CSAM, but means that the hash leaks some information about the original content.
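A minimal sketch of that blinding step. The `neural_hash` function here is a dummy stand-in (a truncated digest, just so the example runs), and the real system adds an elliptic-curve blinding layer on top, which this ignores:

  import hashlib

  def neural_hash(image_bytes: bytes) -> bytes:
      # Dummy stand-in: the real function maps visually similar images to the
      # same 96-bit value; here we just truncate a digest so the sketch runs.
      return hashlib.sha256(b"phash:" + image_bytes).digest()[:12]

  def blinded_id(image_bytes: bytes) -> bytes:
      # Equal perceptual hashes give equal tokens, but the token itself leaks
      # nothing about the underlying image features.
      return hashlib.sha256(neural_hash(image_bytes)).digest()

  database = {blinded_id(b"known-image")}          # what the server-side set would hold
  print(blinded_id(b"known-image") in database)    # True: exact perceptual-hash match
  print(blinded_id(b"other-image") in database)    # False: no distance metric, no leakage

Because matching is plain equality of opaque tokens, a leaked set of tokens says nothing about what the listed images look like, unlike a distance-based scheme.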
But they can't use exact matches, because their algorithm doesn't produce the same hash for the same image across their various platforms, due to hardware differences in floating-point arithmetic on their different devices. Unless they run floating-point emulators on their servers that can calculate the exact hash each of their different devices would produce for each offending image, they can't match exactly.
Per ATP: Apple will compare hashes of local photos with a national registry of child pornography photos. Once a certain (unknown) threshold is reached, let's say 20 hits, some kind of escalation occurs, with some kind of manual (human) review steps.
I still have zero opinion on this photo scanning kerfuffle. I just don't know enough. Of all the "hot takes" on this issue, ATP's has been the most comprehensive. So appreciated.
I'm interested in the details of the manual review. Does Apple have access to the database of original images, and will they use it to compare? If not, I can imagine a scenario where a photo of a naked child is flagged as matching the database, the human reviewer sees that it is in fact a naked child, and assumes this must mean the image has been legally determined to be child pornography. The case gets referred to authorities and the innocent victim's life is upended for potentially years until the case plays out.
Law enforcement must manually review all suspected CSAM before seeking a warrant based on it. There is a whole process there, beyond whatever Apple has implemented, before a prosecution begins.
Understand that matching a file in the NCMEC database is not itself a crime. The whole CSAM-detecting ecosystem is just a tool for surfacing potential crimes. Having a few pics of your own child naked is not illegal and it’s pretty easy for law enforcement to figure out if that’s the case.
>Law enforcement must manually review all suspected CSAM before seeking a warrant based on it.
Is this a legal statute, or simply convention from the way things have historically worked (i.e. before hash matching at scale)? If warrants are granted based on probable cause, it seems easy to convince a judge that a false hash match is so unlikely that a match by itself exceeds the threshold for probable cause. For cryptographic hashes, that reasoning is accurate. But if law enforcement doesn't distinguish between cryptographic and perceptual hashes, then there is a real possibility of cases being opened and warrants being issued unjustifiably.
Sure, matching a hash isn't a crime and you will eventually be exonerated. But as they say, you can beat the rap but you can't beat the ride.
Ad 2: If hashes are to be matched approximately rather than exactly, for example counted as a match if they differ in fewer than 3 bits out of 96, then the most interesting question is how many collisions you can find when you compare them that way.
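A sketch of what counting such approximate matches could look like, assuming 96-bit hashes represented as Python integers (the "fewer than 3 bits" threshold is taken from the comment above):

    from itertools import combinations

    def hamming(a: int, b: int) -> int:
        # Number of differing bits between two 96-bit hashes.
        return bin(a ^ b).count("1")

    def count_near_collisions(hashes, max_diff_bits=2):
        # Pairs that differ in fewer than 3 of 96 bits count as a "match".
        return sum(1 for a, b in combinations(hashes, 2)
                   if hamming(a, b) <= max_diff_bits)

Running this over a large hash set with the threshold at 0 versus 2 bits would answer exactly the question the comment raises.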
Apple's private set intersection, which releases the keys to decrypt the images conditional on a NeuralHash match, requires an exact match.
They probably didn't realize they got different results on different toolchains/devices, since they target a mono-culture and the whole subsystem shows fairly little careful thought went into it. They could easily make an exact integerized version which would be consistent.
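A toy illustration of what "integerized" means here (the quantization scale is an arbitrary choice, not anything Apple ships): if every layer of the pipeline is computed with integer arithmetic like this, the result is bit-identical on every device, whereas float results can differ in the last bits across toolchains.

    import numpy as np

    def quantize(x, scale=127.0):
        # Map floats in roughly [-1, 1] to small integers (a common scheme).
        return np.clip(np.round(x * scale), -127, 127).astype(np.int64)

    def int_dot(weights_q, activations_q):
        # Integer dot product: deterministic and bit-exact on every device,
        # unlike the floating-point equivalent.
        return int(np.dot(weights_q, activations_q))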
If you can get exact collisions, this can be gamed. For example, suppose there are two rival gangsters. One wants to set the police on his rival. He knows that a certain (innocuous) image is on his rival's phone. So he pays someone to generate a fake child-porn image with the same NeuralHash, and ensures that it gets into the child porn DB. Then Apple reports the rival to the police, and they come and investigate him. Of course, they may notice that the image isn't the right one, but by that time they may have found other incriminating evidence.
Not sympathetic to a rival gangster? OK, let's find an innocent victim: not a rival criminal, but an innocent witness whom our protagonist wants to intimidate. The gangster can't get at the witness directly, so he cooks up a scheme to convince the witness that the police are in his pocket: exactly as above, he causes the police to investigate the witness's phone.
Another one: a certain government wants to identify opposition groups by the images associated with them. Apple is not keen to be associated with that, but the government can simply generate fake child porn (remember, programmatically generated CP is just as illegal) colliding with each image of interest.
> a human verifying the images would dismiss the false positive
How is a human supposed to distinguish that a visual derivative (a low res sobel filtered image, presumably) of ordinary, lawful, adult pornography isn't child porn when the system has already identified it as such?
I agree that using real child porn is an attack too, but at least in that case you could say the system was doing as designed (even though what it's doing shouldn't be something that we want) ... but it's not even guaranteed to do as designed.
Are you saying humans cannot visually inspect the actual image that the flag is set against? If not, then how on earth do you prosecute someone if you can't demonstrate they had the actual image, and not just the derivation used to flag it?
The way I see it working is Apple scans a ton of shit, some of it shows up as possible child porn, human intervenes and looks at the source images, if there is indeed child porn they report to the police.
Apple's review has no access to the matched images in the database, just a 'derivative' of your image. Hopefully the prosecutors will check against the real images. But by that point your privacy has been damaged and your reputation might be trashed.
So you think an underpaid worker in India cares about protecting the innocence of a person they don't know in the US? They'll just flag the images as fast as possible; they're probably paid per picture reviewed.
"Amazon has people transcribing audio in Costa Rica, India and Romania." according to Engadget. So it would be safe to assume Apple does something along those lines, I can't remember where Siri transcripts have been sent when there was the Alexa controversy.
From there it becomes a case with law enforcement, and you need to find a way to convince them that the images are of your own children.
And if that data isn't cleared correctly and you have another run-in with law enforcement, this picture stuff will always be there.
You think the US government would respect the law before initiating action to take down a criminal?
They rarely do that, even when dealing with people who don't have criminal backgrounds.
If your enemy can get an image on your phone and in the “child porn DB”, I think they can easily get you in trouble without Apple’s help.
they can either just send the police an anonymous message or set up a child porn web site and have it ‘accidentally’ leak its password database, and make sure your email address is in it.
A very relevant point on this entire discourse about Apple’s on-device CSAM scanning:
According to the U.S. law, key snippets of which are quoted on the Stratechery blog (by Ben Thompson), Apple isn’t obligated to scan for CSAM. It’s only obligated to act on CSAM if it finds them.
While it’s good for Apple to scan on its systems (iCloud) like Facebook, Google and other companies do on their servers, it’s inappropriate to do it on individual devices, which starts with the assumption that anyone who has iCloud photos enabled is a potential CSAM hoarder and needs to pay with their device’s battery life and time for the scanning to happen and report back. It’s a sort of micro-robbery that Apple is doing on the devices when there is no legal compulsion to do so.
Everything else, about trusting Apple's NeuralHash or the sanctity of the NCMEC hashes, comes later, IMO.
I sincerely hope Apple realizes that it’s got a dud solution on hand, eats humble pie (which it’s usually not capable of) and ditches this whole thing. I know a lot of egos at Apple are at stake here. But doing the right thing matters for a company that claims that “privacy is a fundamental human right” and has a CEO who’s a member of a marginalized/discriminated community and understands the risks of these efforts.
"While it’s good for Apple to scan on its systems (iCloud) like Facebook, Google and other companies do on their servers, it’s inappropriate to do it on individual devices"
I would even challenge the justification to do this on servers, unless the data is public. If it's behind a personal login, you might as well consider it personal property/data. I find the distinction of where data is stored not very meaningful.
Allowing things to be searched for criminal content just because it's not in your immediate physical sphere makes no sense. It doesn't work like that in the physical world either. When I send a letter, and it leaves my house, no authority has the right to check its contents without a legitimate reason. Likewise, if I put stuff in a storage box in some warehouse, no authority can search it without a warrant.
Note that I'm talking about personal storage (iCloud, Gmail), not public social networks like Facebook.
Absolutely. If I own my data, someone processing this data on my behalf has no right or obligation to scan it for illegal content. The fact that this data sometimes sits on hard drives owned by another party just isn't a relevant factor. Presumably I still own my car when it sits in the garage at the shop. They have no right or obligation to rummage around looking for evidence of a crime. I don't see abstract data as any different.
Our major privacy blunder was accepting scanning of private data in any context. The fight should be for the absolute privacy of personal data. Where the scanning happens is mostly irrelevant.
Fully agree. Only this week did I learn that companies were already doing this on servers since at least 2018-2019. Can't say I ever read anything about it before.
I find it quite shocking that a foundational element of criminal justice, innocent until proven guilty and needing a reason to search individual property, is tossed aside like it's nothing.
They say they'd only act if they find 30 matching images. So if they find 20 or 29 and don't report them, they are actually breaking the law!! I wonder why they chose that magic number!
Just getting a match on the neural hash is not sufficient to declare an image CSAM. It just flags it as a candidate for review. Apple is not required to report it unless it can be verified to be CSAM. There is a multi-step process involving multiple parties and organizations to do so.
They are opening the door for a lot of questions like this. They didn't say (as far as I know) whether this threshold of 30 covers multiple upload sessions or gets reset with every upload. I feel like the counter gets reset to 0 for each upload session. If so, somebody could deliberately and "safely" upload 29 actual CSAM photos each time. I know Apple is not stupid, but you know governments and lawmakers will ask it for a smaller magic number...
They think they've outsmarted the law on that one. The system is set up to generate a bunch of false positives on its own, so a "match" that is below the threshold may not actually be a match; Apple can plausibly deny it.
It's unclear if their claimed threshold of 30 is before or after the false positives they intentionally introduce. I'm going to guess it's before.
Keep in mind that Apple's claimed false positive rate (one in a trillion chance of an account being flagged innocently), and the collision rate determined by Dwyer in the blog post linked from the repo [2], are both derived without making any adversarial assumptions. Given that NeuralHash collider and similar tools already exist, the practical false positive rate is now expected to be much much higher.
Imagine that you play a game of craps against an online casino. The casino throws a virtual six-sided die, secretly generated using Microsoft Excel's random number generator. Your job is to predict the result. If you manage to predict the result 100 times in a row, you win and the casino will pay you $1000000000000 (one trillion dollars). If you ever fail to predict the result of a throw, the game is over, you lose and you pay the casino $1 (one dollar).
A casino that makes no adversarial assumptions about the clientele could argue as follows: the probability that you accidentally win the game is much less than one in one trillion, so this game is very safe, and the House Edge is excellent [3]. But this number is very misleading: it's based on naive assumptions that are completely meaningless in an adversarial context. Some of the clientele will cheat. If your adversary has a decent knowledge of mathematics at the high school level, the serial correlation in Excel's generator comes into play [4], and the relevant probability is no longer less than 1/1000000000000. In fact, the probability that the client will win is closer to 1/216 instead! When faced with a class of adversarial math majors, a casino that offers this game will promptly go bankrupt. With Apple's CSAM detection, you get to be that casino.
(reposted based on my comment on last week's thread [1])
Themselves? No. Other people and communities that they dislike? You bet.
Just look at political memes. What is the chance that both sides will have a few people try to create memes for their opponents which are adversarial. They'll take care to make sure those images aren't on their own apple devices before spreading it to areas where the other side likes to share memes.
Another example are the very people trying to be caught. Someone who is against current laws about the subject might seek to create as many false positives as possible to overwhelm the system. They might even specifically target otherwise legal baby photos to manipulate in this way as it would make it more likely they get past any sort of second tier manual review and result in law enforcement wasting resources.
Lastly, this is nothing new. Planting such material and flooding websites with it has long been a tactic used by some. Up until now it has been limited, because doing so requires violating the law yourself, and few people hate others enough to take that level of risk on themselves. But this is mostly risk-free, because creating adversarial images like this isn't outright illegal, and even where laws against it exist, violating them is very different from violating actual CSAM laws in every way that factors into a person's willingness to break the law.
The existence of a preimage attack makes Apple's system completely useless for its nominal purpose. The NeuralHash collider allows the producers and distributors of CSAM material to ensure that nearly all of the next generation of CSAM will suffer from hash collisions with perfectly innocent images.
If these new images never make it to the NCMEC database, then new CSAM content will be completely NeuralHash-proof. However, if these images eventually make their way into the NCMEC database, then everybody who has the perfectly innocent originals will be dealing with an adversarial environment.
> If these new images never make it to the NCMEC database
The remark that an image might not be in the database applies equally to all CSAM pictures, and so is not actually dependent on any technical aspect of the hash at all.
The fact that you can modify an image to the point that it gets a different hash isn't important compared to the (separate) issue of how you keep your database up to date. And detecting old CSAM is in any case not as important as tracking and interdicting production of new images.
The second remark is good, but mainly because it reminds us to demand that Apple ensure additions are scanned for malicious images before being added to the central database. If the operator determines that a certain image collides with some known public image, they can notify Apple. Apple in turn modifies the hasher to dissolve the collision, rolls out a patch, and can then update the database.
> ensure that nearly all of the next generation of CSAM will suffer from hash collisions with perfectly innocent images.
Even then, for that to affect _John Doe_, they would have to make 30+ images whose hash matches that of images in _John Doe’s_ iCloud account.
I think that means they could target individuals, but only if they knew or could guess what photos they have in their account.
They also might be able to target groups of individuals, say people who went on holiday to Paris. It would be interesting to see whether such people have enough overlap in the sets of Neuralhashes of photos they took there.
This argument is utterly incoherent. Of course they don't include into the false positive rate images that are intentionally designed to generate positive hits, what would be the point of that? The only interesting metric is the false positive rate for normal images.
It's not an argument. It's an explanation of what the number means.
> The only interesting metric is the false positive rate for normal images.
Wrong. The only interesting metric is the false positive rate _in practice_: i.e. how likely are false positives to affect innocents. Indeed, Apple is presenting their "one in a trillion" number as reassurance, as if it was the probability that an account that doesn't distribute CSAM gets flagged by their system. But that probability depends strongly on adversarial questions, and cannot be calculated using optimistic assumptions about all images being "normal".
2 collisions out of a million images. I'm not sure how big the CSAM database is but if it's a tens of thousands and there are millions of photos uploaded a day then Apple could have a problem on their hands. This is all extrapolating from a study that doesn't use photos representative of what people actually upload. I would suspect when most photos being uploaded are of humans the actual collision rate will be much higher.
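A rough back-of-the-envelope on that extrapolation, under the strong (and possibly wrong) assumption that the pairwise collision rate from that study transfers to matching uploads against a database, and with made-up figures for database size and upload volume:

    # 2 colliding pairs were found among ~1.43 million images.
    images = 1_431_168
    pairs = images * (images - 1) // 2        # ~1.0e12 pairs
    p_pair = 2 / pairs                        # ~2e-12 per pair

    db_size = 50_000                          # assumed: "tens of thousands"
    uploads_per_day = 1_000_000               # assumed: "millions of photos a day"

    expected_per_day = p_pair * db_size * uploads_per_day
    print(expected_per_day)                   # ~0.1 spurious matches per day

Under those assumptions that is roughly one spurious match every ten days, before even considering that real photo streams skew toward people rather than random images.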
It could happen on purpose if I intentionally send you 30 colliding images. I don’t know how iMessage handles images, but WhatsApp for example will put them directly into your photo library (and from there directly into iCloud if you’ve got syncing enabled).
Perhaps I could even do that without revealing my motives to you.
I mean, you could send them actual illegal content too.
And human reviewers are in the process. If you've got 30 matches and they are all pictures of bridges or whatever do you think the FBI is going to show up at your house?
The technology is not why the Apple system is unwanted. It's just extra fuel for the fire.
This system is unwanted because it puts a spy literally in your house and in your hands. It's bad enough that cloud everything blurs the line between what's yours and what's mine. Placing any law enforcement tech on a user's own device takes that line between "public" and "private" and completely erases it.
Absolutely. The problem is Apple introducing a spy into your home.
This alone should be bad enough, but some people are rather trusting. Showing that the spy is also tripping balls both exposes additional risks and emphasizes that Apple neither has their best interest at heart nor is putting adequate care into their actions. The latter gives people reason to question apple's claims of additional protection mechanisms that are non-falsifiable.
Please help me understand. Isn't this the reason why the process involved a final manual review? If so, isn't the point of having identical hashes moot? Or is the point that having more identical hashes means reviewing more personal pictures manually, leading to a privacy issue?
I don't think I would trust a huge corporation with this. Plus, leaving the review to some internal classified process where some poor faceless guy needs to reach an unrealistically high quota of reviewed images per day to get his bonus, might be a bit of a risk.
it's not just internal policy - the safety vouchers will not decrypt (technically impossible) unless there are ~30 matches. It is a policy encoded in cryptography.
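The mechanism Apple describes is a form of threshold secret sharing: each match contributes a share, and the decryption key only becomes recoverable once enough shares exist. A toy Shamir-style sketch over a prime field, purely to illustrate the idea (this is not Apple's actual construction):

    import random

    P = 2**127 - 1  # prime field modulus

    def make_shares(secret, threshold, n):
        # Any `threshold` shares reconstruct the secret; fewer reveal nothing.
        coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
        f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        # Lagrange interpolation at x = 0.
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, -1, P)) % P
        return secret

    shares = make_shares(secret=123456789, threshold=30, n=40)
    assert reconstruct(shares[:30]) == 123456789   # 30 matches: key recoverable
    # reconstruct(shares[:29]) yields an unrelated value, not the secret.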
I don't really get what this repository is trying to achieve and what's the point of collecting collisions. Collisions will happen, that's just how it is with hashes.
It's already a public knowledge that Apple has 2 more systems (some server-side verification and a manual check later) to prevent false-positives. So what's the point of researching collisions in NeuralHash?
Are you fine with an apple employee looking at all your private pictures just because some hashing algorithm decided you're a pedophile? Personally, I'm not.
First, this doesn't change my opinion on CSAM, I still consider the thing way too intrusive until Apple announces E2E for iCloud.
Now, I can't really call something I voluntarily uploaded to Apple's servers a "private picture". But that's just a matter of perspective, and I understand that many people would disagree with me on this.
I'd argue that the hash collisions (both natural and synthetic) that I've seen give me more confidence in the system, not less.
On the natural hash collisions (of which there are two), we have objects of similar shape against a solid background. It seems that a natural hash collision of a CSAM image would be unlikely (or if it does occur, it would be something that perhaps is also an infringing image).
As for the synthetic hash collisions, there are visible artifacts in the picture that, if you compare with the original picture, make the overlay of the original picture on the synthetically generated hash collision obvious. Could people get tricked into downloading memes¹ with synthetically generated hash collisions? Sure, people are idiots. But I'm guessing the majority of folks will look at the picture and say, this is a sh*t picture in this meme and download something else.
1. And that, of course, assumes that meme hosters don't apply similar scanning techniques to what they serve up.
I saw two examples. The dog/girl one has obvious artifacts in the picture of the girl. The directly linked image doesn't have artifacts, but putting them side by side, I can see what got matched up and they're still very approximately the same picture in that they're both pictures of women and the eyes are in the same part of the picture which gives support to your perspective. I do still wonder whether it would be possible to take, say, a (legal) nude and turn it into an innocent-looking image that still matches the hashes or not. I'm more inclined to believe it now than before, but theoretical possibilities don't usually map to realistic concerns.¹
1. One example of this would be that theoretically, LaTeX's cross-reference mechanism can get caught in a cyclic state. This can only happen with page references and the most likely scenario is a reference to a roman-numeraled page number where if the page reference is output as ix the referenced location moves to page x and when the page reference is updated to x the referenced location moves to page ix (in practice, functioning examples required a shift between xcix and c, but either way, the probability of this happening in a real document is vanishingly small).
No. For most proper cryptographic hash systems (e.g. those used for verifying files, rather than for data structures), no collision has ever been found in practice.
Try to find a SHA256 collision.
Anywhere, ever, in the history of mankind.
This isn't for lack of looking. A lot of very smart people have looked for them. If you find one, I bet you'll be eligible for a tenured faculty slot at a good university, if not more. A whole world of secure systems would need to be re-engineered.
Hypothetical collisions of course exist, by the pigeonhole principle, just not in the real world.
Apple is claiming to have a visual equivalent to a cryptographic hash -- one which won't change with a single byte, but only if the image is substantially different.
At least their security analysis relies on that.
From their whitepaper: "The threshold is selected to provide an extremely low (1 in 1 trillion) probability of incorrectly flagging a given account"
If your claim is that their hash algorithm isn't cryptographic, their security analysis is incorrect.
It's trivial to cause a NeuralHash non-match: either imperceptibly, with a little noise, or by adding an overlay to the image.
If you downsample and quantize an image before SHA-256ing it, you get a bit of robustness against accidental false negatives, though both schemes are trivially bypassable.
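A minimal sketch of that alternative, assuming Pillow is available (the resolution and the number of bits kept are arbitrary illustration choices):

    import hashlib
    from PIL import Image

    def robust_sha256(path, size=64, keep_bits=6):
        # Normalize: grayscale, downsample, drop the least significant bits.
        img = Image.open(path).convert("L").resize((size, size))
        mask = 0xFF ^ ((1 << (8 - keep_bits)) - 1)   # zero out the low bits
        data = bytes(p & mask for p in img.getdata())
        # Cryptographically hash the normalized pixels: some recompression
        # and resizing survives the normalization, but crafting an unrelated
        # colliding image is as hard as attacking SHA-256 itself.
        return hashlib.sha256(data).hexdigest()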
Yes, absolutely. It's technically possible to find SHA256 collisions accidentally, but it's so unlikely, that if you found one, it would merit serious investigation. People would not believe your statement that you found them accidentally, and "oh, I guess mlajtos really just found the colliding pair by chance" wouldn't be declared until after a very thorough investigation. In the meantime, major stakeholders (e.g. Bitcoin) would probably move away to another hash function, just in case.
Dwyer calculated 1,431,168 NeuralHashes and found two collisions. Humanity collectively computes over 120,000,000,000,000,000,000 (1.2 × 10^20) SHA-256 hashes every second. Still, we're reasonably sure that this immense brute-force search will not lead to any collisions in any reasonable amount of time.
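For scale, an idealized birthday-bound comparison (treating each hash as uniformly random, which a perceptual hash deliberately is not):

    # Roughly 2^(b/2) random b-bit hashes are needed before a collision
    # becomes likely.
    print(f"96-bit  (NeuralHash output): ~{2**48:.2e} hashes")    # ~2.8e14
    print(f"256-bit (SHA-256 output):    ~{2**128:.2e} hashes")   # ~3.4e38

    # Two collisions in ~1.4e6 NeuralHashes is far below the idealized
    # 2^48, which is expected: a perceptual hash maps similar-looking
    # images to the same value by design, so its outputs are not uniform.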
And a story may start with the first word, but if I present the word "Octopus" and say check out my story, you're going to be well within bounds to question me on it.
Well to be fair there were two collisions, so your story should be "Octopus imploded" and now everyone is captivated by the plight of this unlucky cephalopod.
I’m glad that people are trying to figure out any technical flaws in the system as best they can, but if I’m being honest I do trust Apple’s engineers to have built something that is solid from a technical stand point.
Am I correct in that the primary reason folks are so upset is that the system could (probably) be easily modified such that -any- content could invoke legal action? That the main problem is really the scanning at all, and not the chances that it could be attacked by an individual actor but instead by a government?
I can’t speak for everyone, but that’s certainly a technical part of it. Another big part of the problem is that it’s insulting to presume everyone guilty, and make them to use their own resources (own phone, own battery cycles) to investigate them as if they were suspects. But that’s been discussed plenty on other threads here at HN.
It might be solid from a technical standpoint.
Once you built it, governments will be coming and asking for more.
Are you aware that the Chinese government already has been granted access to the infrastructure holding the keys to iCloud in China?
Exactly that. The tech seems fine, but I live in a country with a government that has strong censorship laws, and I do not trust Apple to not bend to countries like China in extending this to political content.
I still think the biggest problem is that at some point a human is going to look at a false positive. This may be a picture of my naked children, and this human may not have the best intentions with my picture.
That said, Nextcloud is my backend and I do not upload anything to iCloud (except for MS authenticator 2fa backups), so I'm safe right?
So your threat model here is that the person at Apple tasked to check for Child Porn is an actual Paedophile and might accidentally see a false positive of your child's naked photo?
You do know that they don't see the whole photo at full megapixel resolution? They're just given "a visual derivative" of the photo for checking.
Also, you really think that the persons tasked with this process are just randos off the street and not vetted specifically?
And where do you get the "visual derivative" information? Apple sure didn't communicate that to me. All I know is some person may look at my pics at some point.
We are not talking about a situation where an arbitrary picture is misclassified.
We are talking about a situation where an innocent picture involving a naked or partially clothed child is deemed similar to a non-innocent picture of a naked or partially clothed child.
Now you might argue that there should never be a picture of a naked or partially clothed child of any kind on any phone, but IMHO that is short-sighted, discriminating, and at best shows you don't know much about the world and other cultures.
Let's first list some simple reasons such a thing could happen:
- Photos meant for a doctor or a partner, to ask whether something is normal or a problem. In many different ways.
- Photos of little children bathing or similar that, e.g., a dad sends to the mom while she's away on a business trip.
- etc.
A reason people are less aware of is that not all countries are as uptight about nudity, especially within the family. In some places it's totally normal for family members, regardless of age and gender, to walk through the apartment naked before or after a shower. Similarly, if shame about the naked body wasn't drilled into you, you might well visit a nudist beach with your family (meeting other families and taking advantage of it often being less crowded), and in turn perfectly innocent family beach pictures contain naked children. That in itself is not a problem. But with Apple's approach, stuff like this is likely to trigger both systems Apple announced and wrongly label your whole family as pedophiles...
Where I live some kids even play naked on normal beaches or near lakes or in their backyard, whatever. Up to 2 years old it's very normal. Within families kids and parents see each other naked up until puberty of the kids and the kids don't feel comfortable anymore. If you go to a sauna, everybody is naked.
Or what if a kid gets her/his hands on a phone and takes some pics (doesn't even require unlocking the phone) by accident?
The US may be one of the most up-tight countries about nakedness in general if you ask me. At the same time it's hyper sexualized and produces people like Nicky Minaj, but I guess there is still some fabric over their most "special" parts so it's ok. But oh god, what if Justin Timberlake rips it off... Pandemonium.
Up to 2 years? Hold my beer, in Scandinavia 5 year olds routinely run naked on public beaches. You're not even very shocked if you see nude grownups bathing.
> Now you might argue that there should never be a picture of a naked or partially clothed child of any kind on any phone
No I'm definitely not arguing that. I'm not American, where I live you'll sometimes see nude bathers in the city centre, and most definitely nude children on the beach.
1 in a trillion is derived from a dataset of 100 million photos; presumably a representative proportion of these were "similar images" like bathing kids.
Given the problematic aspects of assembling a "globally" representative dataset, and given that in many countries people are more uptight about such things, I highly doubt that this dataset contained a "representative proportion" from the cultures that are more open about such things and therefore take such pictures (in general).
I.e. there can be a massive difference between a probability "over all humans" and a probability "over people of a given culture", as long as the given culture is either a minority or underrepresented in the given data.
Given that people normally don't (knowingly) hand out their private family photos when they know their culture is seen as "bad" by some people and the pictures might be misused, I think we can at least assume such cultures are underrepresented.
Though we can't say how much that changes the probability.
1 in a trillion, with a billion devices holding 10k pics each, is not a small chance. But what do we know? It's not as if Apple is communicating numbers anywhere that would let us make a reasonable guess at the number of false positives, or at whether it is actually worth it.
It’s not per photo, it’s per photo library. But it’s per year, so on average once every 1000 years there will be a false positive for someone.
They are communicating numbers. For example, they tested with 100 million photos and got 3 false positives. They also tested with 100 k normal porn photos and got 0 false positives.
I think their numbers are completely irrelevant, now that we know the visual hash can be gamed. Since it can be, it will be.
Basically, that 1 in a trillion number has an implicit "assuming people aren't cheating", as most mathematical models do. But it's already evident people can cheat this system.
I don't know what the odds will end up being, 1 in a trillion or 1 in 100, but they will not be based on statistical analysis. The odds will be based on cultural and social factors... how quickly do Apple reviewers get overwhelmed? How easily can script kiddies use the tools to fake hashes? Are there consequences for false reports?
How many people want to get you in trouble?
As it is explained in the "readme" part, in this specific context, "naturally occurring" means that no one has purposefully manipulated any of the images to make them collide: that the images were already published and "out there" and happen to collide. In other words, it does not necessarily imply that the images correspond to natural photographic scenes (which seems to be your interpretation of it).
Besides, you could probably obtain this type of colliding image "naturally" by photographing similar-looking objects against a white (or generally featureless) background. Furthermore, it suggests/demonstrates that similar-looking images with similar backgrounds can lead to unexpected collisions in practice (i.e. "naturally"), even if you do not assume an adversarial scenario.
Are you sure that, if you take a picture of a naked body part, it won't collide with anything that looks similar in their database?
It is unlikely unless you manage to capture a similar pose and happen to have a similar background. This whole thing is a nothingburger. This is one of those weird things where many people have baseless gut reactions and then try to go and prove it flawed even though they don't have a complete picture.
It is unlikely that there is a collision of benign image with the database and even if that happens it is not some automatic process that just sends cops to your house to raid it.
Of course we can get a bunch of collisions with essentially the same images. I don't get why this is so magical: just squint your eyes and I'm sure there are two objects within your reach that could be made to collide. But that isn't a gotcha on any level.
Isn't a hash collision from similar images the point of the whole thing?
At any rate, IANAL, but I'm pretty sure you can't be convicted based on a hash alone. If you get busted for possession of a picture of a nematode and you can show the jury it's just a picture of an axe that has the same value when run through this algorithm, you'll be fine. And there's a decent chance prosecutors won't chase down individuals who will just have a single collision in their photo library with this tech in the first place - people who have dozens or hundreds will be much more interesting.
Technically speaking, this does not prove that an adversarial attack is possible on Apple's CSAM system, given that Apple has another, unreleased neural hash model on its servers which is potentially larger and works better than the one on device.
The more interesting technical question for me is: do collisions transfer across models? or how to find collisions that transfer across models?
Is it possible for the courts to use this system to search a defendant's phone for leaked documents say? Like if NSA learns that one of a small group leaked document X, can they get a court to force Apple to add the hash of Document X to the database on that group of people's phones? If so, I bet this becomes the new norm for investigating leaks.
I think no one is afraid anymore of 40 accidental natural image collisions.
But un-natural image collisions, or bad images in the database, and the like are a different matter, and have been the main point of critique from the get-go as far as I can tell.
Also, given how many people use iPhones, how many pictures they have, and how often they have many similar pictures, things are not necessarily that simple.
I wouldn't be surprised if some flat-chested, petite, fully adult (e.g. 30-year-old) woman does some sexting and goes from 0 to >40 collisions in a month. Not because of arbitrary collisions, but because of the similarity some of her sexting pictures might have to those of a 14-year-old but older-looking girl (who was, e.g., coerced, with the pictures ending up in the database).
As a non-apple user you could be impacted indirectly by people you know being directly impacted or by Apple's practices being imported into the law. E.g. laws that attempt to outlaw encryption lacking apple-like backdoors.
This is of course entertaining, but since Apple has already tested for this, with 100 million images, and adjusted the rules accordingly, it has no practical implications.
Is it really a catalogue when there only are two of them?
I find it amusing that they probably ran this tool against a set of millions or even billions of images and this is the best they could come up with. They are practically praising Apple here lmao