Explained from scratch: private information retrieval and homomorphic encryption

blintz · on Sept 26, 2022

Author here, happy to answer any questions! I know it’s a complicated topic, and I’m still trying to figure out how best to explain it.

Also, we’ve applied to YC for W23 to commercialize and expand this tech; wish us luck!

thethirdone · on Sept 26, 2022

Not really related to the article, but ever since I learned that homomorphic encryption exists I wondered if it is possible to make a trustless internet poker game.

That would entail some type of homomorphic encryption where each party would get to know which cards they drew, but not anyone elses. The type of system I am imagining has each player generate a permutation of 52 cards, encrypt it with their key and then take everyone's encrypted permutations together and decrypt a result where they only get to see the 5 cards they got dealt.

Is there any research into multi-key systems or ways to apply a single key system to get the kind of result I am looking for?

mhluongo · on Sept 26, 2022

Look into secure multi-party computation and "mental poker". I believe a version of this is solved today, but don't know the trust assumptions off the top of my head.

mikewarot · on Sept 29, 2022

If the server has all the data, and does the homomorphic math, couldn't it selectively change some of the data inbound to a second job with the same user inputs, (in the background) and compare outputs, eventually using a a binary search to figure out what data was selected?

MattPalmer1086 · on Sept 26, 2022

Very nice technical explanation, thanks.

The technology is cool, but I wonder what kind of market you think might exist for it?

Some years back I worked for a privacy startup, but we never really figured that out...

blintz · on Sept 26, 2022

Yeah, the challenge always seems to building privacy technologies that people are actually willing to pay for.

The one trove of public data that folks seem really eager to access in a metadata-private way seems to be blockchain data. As someone who was not really a blockchain person, I've been really pleasantly surprised by how much interest people in that field take in our tech. In the short run, we are definitely focused on this area.

I do think there are significant applications beyond that though: I am really personally excited by metadata-private messaging/email and sensitive datasets (financial data, medicine/health data, cybersecurity data).

teleforce · on Sept 26, 2022

Thanks for the article, really interesting read, most likely your startup will be successful and all the best!

I think the killer application for homomorphic encryption (HE) will be on the regulated and sensitive data as you have mentioned. In healthcare, high accuracy machine learning requires massive datasets and it is difficult to get your hands on the data mainly due to privacy concerns. If HE can make data available to be analyzed without compromising the patients' anonymity it will be a game changer and perhaps can relax on the accessibility of the sensitive clinical data in the near future.

Just wondering, can we not achieve similar privacy using secret sharing (SS) cryptography for anonymity? With (SS) unlike HE, you can perform arbitrary calculations in the encrypted domain [2].

[1]How to Share a Secret:

https://news.ycombinator.com/item?id=31817716

[2]Secure multi-party computation:

https://en.m.wikipedia.org/wiki/Secure_multi-party_computati...

blintz · on Sept 26, 2022

Yeah, a lot of cool folks are working on doing ML on encrypted data (Zama, Enveil, Duality). It's a really challenging problem, and I think it will take a big, well-funded team to get working well. We're partly interested in PIR because it's much more tractable, and we are able to make really significant progress on it as a scrappy two-man team.

Multi-party computation is also a cool piece of technology. I think that I was always drawn to HE because it involves a completely trustless relationship with the server; in MPC, there's more a of a "m/n parties don't collude" assumption, that can be tricky to implement in practice. (Also side note: you can in theory perform arbitrary computation on encrypted data using FHE; the performance impact is really significant, like 100-10000x, but you can perform unlimited computation).

Surprisingly, these two concepts (MPC and HE) work quite well when combined! There are some computations each is suited to computing efficiently (linear for HE, non-linear for MPC), so when combined for ML, you get really good results. See this landmark paper and the follow on work: https://eprint.iacr.org/2018/073.pdf.

Grimburger · on Sept 27, 2022

> high accuracy machine learning requires massive datasets and it is difficult to get your hands on the data mainly due to privacy concerns. If HE can make data available to be analyzed without compromising the patients' anonymity

FHE is painfully slow. The only real domains it can work realistically are for very simple computations. You are putting the cart miles ahead of the donkey here.

fragmede · on Sept 27, 2022

Privacy minded IoT device backends and womens fertility tracking apps could both use HE backends, and aren’t huge amounts of data either

onetimeusename · on Sept 27, 2022

How does verification of data encrypted this way work say by the party that made the query? Like what is to verify that the cipher text was not tampered with? Can this be done?

blintz · on Sept 27, 2022

There's no single way to do this kind of verification. Different applications will have different 'roots of trust' for the authenticity of database data.

For example, for Wikipedia, we could imagine the Wikipedia site serving the PIR themselves (they are already the entity you trust to decide what is on Wikipedia). A third party wishing to mirror Wikipedia over PIR might try to get some kind of digital signature over the contents of each article from Wikipedia or another trusted source.

For blockchain queries, there are more technical solutions. For Bitcoin, for example, we could include a Merkle tree proof-of-inclusion for the transaction (a la BIP 37).

fjfbsufhdvfy · on Sept 27, 2022

I'm confused, it seems like this will require processing the entire database for every request, which would be unusable in practice. Did I miss the trick?

blintz · on Sept 28, 2022

It does - for databases of a reasonable size, this is okay. Roughly 1-50 GB databases are quite practical.

timobile · on Sept 26, 2022

Just want to thank you for doing this!

ElCheapo · on Sept 26, 2022

Do you think homomorphic encryption will be the next hot thing in security-related sectors of tech?

blintz · on Sept 26, 2022

I think it will be if we're successful!

I do think the cool thing about what we're building is you don't really have to understand exactly what homomorphic encryption is to use / benefit from it.

We're really just building a database that can't learn what rows you read, which is useful (and practical!) across a bunch of domains. The fact that it uses homomorphic encryption under the hood is cool, but hopefully you don't need to get too bogged down in the details to use it.

m1ghtym0 · on Sept 27, 2022

AFAIK the problem for FHE today is the massive performance impact that makes it impractical for many applications (1000x slow-down at best). From my experience with PETs I can tell that performance is a quite sensitive topic. Especially since the shift toward cloud computing, performance bottlenecks can lead to painful costs penalties. Are you optimistic that FHE performance will improve significantly with the ongoing research?

blintz · on Sept 27, 2022

Yeah, I‘d agree that performance is the crux of the issue for many PETs.

I am not terribly confident that over the next 5 years we will see anything like a 10x or 100x improvement in general fully homomorphic encryption.

Instead, I think we already have some domains for which FHE is really practical today (like private information retrieval), and we should focus on those. In these domains, the cost overhead of PIR is less than ~10-100x, and negligible for many real-world scenarios.

m1ghtym0 · on Sept 27, 2022

Makes sense, thanks for the clarification!

1vuio0pswjnm7 · on Sept 27, 2022

"A naive solution is to download and store the entirety of Wikipedia; then, all your queries can be local and the server never learns anything. Unfortunately, this takes a lot of bandwidth and storage (~10 GB), so it isn't very practical."

We can pretend this is a large amount of data but that notion becomes less believable each year, as bandwidth and storage capacity keep increasing.

Many of us now stream (=download and throw away) much more than 10GB on a regular basis.

Local databases, where queries never leave the computer or the local network, exclude much of the "tech" industry that survives on monitoring and manipulating computer users through the internet. One would expect strong opposition from "tech" enthusiasts to any suggestion that a computer user can perform work or recreation offline.

"Tech" companies may not be the only ones. There is also that pathetic term "go dark" which hints at more surveillance being done through the internet.

If the computer user disconnects, it leaves many parties high and dry. Spending more time offline might not be a bad thing.

faizshah · on Sept 27, 2022

Is there a library for homomorphic encryption that application devs can use? Is it practical to use this technology yet?

Edit: I found this library from Microsoft but it only supports addition and multiplication, not comparisons and sorting so it seems it’s not practical yet: https://github.com/microsoft/SEAL#microsoft-seal-1