I'm very confused by the use case here, and this doesn't make sense to me:
> A good example of this would be applying a machine learning model on some sensitive data where a user would be able to know the result of model inference on their data without revealing their input to any third party (e.g., in the medical industry).
I don't get why I would care that the answer was generated specifically by GPT-4. It sounds like they're billing this as some sort of "run a model on input with homomorphic encryption," but that doesn't really sound possible, and to the extent that it is, I don't think you could ever convince me that the people managing the model on the GPU couldn't get access to both the plaintext input and the plaintext output.
The way to get this kind of security is both simple and hard: make models that can run on consumer hardware.
An important use case is federated learning, which Google and many healthcare/pharmaceutical companies are very interested in.
In federated learning, multiple companies or groups with their own private data come together to train a model jointly on all the private data, while keeping the data private.
You need more than zero-knowledge proofs to actually do federated learning securely, but to my limited knowledge they are one tool in the toolbox that can be useful.
It sounds almost too good to be true, but SNARKs enable a prover to convince a verifier in O(log n) time that a statement of size n is true. In fact, many constructions enable this in O(1) verifier time (though the prover is quite slow). With zk-SNARKs, part of the statement can even be private: the proof reveals nothing about the input, yet it can still convince a verifier.
All of this is probabilistic and makes some assumptions about the computational power of an adversary, but that is very normal in cryptography. We consider EdDSA signatures secure even though one could in theory find the private key by brute force. SNARKs “convince” a verifier in the same manner: generating a proof of a false statement is computationally infeasible, but in principle not impossible.
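For reference, these guarantees are usually stated along the following lines (a sketch only; exact definitions vary by proof system, λ is the security parameter, and R is the relation of valid statement/witness pairs):

```latex
% Completeness: an honest prover holding a valid witness w for statement x always convinces the verifier.
\Pr\big[\mathsf{Verify}(vk, x, \pi) = 1 \;\big|\; \pi \leftarrow \mathsf{Prove}(pk, x, w),\ (x, w) \in R\big] = 1

% Soundness: for a statement x with no valid witness, any efficient cheating prover P^*
% succeeds only with negligible probability.
\Pr\big[\mathsf{Verify}(vk, x, P^*(x)) = 1\big] \le \mathrm{negl}(\lambda)

% Succinctness: proof size and verification time grow at most polylogarithmically in the statement size n
% (constant in lambda alone for many constructions).
|\pi| = \mathrm{poly}(\lambda, \log n), \qquad T_{\mathsf{Verify}} = \mathrm{poly}(\lambda, \log n)

% Zero knowledge: the proof can be simulated without w, so it reveals nothing about the private input.
```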
All that can really be conveyed is the "truth" that the model produced the output in response to the input. Given the other vulnerabilities of neural networks (biases, opaqueness, etc.) this is a bit like worrying about a MITM attack when communicating with a sock puppet.
I think you're misunderstanding the example use case (which is understandable; it took me a decent amount of completely focused time to really understand zero-knowledge cryptography).
The medical-data use case described isn't like homomorphic encryption (where computation is done on an untrusted device, but with encrypted inputs/outputs). It's more like being able to prove to an insurer that you have some medical condition without your doctor having to hand them the lab results, by instead providing a ZK proof that an ML model which detects that condition produced that result.
Using these kinds of proofs, you can start to silo your information off to just the parties that absolutely must have access to it, while also allowing parties that need to be able to verify certain narrow attributes (like an insurer needing proof you needed some procedure) to verify it without needing to invade your privacy.
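A minimal sketch of that flow, assuming a hypothetical `zkml` library with a prove/verify split (real projects such as EZKL are shaped roughly like this, but every name below is illustrative, not a real API):

```python
# Hypothetical sketch of the insurer example above. None of these names are a
# real API; they just mark who holds which data and what crosses the wire.

# --- Patient's machine: the lab results never leave it ----------------------
lab_results = load_lab_results("bloodwork.json")     # private input (hypothetical helper)
model = load_model("condition_detector.onnx")        # public model both parties agree on

diagnosis = model.run(lab_results)                   # plain inference, done locally
proof = zkml.prove(
    model=model,                  # public: pins which model was run
    private_input=lab_results,    # never revealed; only bound inside the proof
    public_output=diagnosis,      # the claim being attested to
)
send_to_insurer(diagnosis, proof)                    # only the result + proof go out

# --- Insurer's side: sees the claim, never the lab results ------------------
ok = zkml.verify(
    model_commitment=model.commitment(),  # hash/commitment to the agreed model weights
    public_output=diagnosis,
    proof=proof,
)
# ok is True only if this exact model produced this output on some input the
# patient committed to; the input itself stays private.
```

The key point is what crosses the wire: the diagnosis and the proof, never the lab results.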
The neural engine in the A16 Bionic on the latest iPhones can perform about 17 TOPS. The A100 is about 1250 TOPS. Both of these figures are very sensitive to how you measure them, and I'm absolutely not sure I'm comparing apples to bananas properly.
However, we'd expect the iPhone has already reached its maximum thermal load. So without increasing power use, it should match the A100 after about 6 to 7 doublings, which would take roughly 10 to 11 years. In 20 years the iPhone would be expected to reach the performance of on the order of 100 A100s.
At which point anyone will be able to train a GPT-4 in their pocket in a matter of days.
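As a sanity check, here is the same extrapolation spelled out. The doubling time is an assumption (roughly the 18-month rate the thread is debating), and the TOPS figures are the rough ones quoted above:

```python
import math

# Back-of-the-envelope check of the numbers above. Both TOPS figures are rough
# and very measurement-dependent, and the doubling time is an assumption.
iphone_tops = 17        # A16 neural engine, as cited above
a100_tops = 1250        # A100, as cited above
doubling_years = 1.5    # assumed doubling time for on-device ML throughput

doublings_to_parity = math.log2(a100_tops / iphone_tops)
print(f"doublings to match an A100: {doublings_to_parity:.1f}")        # ~6.2
print(f"years to parity: {doublings_to_parity * doubling_years:.1f}")  # ~9-10

tops_in_20_years = iphone_tops * 2 ** (20 / doubling_years)
print(f"A100-equivalents after 20 years: {tops_in_20_years / a100_tops:.0f}")  # ~140
```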
You're assuming no algorithmic improvements and missing the ongoing shift from 16-bit to 4-bit operations, which will soon give ML hardware a 4x improvement on top of everything else.
We could be training GPT-4s in our pockets by the end of this decade.
To be fair, they're also being extremely generous about HW scaling. There's no way we're going to see doublings every 18 months for the next 6+ years when we've already stopped doing that for the past 5-10 years.
Have you read the Wikipedia page? Moore's law started ending ~23 years ago, followed by Dennard scaling ~18 years ago. It hasn't necessarily fully stopped, because other architectural improvements have been delivered along the way, but we have nearly reached the end of the road for this kind of scaling due to a combination of heat-dissipation challenges and the inability to shrink transistors further. 3D packaging might push things further, but it's difficult and an area of active research (+ once you do that, afaik you've unlocked the "last" major architectural improvement). I think current estimates put the complete end to further HW improvements at ~2050 or so.

You can still improve software or build dedicated ASICs/accelerators for expensive software algorithms, but that's the pre-Moore world, which saw most accelerators die off because the exponential growth of CPU compute obviated the need for most of them (except GPUs). We're coming back to it with things like Tensor cores. Reversible computing is the way forward after we hit the wall, but no one knows how to do it yet.
> But in 2011, Koomey re-examined this data[2] and found that after 2000, the doubling slowed to about once every 2.6 years. This is related to the slowing[3] of Moore's law, the ability to build smaller transistors; and the end around 2005 of Dennard scaling, the ability to build smaller transistors with constant power density.
Wikipedia mis-cited it in the text and should have said "But in 2016". However, the 2016 analysis misses the A11 Bionic through A16 Bionic and M1 and M2 processors -- which instantly blew way past their competitors, breaking the temporary slump around 2016 and reverting us back to the mean slope.
Mainly because they're now analyzing only "supercomputers", and honestly that arena has changed: quite a bit of HPC work has moved to the cloud (e.g., Graviton), not all of it, but a lot. And I don't think they're analyzing TPU pods, which probably have far better TOPS/watt than traditional supercomputers like the ones on top500.org.
One large use case for ZKML is verifiable computing: an IoT device can require an untrusted supercomputer to prove that it ran a specific program on the device's data correctly.
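Roughly the same prove/verify split as in the insurer sketch above, but with the roles reversed: the heavy proving work lands on the untrusted supercomputer, and the constrained device only runs the cheap verifier. Names are again hypothetical, not a real API:

```python
# Hypothetical sketch: expensive inference and proving run on the untrusted
# machine; the IoT device only runs the cheap verification step.

# --- Untrusted supercomputer -------------------------------------------------
def handle_job(program_commitment, sensor_data):
    output = run_program(program_commitment, sensor_data)   # heavy computation
    proof = zk.prove(program=program_commitment,            # slow: can take minutes
                     public_input=sensor_data,
                     public_output=output)
    return output, proof

# --- IoT device ---------------------------------------------------------------
output, proof = submit_to_cloud(program_commitment, sensor_data)
assert zk.verify(program_commitment, sensor_data, output, proof)  # fast and cheap
# If verification passes, the device can trust `output` without trusting the cloud.
```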
A 17-million-parameter model (~ResNet-50) takes more than 50 s of proof time. Is that on top of the inference time?
I can see some niche applications for this system, but I am very skeptical of its ability to handle larger models (100M+ parameters) and of its scalability under increased demand.
If Facebook releases Llama, and updated models thereafter, for purchase or as freeware, there will not really be as much need for this since everything will happen safely, locally, no?
It would be cool to see Meta release a 7B-parameter model as shareware, and subsequent larger models for a fee.