If you don’t consider the difference in kind between a human vulnerability and an automated one, a difference that stems from the essentially unlimited capacity of the latter to scale, your comment makes a lot of sense. If you do consider it, the argument becomes irrelevant and deeply misleading.
This needs to be hammered into people's understanding of the danger of LLMs at every opportunity. Enough of the general population considers things like Twitter bots to have scaled to a dangerous point of polluting the information ecosystem. The scalability and flexibility of LLMs in germinating chaos is orders of magnitude beyond anything we've yet seen.
An example I use with people is the Berenstain Bears effect. Imagine you wake up tomorrow and none of your digital devices have any reference to 9/11. You ask Bing and Google and they insist you must be wrong, that nothing like that ever happened. You talk to other people who remember it clearly, but it seems you've lost your grip on reality. Now imagine that kind of gaslighting about "nothing happening" while the lights go out all over the world, and you have some sense of the scale the larger of these systems are operating at.
Twitter is just one example though, this problem is going to affect every single online community. If the LLM bull case is correct, the internet is going to be absolutely flooded with sophisticated misinformation.
Sophisticated being key. Quantity * quality almost indiscernible from mediocre human input.
Currently we tend to understand bad information in the stream as a function where quality is linear and quantity is exponential, and individuals or human filters can still identify and reject the bottom 99% as spam. Every point on the graph that quality moves closer to resembling human-made content represents an exponential increase in confusion about base facts. This isn't even considering whether AI develops its own will to conduct confusion ops; as a tool for bad actors it's already there, but that says nothing of the scale it could operate at eventually.
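To make that concrete, here's a toy sketch (my own illustrative model, not anything from the thread, with made-up numbers): volume grows exponentially while a crude filter rejects anything that doesn't read as human-made enough, and the amount that slips through explodes as generated quality closes the gap on the human baseline.

```python
# Toy model (illustrative only): exponentially growing volume of generated
# content, filtered by a crude "does this read as human?" quality threshold.
# As machine quality approaches the human baseline, the surviving volume
# explodes even though the filter itself never gets worse.
import random

HUMAN_LIKENESS_THRESHOLD = 0.9  # hypothetical cutoff used by filters/readers

def surviving_volume(volume, mean_quality, spread=0.05, samples=100_000):
    """Estimate how much of `volume` passes the quality filter."""
    passed = sum(random.gauss(mean_quality, spread) >= HUMAN_LIKENESS_THRESHOLD
                 for _ in range(samples))
    return volume * passed / samples

for volume, quality in [(1e6, 0.50), (1e7, 0.70), (1e8, 0.85), (1e9, 0.95)]:
    print(f"volume={volume:>13,.0f}  quality={quality:.2f}  "
          f"slips through={surviving_volume(volume, quality):>13,.0f}")
```

The specific numbers mean nothing; the point is only that the filter never changes, yet the volume that gets past it grows by orders of magnitude.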
The sophistication of the misinformation is exactly the point: That's the mass multiplier, not the volume.
[edit] an interesting case could be made that the general demand for opinionated information and the individual capacity to imbibe and adjudicate the factuality of the input was overrun some years ago already... and that all endeavors at misinformation since then have been fighting for shares of an information space that was already essentially capped by the attention-demand. In that paradigm, all social networks have fought a zero-sum game, and LLMs are just a new weapon for market share in an inflationary environment where all information propagated is less valuable as the volume increases and consumption remains static. But I think this is the least worrisome of their abilities.
Would universal adoption of digital signatures issued by trusted authorities alleviate this problem to any degree?
For example, my phone would automatically sign this post with my signature. If I programmed a bot, I could sign as myself or as a bot, but not as another registered human. So you'd know the post came from me or from a bot I've authorized. Theft or fraud with digital signatures would be criminalized, if it isn't already.
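A minimal sketch of what that could look like, assuming one Ed25519 key pair per registered identity (the library choice and the names here are mine, not part of any actual proposal):

```python
# Minimal sketch: sign a post with an Ed25519 key and verify it later.
# Key issuance/registration (the hard part) is assumed to exist.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Hypothetical: this key would be registered to one human (or labeled as a bot).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

post = "I wrote this comment myself.".encode("utf-8")
signature = private_key.sign(post)   # attached to the post by the phone/client

# Anyone holding the registered public key can check authorship.
try:
    public_key.verify(signature, post)
    print("signature valid: post came from this key's registered owner")
except InvalidSignature:
    print("signature invalid: post was forged or altered")
```

The cryptography is the easy part; the open question the rest of this thread pokes at is who issues, registers, and revokes the keys.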
No, I think we should check for an actual pulse before people post.
Your comment is wild, by the way. You think people should be allowed to run a bot farm, as long as they can digitally sign for it... but people who don't pay for a signature should be arrested?
I'm just asking if some system of using digital signatures could help weed through the inevitable proliferation of bots and deepfakes and ai agents.
I'm pretty sure it's already illegal to steal someone else's signature in some jurisdictions.
There would be no legal requirement to use a signature. No change there. Just as you can send postal mail today with a return address and no name, and you can buy items with paper cash, and so forth. The government would give out verified signatures, or the phone providers would, and it'd be free. I don't really have the answers.
The difference you're talking about is only in the fact that humans don't scale like computer code. If humans were to scale like computer code, you'd still find the "vulnerability" unfixable.
But that difference is a big part of why this matters. That this might be unfixable is not a strong argument for moving forward anyway; if anything, it should prompt us to take a step back and consider whether general intelligence systems are well suited for scalable tasks in the first place.
There are ways to build AIs that don't have these problems specifically because their intelligence is limited to a specific task and thus they don't have a bunch of additional attack vectors literally baked into them.
But the attitude from a lot of companies I'm seeing online is "this might be impossible to fix, so you can't expect us to hold off releasing just because it's vulnerable." I don't understand that. If this is genuinely impossible to fix, that has implications.
Because the whole point with AI is to make things that are scalable. It matters that the security be better than the non-scalable system. If it can't be better, then we need to take a step back and ask if LLMs are the right approach.
I guess we are talking past each other. I agree that there are many things we can and should do to improve the safety of integrating ML tools into our lives. I agree that there are unique challenges here, such as scaling, creating new dangers that will require new methods of mitigation. I disagree that "prompt injection" is a meaningful category of vulnerabilities to talk about, and that it is fixable in LLMs or other comparably general systems.
I've argued before that "prompt engineering" is a bad term, lending connotations of precision and care to a task that's anything but. "Prompt injection", however, is IMO a dangerous term, because it confuses people into thinking that it's something like SQL injection or XSS, and thus solvable by better input handling - where in fact, it is very different and fundamentally not solvable this way (or at all).
Yeah, I'll add a bit of an apology here: I interpreted your comments as being in the same spirit as other arguments I've gotten into on HN that were basically saying that because humans can be phished, we don't need to worry about the security of replacing human agents with LLMs -- we can just do it. But I know enough of your comment history on this site and I'm familiar enough with your general takes that I should have been more curious about whether that was actually what you in particular meant. So definitely, apologies for making that assumption.
----
My only objection to talking about whether "prompt injection" is solvable is that (and maybe you're right and this is a problem with the phrase itself) I've found it tends to provoke a lot of unproductive debates on HN, because immediately people start arguing about context separation, or escaping input, or piping results into another LLM, and I got kind of tired of debating why that stuff could or couldn't work.
And I found out that I can kind of sidestep that entire debate by just saying, "okay, if it's easy to solve, let me know when it's solved, but the companies launching products today don't have mitigations in place so let's talk about that."
If I'm wrong and it does get solved, great. But it says something about the companies building products that they're not waiting until it gets solved, even if they believe that it can be solved. In some ways, it's even worse because if they really believe this is easy to solve and they're not putting in these "easy" mitigations or waiting for the "fix" to drop, then... I mean, that's not a flattering position for them to be in.
I agree with what you're saying, but I really want to get across to people that there are practical failings today that need to be taken seriously regardless of whether or not they think that "prompt injection" is just SQL-injection #2.
I owe you an apology too: I took your comment and, instead of focusing 100% on the thing you were trying to argue and discovering the nuance, I pattern-matched a more surface-level read to the flawed reasoning about LLMs I see a lot, including on HN, but one that I know you do not share.
Thank you for elaborating here and in other branches of this discussion. I now see that you were reading my take as encouraging a view that "humans can be prompt-injected too, therefore LLMs are not that different from humans, and we already allow humans to do X", which indeed is very worrying.
The view I have, but failed to communicate, is more like "humans can be prompt-injected too, but we have thousands of years worth of experience in mitigating this, in form of laws, habits, customs and stories - and that's built on top of hundreds of thousands of years of honing an intuition - so stop thinking prompt injection can be just solved (it can't), and better get started on figuring out LLM theory of mind fast".
> I really want to get across to people that there are practical failings today that need to be taken seriously regardless of whether or not they think that "prompt injection" is just SQL-injection #2.
I agree with that 100%, and from now on, I'll make sure to make this point clear too when I'm writing rants against misconceptions on "prompt engineering" and "prompt injection". On the latter, I want to say that it's a fundamentally unsolvable problem and, categorically, the same thing as manipulating people - but I do not want to imply this means it isn't a problem. It is a very serious problem - you just can't hope someone will solve "prompt injection" in general, but rather you need to figure out how to live and work with this new class of powerful, manipulable systems. That includes deciding not to employ them in certain capacities, because the risk is too high.
It's the blockchain and NFT hype train all over again. Shoehorning it into places it doesn't belong, bad implementations to boot, and actually making things less performant, less secure, and more expensive in the process.
Right, but humans don’t scale that way, so the threat is completely different.
This is like saying a nuclear weapon accident is not that scary because you can also have a microwave malfunction and catch on fire. Sure you can, but the fact it’s not a nuke is highly relevant.
No, I'm saying that securing against "prompt injection" is like saying you want to eliminate fission from physics, because you're worried about nukes. That's not how this reality works. Nuclear fission is what happens when certain conditions are met. You're worried about nukes? Stop playing with nukes. I'm not saying they aren't dangerous - I'm saying that you can't make them safer by "eliminating fission", as it makes no physical sense whatsoever. Much like "securing against prompt injections" in language models, or a GAI, or in humans.
> Sure, current breed of LLMs is badly vulnerable to some trivial prompt injections - but I think a good analogy would be a 4 year old kid.
This reads like you’re trying to say “don’t worry about it, humans are vulnerable too and it’s threatening the way a 4 year old child is” not “correct, we cannot prevent nuclear explosions given that we have fission and yes we’re on track to putting fission devices into every single internet-connected household on the planet.”
There is a reason humans with security clearances can’t just have an arbitrarily large number of interactions with foreign nationals, or that good interrogators say they can always get info from people if they talk long enough.
I'm saying "stop trying to solve the problem of consumer market IoT fission bombs by trying to remove fission from physics - this just can't possibly work, and it takes special confusion to even think it might; instead, focus on the 'consumer-market', 'IoT' and 'bomb' parts".
"Prompt injection" is a vulnerability of generic minds in the same sense "fission" is a vulnerability of atoms.
I think what GP (and I) are talking about is that social engineering is limited in scope because humans don't scale like computer code. A theoretical AGI (and LLMs) does scale like computer code.
To use an admittedly extreme example: The difference between drawing some fake lines on the road and crashing 1 or 2 cars and having all self-driving cars on the road swerve simultaneously is not just a quantitative difference.
What we have, right now, is enough to turn many things in the world upside-down, and we only owe it to human and legal inertia that this is still not absolutely apparent yet, unless you're following what's happening.
Empiricism by definition is concerned only with the past. There is no way to extrapolate from where we were to where we will be with such extreme technological shifts.
Everything that matters is governed by chance: how much money you have, but more importantly, when you and the people you love get sick or die in an accident.
If you think money gets you a pass, maybe recheck your understanding of the issues and of the Stoic solution. The shallow, packaged version produced by Ryan Holiday isn’t a good source.
I expected to read that and find Holiday to be some kind of terrible human but I came out of it with an appreciation of what he was doing, despite the author’s clear intent to paint him in a bad light.
Given that the mean salary at Google is probably driven up by people making 7 or 8 figures, I’m going to suggest that laying off 10,000 people will not change that one bit.