I guess we are talking past each other. I agree that there are many things we can and should do to improve the safety of integrating ML tools into our lives. I agree that there are unique challenges here, such as scaling, creating new dangers that will require new methods of mitigation. I disagree that "prompt injection" is a meaningful category of vulnerabilities to talk about, and that it is fixable in LLMs or other comparably general systems.
I've argued before that "prompt engineering" is a bad term, granting connotations to precision and care to a task that's anything but. "Prompt injection", however, is IMO a dangerous term, because it confuses people into thinking that it's something like SQL injection or XSS, and thus solvable by better input handling - where in fact, it is very different and fundamentally not solvable this way (or at all).
Yeah, I'll add a bit of an apology here: I interpreted your comments as being in the same spirit as other arguments I've gotten into on HN that were basically saying that because humans can be phished, we don't need to worry about the security of replacing human agents with LLMs -- we can just do it. But I know enough of your comment history on this site and I'm familiar enough with your general takes that I should have been more curious about whether that was actually what you in particular meant. So definitely, apologies for making that assumption.
----
My only objection to talking about whether "prompt injection" is solvable is that (and maybe you're right and this is a problem with the phrase itself) I've found it tends to provoke a lot of unproductive debates on HN, because immediately people start arguing about context separation, or escaping input, or piping results into another LLM, and I got kind of tired of debating why that stuff could or couldn't work.
And I found out that I can kind of sidestep that entire debate by just saying, "okay, if it's easy to solve, let me know when it's solved, but the companies launching products today don't have mitigations in place so let's talk about that."
If I'm wrong and it does get solved, great. But it says something about the companies building products that they're not waiting until it gets solved, even if they believe that it can be solved. In some ways, it's even worse because if they really believe this is easy to solve and they're not putting in these "easy" mitigations or waiting for the "fix" to drop, then... I mean, that's not a flattering position for them to be in.
I agree with what you're saying, but I really want to get across to people that there are practical failings today that need to be taken seriously regardless of whether or not they think that "prompt injection" is just SQL-injection #2.
I owe you an apology too: I took your comment and, instead of focusing 100% on the thing you were trying to argue and discovering the nuance, I pattern-matched a more surface-level read to the flawed reasoning about LLMs I see a lot, including on HN, but one that I know you do not share.
Thank you for elaborating here and in other branches of this discussion. I now see that you were reading my take as encouraging a view that "humans can be prompt-injected too, therefore LLMs are not that different from humans, and we already allow humans to do X", which indeed is very worrying.
The view I have, but failed to communicate, is more like "humans can be prompt-injected too, but we have thousands of years worth of experience in mitigating this, in form of laws, habits, customs and stories - and that's built on top of hundreds of thousands of years of honing an intuition - so stop thinking prompt injection can be just solved (it can't), and better get started on figuring out LLM theory of mind fast".
> I really want to get across to people that there are practical failings today that need to be taken seriously regardless of whether or not they think that "prompt injection" is just SQL-injection #2.
I agree with that 100%, and from now on, I'll make sure to make this point clear too when I'm writing rants against misconceptions on "prompt engineering" and "prompt injection". On the latter, I want to say that it's a fundamentally unsolvable problem and, categorically, the same thing as manipulating people - but I do not want to imply this means it isn't a problem. It is a very serious problem - you just can't hope someone will solve "prompt injection" in general, but rather you need to figure out how to live and work with this new class of powerful, manipulable systems. That includes deciding to not employ them in certain capabilities, because the risk is too high.
I've argued before that "prompt engineering" is a bad term, granting connotations to precision and care to a task that's anything but. "Prompt injection", however, is IMO a dangerous term, because it confuses people into thinking that it's something like SQL injection or XSS, and thus solvable by better input handling - where in fact, it is very different and fundamentally not solvable this way (or at all).