There is no way to get rid of a prompt injection attack. There are always ways t...

mentos · 2025-01-12T22:25:51 1736720751

The raw text of the persons message can/will be posted to the forum and be obvious to the community if it’s a prompt injection to be flagged for human review and their account banned.

satvikpendem · 2025-01-12T22:28:20 1736720900

Sure, that's if human moderators see it before the AI, in which case, why have an AI at all? I presume in this solution that the AI is running all the time and it will see messages the instant they're sent and thus will always be vulnerable to a prompt injection attack before any human even sees it in the first place.

mentos · 2025-01-12T23:59:03 1736726343

To moderate the majority of the community that will not be attempting prompt injections.

What meaningful vulnerabilities are there if the post can only be accepted/rejected/flaggedForHumanReview?

satvikpendem · 2025-01-13T01:14:49 1736730889

That's what you tell the AI to do, who knows what other systems it has access to? For example, where is it writing the flags for these posts? Can it access the file system and do something programmatically? Et cetera, et cetera.

mentos · 2025-01-13T04:11:19 1736741479

The same way OpenAI offers its service to hundreds of millions of users without compromising any other systems it’s running on.

satvikpendem · 2025-01-13T04:14:38 1736741678

OpenAI doesn't allow write access to any file system. If you are recording posts to be reviewed, then you must necessarily store that information somewhere, at which point you will be allowing the AI to access some sort of data storage system, whether it be a file system or a database.

dijit · 2025-01-13T09:17:44 1736759864

is that really an issue in practice?

I'm sure you can coax openai to send a http request, at which point you can just queue up automated reports.

cutemonster · 2025-01-13T14:47:19 1736779639

No it's not. Well, if designing the system in bad ways, it can be, but that can be said about anything.

There's no need to do this: (from GP)

> > at which point you will be allowing the AI to access

No need to allow the AI to access anything.

Send it the comment thread, what the forum is about, the users profile text, and then the AI outputs a number. Any security problem is then because of bugs the humans wrote in their code.

Prompt injection? Yes, so there still needs to be ways to report comments manually, and review.

mentos · 2025-01-13T15:31:52 1736782312

CustomGPTs have write access to change their name and icon. OpenAI has a memory feature which persists between chat sessions. What are you talking about?