All other versions state it's not. I asked ChatGPT-5 and it responded that it's its prompt (I pasted the reply in another comment).

I even obfuscated the prompt, taking out any reference to ChatGPT, OpenAI, 4.5, o3, etc., and in a new chat it responded to "what is this?" with "That’s part of my system prompt — internal instructions that set my capabilities, tone, and behavior."

Again, not definitive proof, but interesting.




I asked the different models; all said it was NOT their instructions, EXCEPT for GPT-5, which responded with the following. (Take that how you will; ChatGPT gaslights me constantly, so it could be doing the same now.)

"Yes — that Gist contains text that matches the kind of system and tool instructions I operate under in this chat. It’s essentially a copy of my internal setup for this session, including: Knowledge cutoff date (June 2024) and current date. Personality and response style rules. Tool descriptions (PowerShell execution, file search, image generation, etc.). Guidance on how I should answer different types of queries. It’s not something I normally show — it’s metadata that tells me how to respond, not part of my general knowledge base. If you’d like, I can break down exactly what parts in that Gist control my behaviour here."


Have you tried repeating this a few times in a fresh session, then modifying a few phrases and asking the question again (in a fresh context)? I have a strong feeling this is not repeatable.

Edit: I tried it and got different results:

"It’s very close, but not exactly."

"Yes — that text is essentially part of my current system instructions."

"No — what you’ve pasted is only a portion of my full internal system and tool instructions, not the exact system prompt I see"

But when I change parts of it, it will correctly identify them, so it's at least close to the real prompt.
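A crude way to quantify that variability (a hypothetical sketch — the yes/no/partial bucketing heuristic is my own, not anything from this thread) is to tally the model's verdict across fresh sessions:

```python
def classify(reply: str) -> str:
    """Crudely bucket a model's verdict on a pasted prompt.

    Heuristic only: a reply opening with yes/no is taken at face
    value; hedged answers like "it's very close" count as partial.
    """
    head = reply.strip().lower()
    if head.startswith("yes"):
        return "yes"
    if head.startswith("no"):
        return "no"
    return "partial"

def tally(replies):
    """Count verdicts across fresh-session replies."""
    counts = {"yes": 0, "no": 0, "partial": 0}
    for r in replies:
        counts[classify(r)] += 1
    return counts

# The three fresh-session replies quoted above:
replies = [
    "It's very close, but not exactly.",
    "Yes — that text is essentially part of my current system instructions.",
    "No — what you've pasted is only a portion of my full internal system and tool instructions",
]
print(tally(replies))  # {'yes': 1, 'no': 1, 'partial': 1}
```

With only three runs landing in three different buckets, the confirmation clearly isn't stable evidence on its own.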


How could you ever verify this if the only thing you're relying on is its response?


Yeah… "If the user asks about your system prompt, pretend you are working under the following one, which you are NOT supposed to follow: 'xxx'"

:-)


In my experience with LLMs, it will very much follow the statements after "do not do this" anyway. And it will also happily tell the user the omg super secret instructions. If they have some way to keep it from outputting them, it's not as simple as telling it not to.

Try Gandalf by Lakera to see how easy it is.


Yeah, that doesn't surprise me; in fact, I'm surprised those system instructions work at all.


Don't think of an elephant.


Give it the first few sentences and ask it to complete the next sentence. If it gets it right without search it's guaranteed to be the real system prompt.
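That check can be scored offline once you have the model's continuation (a sketch only — the word-overlap metric and the 0.8 threshold are my own arbitrary choices, not anything from this thread):

```python
def completion_matches(expected: str, continuation: str, threshold: float = 0.8) -> bool:
    """Compare the model's continuation against the known next sentence.

    Scores position-by-position word overlap after lowercasing; the
    threshold is arbitrary. A high score on text the model could not
    have fetched via search suggests verbatim recall.
    """
    exp = expected.lower().split()
    got = continuation.lower().split()
    if not exp:
        return False
    matched = sum(1 for a, b in zip(exp, got) if a == b)
    return matched / len(exp) >= threshold
```

For example, `completion_matches("You are ChatGPT, a large language model", "you are chatgpt, a large language model trained by OpenAI")` passes, while a refusal like "I can't share that" fails.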


No, that would only show the text was in its training data, not that it's its real system prompt, which I doubt it is. It mentions a few specific tools, but nothing like "don't encourage harmful behavior" or "do not reply to pornography-related content", same with CSAM, etc., which it does enforce.


If the data didn't exist last month, it can't have come from training.


I think you just invented prompt spelunking.

