The fact that the bot will answer in a kink-friendly way and then censor itself ...

dado3212 · on Oct 6, 2023

Bing does the same thing, I think just optimizing for latency. Admittedly it probably shaves off 10-15s in the usual response, I’d probably make the same decision.

baobabKoodaa · on Oct 6, 2023

When Bing AI first launched there were some really shenanigans where the AI would threaten to blackmail or murder the user and half a second later delete the message and replace it with a censored one.

willcipriano · on Oct 6, 2023

I spent several nights laughing uncontrollably getting ChatGPT to generate things it doesn't want to and as the text would get spicy it would suddenly get cut off, and that would make it much funnier to me. I assumed it worked in the way you described.

O_nlogn · on Oct 6, 2023

chat GPT has the same behaviour, no? I've had it send most or all of a response before the censor system triggers it to be redacted.

dontupvoteme · on Oct 6, 2023

ChatGPT's web interface has two, one is triggered by a moderation endpoint API call which scolds you and another one is hardcoded as a regex type filter for copyright which forcibly closes the pipe from the LLM instantly and doesn't acknowledge that something happened. It's hardcoded because a translation to another language or a typo inserted into the output avoids it.

You can get this (or at least could) by asking for the opening of tale of two cities (a public domain work!)

The API (at least via playground) now also has scolding built in, which triggers sometimes when you're just playing around with settings like high temp, because the model can devolve into a mess of all sorts of nonsense text, as is teh nature of transformers, but it doesn't censor it.

itishappy · on Oct 6, 2023

Anyone know how the API deals with this?

Does it send a response, then a follow-up payload with an "ohshit plz delete that" message?

xg15 · on Oct 6, 2023

The funny thing is that the "plz delete" messages have to be executed by the browser javascript. So in theory, you should be able to capture the "deleted" messages by keeping the network tab open or recording the traffic, right?

Edit: Last time I checked, ChatGPTs web interface was using server-sent events to stream the response words. The events were clearly visible in the network tab if you opened it early enough. So if it sends "delete" messages, they should show up in there.

dontupvoteme · on Oct 6, 2023

This is seemingly not at all uncommon. At least in the past when I asked bing for code it would start writing it and then go back and delete what it had written and say that it couldn't help with that.

I guess they don't want to cannibalize Copilot