ChatGPT has more "memory" to it: if you tell it something, it "remembers" it across the chat session.
GPT-3 is stateless. You give it a prompt and it responds. Context doesn't carry between requests unless you (the user) pass it along yourself. That can make "chat" sessions with GPT-3 (as opposed to ChatGPT) expensive, because re-sending the accumulated context of the session consumes more and more tokens as the conversation gets longer.
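To make that concrete, here's a minimal sketch using the legacy (pre-1.0) `openai` Python client. The prompt text and token handling are just illustrative; the point is that request two only "remembers" request one because the transcript is pasted back into the prompt:

```python
import openai  # legacy (<1.0) client; reads OPENAI_API_KEY from the environment

def complete(prompt: str) -> str:
    # Each call is independent -- the API keeps no state between requests.
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=150,
        stop=["Human:"],
    )
    return resp.choices[0].text.strip()

# Request 1: the model only knows what is in this string.
transcript = "Human: My name is Andrew.\nAI:"
reply = complete(transcript)

# Request 2: to "remember" the name, the whole transcript is re-sent.
transcript += f" {reply}\nHuman: What is my name?\nAI:"
print(complete(transcript))
```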
It has a context window of 8,192 tokens (2x as long as GPT-3's).
It is possible they are using GPT itself to summarize the past transcript and pass that in as context, so your "remember X" would probably get included in that summary.
That said, I have not tried telling it to remember X and then exceeding the 8,192-token context window.
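If they are doing that, one plausible shape for it is below. This is purely hypothetical, not anything OpenAI has confirmed; the token heuristic, budget, and summarization prompt are all my own guesses:

```python
import openai  # legacy (<1.0) client

TOKEN_BUDGET = 8192  # the context window discussed above

def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def summarize(transcript: str) -> str:
    # Ask the model itself to compress the old turns. A "remember X"
    # survives only if the summary happens to keep it.
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="Summarize this conversation, keeping any facts the user "
               "asked the assistant to remember:\n\n" + transcript,
        max_tokens=256,
    )
    return resp.choices[0].text.strip()

def build_prompt(transcript: str, new_message: str) -> str:
    # Once the transcript gets large, swap the old turns for a summary.
    if rough_tokens(transcript) > TOKEN_BUDGET // 2:
        transcript = "Summary of earlier conversation: " + summarize(transcript)
    return f"{transcript}\nHuman: {new_message}\nAI:"
```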
While this is true, I feel like it's worth pointing out that ChatGPT "just" feeds your previous conversation back to GPT-3 with every message. I don't really think there's any difference between the two under the hood.
Makes sense. Do you know if there exists a re-implementation of a ChatGPT-like stateful conversational interface on top of the GPT-3 API? A cursory search doesn't turn up anything.
The issue is that to maintain the state, you need to maintain the history of the conversation.
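I'm not aware of an off-the-shelf one either, but rolling your own is mostly bookkeeping. A sketch against the legacy `openai` client (`ChatSession` is my own name, not a real library):

```python
import openai  # legacy (<1.0) client

class ChatSession:
    """Fakes statefulness by replaying the whole history every request."""

    def __init__(self, system: str = "The following is a friendly chat.\n"):
        self.history = system

    def say(self, message: str) -> str:
        # Append the user's turn and send the *entire* transcript...
        self.history += f"Human: {message}\nAI:"
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=self.history,
            max_tokens=150,
            stop=["Human:"],
        )
        reply = resp.choices[0].text.strip()
        # ...then keep the reply so the next request carries it too.
        self.history += f" {reply}\n"
        return reply

session = ChatSession()
session.say("Remember that my favorite color is green.")
print(session.say("What's my favorite color?"))
```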
For example, the "friend chat" preset starts out at 28 request tokens. I submit it, get a response back, and it's at 45 tokens. I add a bit of my own commentary... and it's 59 tokens. I hit submit and it's 80 tokens... and so on.
That session wasn't particularly insightful, but you get the idea.
The issue is that I made three requests at 28 + 59 + 88 tokens, and as I keep going, this adds up. To maintain the context, the previous history (and each chat response) stays in the prompt for the next message, so each prompt grows arithmetically and the cumulative tokens billed over the session grow quadratically.
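To put rough numbers on that growth (a toy calculation, assuming each exchange adds a flat ~30 tokens to the transcript, which real turns won't):

```python
start, per_turn = 28, 30           # rough figures from the session above
prompt_tokens, total_billed = start, 0
for turn in range(1, 21):
    total_billed += prompt_tokens  # every request re-sends the history...
    prompt_tokens += per_turn      # ...which grows by one more exchange
    if turn % 5 == 0:
        print(f"turn {turn:2}: prompt={prompt_tokens}, cumulative={total_billed}")
# Each prompt grows linearly, but the cumulative tokens you pay for
# grow quadratically with the number of turns.
```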
---
(edit)
Working from the "chat" preset instead of the "friend chat" preset... a bit more interesting conversation.
Note, however, that at this point the next request will be at least 267 tokens. We're still talking fractions of a penny (Davinci is $0.02 / 1k tokens), but it's money I can actually see rather than even smaller fractions. And as noted, it grows each time you continue the conversation.
Likely yes. You could even switch to a less expensive model for the summarization (e.g. Curie). Curie is 1/10th the cost of Davinci: running 8k tokens through Davinci is $0.16, while Curie would only be $0.016. That same cost likely shows up in ChatGPT's back-end compute, and it's worth considering if someone were building their own chat bot on top of GPT-3.
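For reference, the arithmetic, using the historical published per-1k-token prices for those two models:

```python
PRICE_PER_1K = {"davinci": 0.020, "curie": 0.002}  # USD per 1k tokens
tokens = 8_000
for model, price in PRICE_PER_1K.items():
    print(f"{model}: ${tokens / 1000 * price:.3f}")
# davinci: $0.160
# curie:   $0.016
```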