
> GPT-3 seems to have issues with large numbers. Moyix’s gist covers this in detail. GPT-3 tends to guesstimate an algebraic function instead of evaluating the numbers, so the answer is only correct to a certain approximation.

There are two issues here. One is the lack of working memory, which means there is very little scratch space for calculations with any meaningful sequential depth. GPT-3 is very unlike a traditional evaluator in this regard: it is easier for it to interpret the meaning of a program you give it and intuit the result from context than it is to mechanically execute the program's steps.

The other issue is the text encoding, which makes it much harder for GPT-3 to do digit-by-digit operations. Many multi-digit numbers are a single token of their own. A fixed-length number looks to us like a fixed number of characters, but to GPT-3 it can be an almost arbitrary number of tokens divided into almost arbitrary chunks. Using thousands separators is very helpful for it.
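
You can see the encoding issue directly with something like the snippet below. This is just an illustration, assuming the tiktoken package and its "r50k_base" encoding (the BPE GPT-3 uses); the exact splits depend on the learned merge table, the point is only that the same digits come out as different, irregular chunks with and without separators.

    import tiktoken

    # "r50k_base" is the byte-pair encoding used by GPT-3.
    enc = tiktoken.get_encoding("r50k_base")

    for s in ["9999", "9,999", "1234567", "1,234,567"]:
        # Decode each token id on its own to see where the chunk boundaries fall.
        pieces = [enc.decode([t]) for t in enc.encode(s)]
        print(f"{s!r:>13} -> {pieces}")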

If you account for these and design a prompt that mitigates them, you can get much stronger results. Here is an example: https://news.ycombinator.com/item?id=30299360#30309302. I managed 42% accuracy on 3-digit by 3-digit multiplication.
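
Roughly, the prompt shape I mean is the following (a simplified sketch, not the exact prompt from the link: thousands separators on every operand, plus an explicit scratch-work section with partial products so the model has somewhere to put intermediate results):

    def format_example(a: int, b: int, show_answer: bool = True) -> str:
        """Few-shot example: comma-separated operands plus written-out partial products."""
        lines = [f"Q: What is {a:,} * {b:,}?"]
        if show_answer:
            partials = [
                f"{a:,} * {int(d) * 10 ** i:,} = {a * int(d) * 10 ** i:,}"
                for i, d in enumerate(reversed(str(b)))
            ]
            lines.append("Work: " + "; ".join(partials))
            lines.append(f"A: {a * b:,}")
        else:
            lines.append("Work:")
        return "\n".join(lines)

    # Two solved examples, then the actual query left open for the model to complete.
    prompt = "\n\n".join([
        format_example(317, 428),
        format_example(904, 256),
        format_example(731, 612, show_answer=False),
    ])
    print(prompt)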



>> There are two issues here. One is the lack of working memory, which means that there is very little scratch space for calculating things with a meaningful sequential depth.

It's a language model. It can generate text, not "calculate things".

If you give it the right prompt, it will generate the right text, but if there's any computation going on, that's you computing the right prompt.

See Clever Hans:

https://en.wikipedia.org/wiki/Clever_Hans


If this were true, then engineered prompts would fail for held-out problem instances. But they don’t.


I don't understand what you mean by that. Which held-out problem instances?


Suppose you engineer a prompt to make GPT-3 do arithmetic. You design the prompt to work for a particular set of training examples like 1+1 and 2+3. If all the computation is in the prompt engineering, and GPT-3 is just Clever Hans, then this engineered prompt should do no better than chance when you hand it new instances like 4+5 with the same prompt.
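
Concretely, the test looks something like this sketch (complete() is only a placeholder for whatever model call you use; the point is just that the held-out pairs are sampled after the prompt has been frozen):

    import random

    def complete(prompt: str) -> str:
        # Placeholder: substitute an actual call to the language model here.
        raise NotImplementedError

    # The prompt is engineered against a small, fixed set of training examples...
    TRAIN = [(1, 1), (2, 3), (7, 8)]
    prompt_prefix = "\n".join(f"{a} + {b} = {a + b}" for a, b in TRAIN)

    def heldout_accuracy(n_trials: int = 100) -> float:
        # ...and scored on fresh instances the prompt was never tuned on.
        correct = 0
        for _ in range(n_trials):
            a, b = random.randint(10, 99), random.randint(10, 99)
            answer = complete(f"{prompt_prefix}\n{a} + {b} =").strip().split()[0]
            correct += answer == str(a + b)
        return correct / n_trials

    # If the model were a Clever Hans reading the answer out of the prompt,
    # this should be no better than chance.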


>> Suppose you engineer a prompt to make GPT-3 do arithmetic

Oh, I think I see what you mean. Thank you for clarifying. So, no, I didn't mean that the prompt is engineered to make it look like the model is performing a calculation. I meant that GPT-3 has memorised instances of arithmetic operations and in order to retrieve them from its memory the human user must figure out the right prompt. I wrote "that's you computing the prompt", not "that's you computing the result".

The prompt is like a SQL query, right? If you don't enter the right query, you don't get the right results. That's the point of all those people on the internets fiddling with their prompts: it's like they're trying to query a database, but they don't know the right syntax for their query, so they tweak it until it returns the results they want.

For example, the OP mentioned thousands separators being very helpful to the model. That's because it's memorised more arithmetic results with thousands separators than without. So you're more likely to get the right results out of it if you use thousands separators.

Also because, as the OP says, GPT-3 has one representation for a digit, a separate one for a string of digits, and a separate one again for a string of digits mixed with other symbols. "9999" is, in its model, a different thing than "9,999".

Which, btw, is why it can't calculate. To calculate, a system must have a representation of the concept of a number. Otherwise, calculate with what?



