
> GPT-3 seems to have issues with large numbers. Moyix’s gist covers this in detail. GPT-3 tends to guesstimate an algebraic function instead of evaluating the numbers, so the answer is only correct to a certain approximation.

There are two issues here. One is the lack of working memory, which means there is very little scratch space for calculations with any meaningful sequential depth. GPT-3 is very unlike a traditional evaluator in this regard: it is easier for it to interpret the meaning of a program you give it and intuit the result from context than it is to mechanically execute the program's steps.

The other issue is the text encoding, which makes it much harder for GPT-3 to do digit-by-digit operations. Many multi-digit numbers are a single token of their own. A fixed-length number looks to us like a fixed number of characters, but to GPT-3 it can be an almost arbitrary number of tokens divided into almost arbitrary chunks. Using thousands separators is very helpful for it.
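
You can see the encoding issue directly with something like the snippet below. This is just an illustration, assuming the tiktoken package and its "r50k_base" encoding (the BPE GPT-3 uses); the exact splits depend on the learned merge table, the point is only that the same digits come out as different, irregular chunks with and without separators.

    import tiktoken

    # "r50k_base" is the byte-pair encoding used by GPT-3.
    enc = tiktoken.get_encoding("r50k_base")

    for s in ["9999", "9,999", "1234567", "1,234,567"]:
        # Decode each token id on its own to see where the chunk boundaries fall.
        pieces = [enc.decode([t]) for t in enc.encode(s)]
        print(f"{s!r:>13} -> {pieces}")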

If you account for these and design a prompt that mitigates them, you can get much stronger results. Here is an example: https://news.ycombinator.com/item?id=30299360#30309302. I managed 42% accuracy on 3-digit by 3-digit multiplication.
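
Roughly, the prompt shape I mean is the following (a simplified sketch, not the exact prompt from the link: thousands separators on every operand, plus an explicit scratch-work section with partial products so the model has somewhere to put intermediate results):

    def format_example(a: int, b: int, show_answer: bool = True) -> str:
        """Few-shot example: comma-separated operands plus written-out partial products."""
        lines = [f"Q: What is {a:,} * {b:,}?"]
        if show_answer:
            partials = [
                f"{a:,} * {int(d) * 10 ** i:,} = {a * int(d) * 10 ** i:,}"
                for i, d in enumerate(reversed(str(b)))
            ]
            lines.append("Work: " + "; ".join(partials))
            lines.append(f"A: {a * b:,}")
        else:
            lines.append("Work:")
        return "\n".join(lines)

    # Two solved examples, then the actual query left open for the model to complete.
    prompt = "\n\n".join([
        format_example(317, 428),
        format_example(904, 256),
        format_example(731, 612, show_answer=False),
    ])
    print(prompt)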



>> There are two issues here. One is the lack of working memory, which means that there is very little scratch space for calculating things with a meaningful sequential depth.

It's a language model. It can generate text, not "calculate things".

If you give it the right prompt, it will generate the right text, but if there's any computation going on, that's you computing the right prompt.

See Clever Hans:

https://en.wikipedia.org/wiki/Clever_Hans


If this were true, then engineered prompts would fail for held-out problem instances. But they don’t.


I don't understand what you mean by that. Which held-out problem instances?


Suppose you engineer a prompt to make GPT-3 do arithmetic. You design the prompt to work for a particular set of training examples like 1+1 and 2+3. If all the computation is in the prompt engineering, and GPT-3 is just Clever Hans, then this engineered prompt should do no better than chance when you hand it new instances like 4+5 with the same prompt.
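
Concretely, the test looks something like this sketch (complete() is only a placeholder for whatever model call you use; the point is just that the held-out pairs are sampled after the prompt has been frozen):

    import random

    def complete(prompt: str) -> str:
        # Placeholder: substitute an actual call to the language model here.
        raise NotImplementedError

    # The prompt is engineered against a small, fixed set of training examples...
    TRAIN = [(1, 1), (2, 3), (7, 8)]
    prompt_prefix = "\n".join(f"{a} + {b} = {a + b}" for a, b in TRAIN)

    def heldout_accuracy(n_trials: int = 100) -> float:
        # ...and scored on fresh instances the prompt was never tuned on.
        correct = 0
        for _ in range(n_trials):
            a, b = random.randint(10, 99), random.randint(10, 99)
            answer = complete(f"{prompt_prefix}\n{a} + {b} =").strip().split()[0]
            correct += answer == str(a + b)
        return correct / n_trials

    # If the model were a Clever Hans reading the answer out of the prompt,
    # this should be no better than chance.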


>> Suppose you engineer a prompt to make GPT-3 do arithmetic

Oh, I think I see what you mean. Thank you for clarifying. So, no, I didn't mean that the prompt is engineered to make it look like the model is performing a calculation. I meant that GPT-3 has memorised instances of arithmetic operations and in order to retrieve them from its memory the human user must figure out the right prompt. I wrote "that's you computing the prompt", not "that's you computing the result".

The prompt is like a SQL query, right? If you don't enter the right query, you don't get the right results. That's the point of all those people on the internets fiddling with their prompts: it's like they're trying to query a database, but they don't know the right syntax for their query, so they tweak it until it returns the results they want.

For example, the OP mentioned thousands separators being very helpful to the model. That's because it's memorised more arithmetic results with thousands separators than without. So you're more likely to get the right results out of it if you use thousands separators.

Also because, as the OP says, GPT-3 has one representation for a digit, a separate one for a string of digits, and a separate one again for a string of digits mixed with other symbols. "9999" is, in its model, a different thing than "9,999".

Which, btw, is why it can't calculate. To calculate, a system must have a representation of the concept of a number. Otherwise, calculate with what?



