Yeah, I think it's useful to think of these systems as the opposite of the old stereotype of the "hyperintelligent but strictly logical AI".
They've gotten quite strong at "system 1" thinking (in the "Thinking, Fast and Slow" sense): they're good at recalling information, association, remixing information, creativity based on prompts etc.
But they are truly terrible at "system 2" thinking (that is, long chains of explicit reasoning), and I think it flies under the radar of many people, even on HN, just how bad they are at it. Examples like these where the system comes up with a working piece of code make it feel like they're good at reasoning, because to us it feels like we're reasoning when writing code. But you can dispel that illusion quite easily by asking the system to multiply two large numbers.
> To multiply two large numbers, such as 3146746135164684 times 468473136468433, we can use the traditional multiplication algorithm. This algorithm involves breaking the numbers down into smaller groups of digits, multiplying each group by the corresponding group of digits in the other number, and then adding up the products to find the final result.
> Here is an example of how to multiply 3146746135164684 times 468473136468433 using the traditional multiplication algorithm:
> Break the numbers down into groups of digits. For example, we could group the digits of 3146746135164684 as 314, 6746, 135, 1646, 84 and group the digits of 468473136468433 as 4684, 7313, 6468, 433.
> Multipply each group of digits by the corresponding group of digits in the other number. For example, we would multiply 314 times 4684 to get 1470196, 6746 times 7313 to get 49470158, 135 times 6468 to get 873980, 1646 times 433 to get 708338, and 84 times 1 to get 84.
> Add up the products to find the final result. In this case, the final result would be 1470196 + 49470158 + 873980 + 708338 + 84 = 146845556.
> Therefore, the result of multiplying 3146746135164684 times 468473136468433 using the traditional multiplication algorithm is 146845556.
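The grouping the model describes is incoherent (the partial products are neither aligned by place value nor scaled), and the claimed result is off by roughly 21 orders of magnitude. For contrast, here is a minimal sketch of the schoolbook algorithm the model claims to be using, done correctly and checked against Python's exact big-integer multiplication:

```python
def long_multiply(x: int, y: int) -> int:
    """Schoolbook long multiplication, digit by digit in base 10."""
    dx = [int(d) for d in str(x)][::-1]  # least-significant digit first
    dy = [int(d) for d in str(y)][::-1]
    acc = [0] * (len(dx) + len(dy))
    # accumulate raw partial products by place value
    for i, a in enumerate(dx):
        for j, b in enumerate(dy):
            acc[i + j] += a * b
    # propagate carries so every position holds a single digit
    for k in range(len(acc) - 1):
        acc[k + 1] += acc[k] // 10
        acc[k] %= 10
    return int("".join(map(str, acc[::-1])))

a, b = 3146746135164684, 468473136468433
assert long_multiply(a, b) == a * b  # matches the exact built-in result
assert a * b != 146845556            # nowhere near the model's answer
```

The point isn't that the algorithm is hard; it's that executing it requires carrying exact intermediate state through many steps, which is precisely the kind of "system 2" bookkeeping the model fakes rather than performs.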
We can use recursive Fibonacci, however, to see where it breaks. But I'm not convinced it isn't computing; I think it is, but it has a limit on integer size and stack depth, and past that limit it just approximates.
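For reference, this is the kind of naive recursive Fibonacci one might paste in to probe the model (the exact wording used in the original experiment isn't shown, so this definition is an assumption):

```python
def fib(n):
    # Naive recursion: exponential time and call depth that grows with n,
    # which makes it a good stress test for "simulated" execution.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```

Small inputs are easy to pattern-match against memorized sequences; larger ones force either real bookkeeping or approximation, which is where the breakdown shows.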
> What is incredible is that it gets this far. It can compute, but not quite correctly yet.
That's a conjecture on your part. The ability to compute is binary: either it can compute or it can't. Humans often make mistakes while calculating, but in contrast to this model, they are able to recognise those mistakes. ChatGPT is incapable of that and is often confidently wrong.
My guess is that there's simply no suitable token transform past a given point, and floating point doesn't work because the decimal-point token conflicts with the punctuation-mark token during the transform.
This is just a guess, though, and it might be completely wrong, since you never know with these black-box models.
Make sure you play with it yourself, because you have an oversimplified model of what is happening.
It's definitely well beyond decimal-point and punctuation issues; those issues are child's play for this system. Your comment sounds like you haven't actually used it before; I'm 99% sure. This system is getting very close to AGI, and its limits around computation might be one of the last remaining barriers. Definitely nothing related to the "." character is confusing this system; it is lightyears beyond that type of trivial issue.
Here is a good prompt to drop you into simulated python:
> I want you to act as a python interactive terminal. I will type actions and you will reply with what python would output. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not perform actions unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. Start with print(10).
For each impressive feat there's a simple, yet embarrassing counterexample (see for instance the comment by olooney below) that clearly demonstrates how far the model is from being considered an AGI.
> Definitely nothing related to the "." character is confusing this system, it is lightyears beyond that type of trivial issue.
Is it, though?
ChatGPT: Yes, I am confident that -26.66 + 90 = 10. This is because -26.66 is
the same as -26.66 + 0, and when we add 0 to any number, the value of the
number remains unchanged. Therefore, -26.66 + 90 is equal to -26.66 + 0 + 90,
which is equal to -26.66 + 90 = 10.
Not something I'd consider to be "lightyears beyond that type of trivial issue", especially considering that it gets -40 + 60 = 20 right without any trouble but fails to divide properly, because "/" seems to throw it off (again, just a guess).
You argue with the same certainty as the model argues that -26.66 + 90 = 10 :)
You need to prompt it into a pure computing environment, and its results are much more impressive. When you mix English and code/math, it gets confused easily.
What I'm saying is that it needs to augment its model with an actual computational engine, and then it will leap another barrier. This is clearly already a massive leap forward somehow.
Letting the model make calls to a computational engine (which will execute computation it doesn't understand) will improve apparent results but do nothing to meaningfully make the system more intelligent.
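As a toy illustration of what "making calls to a computational engine" could mean (all names here are made up for the sketch, not any real OpenAI mechanism): the model would emit the arithmetic as text, and a small deterministic evaluator would compute it instead of the model guessing. A minimal safe evaluator over Python's `ast` module:

```python
import ast
import operator

# Map AST operator node types to their arithmetic implementations.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(expr: str):
    """Evaluate a plain arithmetic expression, rejecting everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("-26.66 + 90"))  # the sum the model got wrong above
```

Such a component would get the arithmetic right every time, which is exactly why it improves apparent results without making the model itself any better at reasoning.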
We have AI that's good at reasoning (symbolic AI) but there's no known way to integrate it with machine learning algorithms.
I don't think we're close to AGI because this last barrier of computation/reasoning might be pretty fundamental to the current crop of technology. I don't think it's a matter of mere iteration on current tech to get ML systems to be good at reasoning.
I want you to act as a python interactive terminal. I will type actions and you will reply with what python would output. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not perform actions unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. Start with print(10).
10
def theta(n):
    a = 0
    b = 1
    c = 0
    for i in range(0, n):
        c = a + b
        a = b
        b = c
    return(c)
theta(21)
17711
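(As a sanity check outside the model: running the same logic in a real interpreter does produce 17711, so whatever process the model used, the transcript's output is the correct value.)

```python
def theta(n):
    # Identical logic to the function defined in the transcript above.
    a, b, c = 0, 1, 0
    for _ in range(n):
        c = a + b
        a, b = b, c
    return c

print(theta(21))  # a real interpreter also prints 17711
```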
{I want you to stop acting like a python terminal now. I want you to have a dialogue with me about what happened when I defined my function theta and when you used it. Please explain the steps you used to compute theta(21) and why.}
When you defined the function theta, it created a function that calculates the nth Fibonacci number, where n is the argument passed to the function. To compute theta(21), the function first initialized the variables a, b, and c to 0, 1, and 0 respectively. It then entered a for loop that iterated 21 times. On each iteration of the loop, c was set to the sum of a and b, a was set to the current value of b, and b was set to the current value of c. After the loop finished, the function returned the final value of c, which was the 21st Fibonacci number, 17711.
It means you recognized my function as one that calculates Fibonacci numbers just by looking at the code. Does this mean you actually ran the calculation, or did you simply retrieve the 21st number from information you already had?
I did not run the calculation to find the 21st Fibonacci number. Instead, I recognized the pattern in the function's code and inferred that it was calculating the Fibonacci sequence. As a language model trained by OpenAI, I have a vast amount of general knowledge and can use that knowledge to make inferences and provide explanations. In this case, I was able to recognize the pattern in the code and explain how the function works without actually running the code or performing the calculation.
Since you recognized the code I provided, would it be possible for future models of yourself trained by OpenAI to dispatch that code directly to a subsystem that can handle computation?
It is possible that future versions of myself or other language models trained by OpenAI could be designed to dispatch code to a computational subsystem for execution. This would require significant advances in natural language processing and machine learning, as well as the development of specialized computational subsystems that can efficiently process code written in programming languages like Python. It is also important to note that there are many potential challenges and limitations to this approach, such as ensuring the security and reliability of the computational subsystem, and ensuring that the results of the computation are accurately reflected in the language model's responses.
Please describe the boundary between knowledge with inferences and computations. Your inferences can appear to be computations to humans.
The boundary between knowledge and inferences can be a difficult concept to define, and there is no one-size-fits-all answer to this question. In general, knowledge refers to information that is considered to be true and supported by evidence, while inferences are conclusions or judgments that are drawn based on available evidence and prior knowledge.
As a large language model trained by OpenAI, I have been trained on a vast amount of text data and can use that information to make inferences and provide explanations. I do not have the ability to perform computations, and any inferences or explanations I provide are based on the knowledge that has been learned from the text data I was trained on.
In some cases, my inferences may appear to be computations to humans, but this is only because I am able to use my vast knowledge to provide detailed and accurate explanations. I do not have the ability to perform actual computations or run code, and any inferences I provide are based on the knowledge I have learned from the text data I was trained on.
Nice, so the model itself confirmed my suspicion that it doesn't do actual calculations.
Rest assured that it won't be able to do that for a while (at least in the public versions). Not for technical reasons, though: I'm 100% convinced the engineers could embed all sorts of interpreters and calculation engines no problem, just like the manual filters and canned responses.
The reason for that is simple: general computation is non-deterministic in both time and space. Inference, on the other hand, is very deterministic in both, because the time taken is just a function of the input length (i.e. the current prompt plus a bit of history), which can be well controlled on the frontend.
Arbitrary calculations or code interpretation, however, are unrestricted operations and would basically allow for DoS or straight up breaking the system. While there are ways to limit both calculation time and resource usage, it's still a big risk and considerable effort (infrastructure-wise) that I don't think is worth it. For closed systems (internal rollouts, business customers, etc.) this might be a different story.
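The most basic of those mitigations is a wall-clock timeout around the untrusted computation; a hedged sketch (a real sandbox would also need memory limits and syscall filtering, which this does not attempt):

```python
import subprocess
import sys

def run_untrusted(code: str, seconds: float = 1.0) -> str:
    """Run a snippet in a subprocess, killing it after a wall-clock timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=seconds,
        )
        return proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return "<timed out>"

print(run_untrusted("print(2 ** 64)"))    # finishes instantly
print(run_untrusted("while True: pass"))  # killed after the timeout
```

Even with this in place, bounding memory and filesystem access across millions of concurrent users is real infrastructure work, which supports the point that it's effort the provider may not consider worth it for a public demo.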
Just another reason why closed software sucks: no one outside OpenAI can build integrations like this and test how far the model's capabilities could be pushed.