It’s clear they do seem to construct models from which to derive responses. The problem is once you stray away from purely textual content, those models often get completely batshit. For example, if you ask it what latitude and longitude are, and what makes one town further north than another, it will tell you. But if you ask it whether this town is further north than that other town, it will give you latitudes that are sometimes correct, sometimes made up, and will randomly get which one is further north wrong, even going by the latitudes it just gave.
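To be concrete, the step it fumbles is a single numeric comparison. Here's a throwaway Python sketch (with made-up latitudes for two hypothetical towns) of what "further north" actually reduces to:

    # "Further north" is just: larger signed latitude (north positive).
    # The latitude values here are made up purely for illustration.
    towns = {
        "Town A": 64.1,  # degrees north
        "Town B": 53.4,
    }

    further_north = max(towns, key=towns.get)
    print(f"{further_north} is further north")  # -> Town A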
That’s because it doesn’t have an actual understanding of the geography of the globe, because the training texts weren’t sufficient to give it that. It can explain latitude, but doesn’t actually know how to reason about it, even though it can explain how to reason about it. That’s because explaining something and doing it are completely different kinds of tasks.
If it does this with the globe and simple stuff like latitudes, what are the chances it will mess up basic relationships between organs, symptoms, treatments, etc. for the human body? I’m not going to trust medical advice from these things without an awful lot of very strong evidence.
You can probably fix this insufficiency by going for multimodal training. Just as it would take excessively long to teach a person the concept of a color they can't see, an AI would need an infeasible amount of text data to learn about, say, music. But give it direct training on music data and I think the model will quickly grasp the concept.
> It’s clear they do seem to construct models from which to derive responses. The problem is once you stray away from purely textual content, those models often get completely batshit
I think you mean that it can only intelligently converse in domains for which it's seen training data. Obviously the corpus of natural language it was trained on does not give it enough information to infer the spatial relationships of latitude and longitude.
I think this is important to clarify, because people might take your statement to mean that LLMs cannot process non-textual content, which is incorrect. In fact, adding multimodal training improves LLMs by orders of magnitude, because the richer structure enables them to infer better relationships even in textual data.
I don't think this is a particularly interesting criticism. The fact of the matter is that this is just solved by chain-of-thought reasoning. If you need the model to be "correct", you can make it get there by first writing out the two different latitudes, and then it will get the comparison right. This is basically the same way that people can/will guesstimate at something vs doing the actual math. For a medical AI, you'll definitely need it to chain-of-thought every inference and step/conclusion on the path, but...
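Roughly what I mean, as a sketch only (call_llm() here is a stand-in for whatever completion API you happen to use, not a real library call): make the model write the latitudes out as explicit steps before it does the comparison.

    # Chain-of-thought style prompt: have the model state each latitude
    # first, then compare, instead of answering in one hop.
    # call_llm is a placeholder for your actual completion function.
    def which_is_further_north(town_a: str, town_b: str, call_llm) -> str:
        prompt = (
            f"Step 1: State the latitude of {town_a}.\n"
            f"Step 2: State the latitude of {town_b}.\n"
            "Step 3: Compare the two numbers and say which town is further north.\n"
            "Show each step before giving the final answer."
        )
        return call_llm(prompt)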
>you can make it get there by first writing out the two different latitudes, and then it will get it right
As I said in my comment, even if the model 'knows' and tells you that town A is at 64° North latitude and town B is at 53°, it will sometimes tell you that town B is the further north of the two.
That's because its training set includes texts where people talk about one town being further north than the other, along with their latitudes, but the neural net wasn't able to infer the significance of the numbers in the latitude values. There wasn't enough correlation in the text for it to work that out, or to build a model for accurately doing calculations on them.
Meanwhile the training text must have contained many explanations of what latitude and longitude are and how to do calculations on them. As a result the model can splurge out texts explaining latitude and longitude. That only helps it splurge out that kind of text though. It doesn't do anything towards actually teaching it what these concepts are, how they relate to a spherical geographic model, or to actually do the calculations.
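For what it's worth, "actually do the calculations" means things like real spherical geometry, which those explanation texts describe step by step. A quick sketch of one such calculation (standard haversine great-circle distance on a spherical Earth, nothing exotic):

    import math

    # Great-circle distance between two (lat, lon) points given in degrees,
    # via the haversine formula on a spherical Earth (radius ~6371 km).
    def haversine_km(lat1, lon1, lat2, lon2):
        dlat = math.radians(lat2 - lat1)
        dlon = math.radians(lon2 - lon1)
        a = (math.sin(dlat / 2) ** 2
             + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
             * math.sin(dlon / 2) ** 2)
        return 6371.0 * 2 * math.asin(math.sqrt(a))

    # Brussels (50.85 N, 4.35 E) to New York (40.71 N, -74.01 W): roughly 5,900 km
    print(round(haversine_km(50.85, 4.35, 40.71, -74.01)))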
It's the same way GPT-3 could reliably generate texts explaining mathematics and how to do arithmetic in lots of very accurate detail, because it was trained on many texts that gave such explanations, but couldn't actually do maths.
It is possible to overcome these issues with a huge amount of domain-relevant training text that helps the LLM build a model of the specific problem domain. But the point stands that just because a model can explain in detail how to do something, that doesn't mean it can actually do it itself at all. They're completely different things that require radically different training approaches.
Can you give an example that ChatGPT 4 doesn't get right? ChatGPT 4 is much, much better at logic than 3.5, it's almost laughable. It's really, really impressive.
Here is ChatGPT 4's output, btw:
> What is the longitude and latitude of Brussels?
Brussels, the capital city of Belgium, is located at approximately 50.8503° North latitude and 4.3517° East longitude.
> What about New York?
New York City, located in the United States, has approximate coordinates of 40.7128° North latitude and 74.0060° West longitude.
> Which one is more north?
Brussels is more north than New York City. Brussels is located at approximately 50.8503° North latitude, while New York City is at approximately 40.7128° North latitude.