"But this news is exactly that! A computer translation of the same semantic concept from one syntax to another without ever having been taught the rules connecting them."
By that standard, statistical translation approaches were "understanding" a long time ago. The new thing here isn't that systems aren't being taught "the rules" (that wasn't happening in statistical MT either); the new thing is that there's a different kind of classifier in the "middle" now, one that represents a hidden state. This classifier is more flexible in a lot of ways, but also more of a black box, and it takes a lot more effort to train without overfitting. It's cool that you can translate between language pairs the system was never explicitly trained on, but let's not overstate the meaning of it.
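To be concrete about what that "classifier in the middle" amounts to, here's a toy numpy sketch of my own (random weights, a made-up two-word vocabulary, nothing like the real system's scale or training, but the same shape): an encoder folds a sentence into one fixed-size vector, and a decoder sees nothing but that vector.

```python
import numpy as np

# Toy sketch (not Google's system): the "middle" is just a fixed-size hidden
# vector. The encoder squashes a token sequence into it; the decoder reads
# only that vector to emit tokens in the target language.

rng = np.random.default_rng(0)
VOCAB_SRC = {"hello": 0, "world": 1}
VOCAB_TGT = ["hallo", "welt"]

HIDDEN = 8
W_embed = rng.normal(size=(len(VOCAB_SRC), HIDDEN))   # source embeddings
W_rec   = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1     # recurrent weights
W_out   = rng.normal(size=(HIDDEN, len(VOCAB_TGT)))   # decoder output weights

def encode(tokens):
    """Fold the sentence into a single hidden vector, one token at a time."""
    h = np.zeros(HIDDEN)
    for tok in tokens:
        h = np.tanh(W_rec @ h + W_embed[VOCAB_SRC[tok]])
    return h  # this vector is all the "interlingua" the decoder ever sees

def decode(h, steps=2):
    """Greedy decode: score target tokens from the hidden vector alone."""
    out = []
    for _ in range(steps):
        scores = h @ W_out
        out.append(VOCAB_TGT[int(np.argmax(scores))])
        h = np.tanh(W_rec @ h)  # advance the decoder state
    return out

print(decode(encode(["hello", "world"])))  # untrained weights, so output is arbitrary
```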
The blog post makes this rather breathless speculation:
"Within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network."
This is...a fun story, but not much else. First off, you can make dimensionality-reduction plots that "show" a lot of things. Even ignoring that issue, in translations of short sentences involving specific concepts (e.g. the example about the stratosphere), is it really surprising that you'd find clusters? The words in that sentence are probably unique enough that they'd form a distinct cluster in mappings from any translation system.
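Here's roughly what I mean, as a toy sketch (made-up token counts standing in for three languages, plain PCA via SVD, no neural net anywhere): sentences that share distinctive words will cluster in a 2-D projection regardless of what produced the vectors.

```python
import numpy as np

# Toy illustration of the "clusters in a 2-D projection" point: even crude
# bag-of-words count vectors -- no neural network anywhere -- cluster by
# sentence when the sentences share distinctive content words.

vocab = ["stratosphere", "reaches", "km", "the", "dog", "barked", "loudly", "a"]
sentences = {
    "stratosphere_en": ["the", "stratosphere", "reaches", "km"],
    "stratosphere_ja": ["stratosphere", "km", "reaches"],         # stand-in tokens
    "stratosphere_ko": ["stratosphere", "reaches", "km", "the"],  # stand-in tokens
    "dog_en":          ["the", "dog", "barked", "loudly"],
    "dog_ja":          ["dog", "loudly", "barked"],
}

def bow(tokens):
    v = np.zeros(len(vocab))
    for t in tokens:
        v[vocab.index(t)] += 1
    return v

X = np.stack([bow(toks) for toks in sentences.values()])
X = X - X.mean(axis=0)                       # center, then project to 2-D (i.e. PCA)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
coords = X @ Vt[:2].T

for name, (x, y) in zip(sentences, coords):
    print(f"{name:18s} ({x:+.2f}, {y:+.2f})")
# The "stratosphere" rows land near each other and far from the "dog" rows,
# purely because they share rare words -- no semantics required.
```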
Folks get caught up in the "neural" part of neural networks, and assume that some magical quasi-human thought is happening. If the tech were called "highly parameterized reconfigurable weighted networks of logistic classifiers", there'd be less loopy speculation.
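If it helps, here's that mouthful in a few lines of numpy (random weights, purely illustrative): each "layer" is just a bank of logistic classifiers reading the previous bank's outputs.

```python
import numpy as np

# What "neural" actually cashes out to: stacked logistic classifiers. Each
# layer is a weighted sum followed by a squashing function -- nothing more.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # first bank of logistic units
W2 = rng.normal(size=(3, 2))   # second bank, reading the first's outputs

def network(x):
    h = sigmoid(x @ W1)        # layer 1: 3 logistic classifiers over the input
    return sigmoid(h @ W2)     # layer 2: 2 logistic classifiers over layer 1

print(network(np.array([1.0, 0.0, -1.0, 0.5])))
```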
Don't worry, I'm not being bamboozled by the word "neural". My argument is that there is a definition of understanding, one you can derive from a well-known thought experiment, that looks like it is met by this implementation of "highly parameterized reconfigurable weighted networks of logistic classifiers".
I don't see any particular difference between training a classifier and teaching rules; the rules are just encoded in the parameters of the classifier. If it helps, you can just replace "taught" with "trained on" and "rules" with "data", but there's no version of the Chinese Room Argument where you're sitting in a room with boxes full of unsupervised learning datasets and a book of sigmoid functions.
Perhaps this system works similarly to previous ones, but not having been taught (trained on) any rules (data) about the specific language pairs in question seems to be a strong argument for some kind of semantic representation of language. You might have seen that before, but I haven't, and the article seems to imply that it's new. Again, I'm talking specifically about the similarity between this result and an example of something "machines can't do".
The point is that the non-magical argument goes both ways. If a brain is just a complicated and meaty computer, then we should expect sophisticated enough programs on powerful enough hardware to start displaying things we might recognise as intelligent. That's not going to look particularly impressive – our machine translator isn't going to develop a conscience or try to unionise – but it might do something that qualifies for some definition of understanding.
But you are getting into magical thinking, in that there is no reasonable definition of "understanding" that this system meets. It cannot reason or make deductions. It can't re-write sentences to use completely different words/structures but imply the same meaning. In fact, there is literally no "conceptual" representation here -- there is a vector of numbers that gets passed between encoder and decoder, but it is no more a form of intelligence than the "hidden" state that is maintained by an HMM.
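If that sounds dismissive, here's what I mean about the HMM, as a toy sketch with made-up parameters: the forward pass also carries a vector of numbers that summarizes everything seen so far, and nobody calls that a concept.

```python
import numpy as np

# Toy 2-state HMM with made-up parameters. Its forward pass maintains a
# "hidden" belief vector that encodes the whole observation history -- the
# same kind of object as the vector the encoder hands to the decoder.

T = np.array([[0.7, 0.3],      # state-transition probabilities
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],      # emission probabilities for 2 observable symbols
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward(observations):
    """Return the normalized hidden-state belief after each observation."""
    belief = pi * E[:, observations[0]]
    belief /= belief.sum()
    history = [belief]
    for obs in observations[1:]:
        belief = (belief @ T) * E[:, obs]
        belief /= belief.sum()
        history.append(belief)
    return history

for step, b in enumerate(forward([0, 1, 1, 0])):
    print(f"after obs {step}: hidden state = {b}")
```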
"not having been taught (trained on) any rules (data) about the specific language pairs in question seems to be a strong argument for some kind of semantic representation of language."
Well, yeah, there's a representation of language. But it isn't "semantic" -- it's a vector of language-independent parameters for a decoder, which can then output symbols in a second language. Could you theoretically imagine some huge magical network of logistic classifiers that uses this as the first stage of a (far larger) processing machine that enables something like human intelligence? Maybe. But this is not it. This is a bigger, far more complicated/flexible version of a machine that is purpose-built to map between sequences of text.
(That said, I really don't want to go down the rabbit hole of "what is AGI, anyway?", which is about as productive/interesting as hitting the bong and wondering if maybe we all live in a computer simulation after all. I'm merely observing that this is not an intelligent machine.)
> It cannot reason or make deductions. It can't re-write sentences to use completely different words/structures but imply the same meaning.
I agree that it doesn't meet these definitions of understanding. I'm arguing that it meets the definition of "semantics from syntax".
> Well, yeah, there's a representation of language. But it isn't "semantic" -- it's a vector of parameters for a decoder, which can then output symbols in a second language.
What is it that makes a vector of parameters not semantic? Would it be semantic if they were stored in a different format? If I tell you I have a system in which the concept of being hungry is stored as the number 5, would you say "that's not a concept, that's a number"? If that vector of parameters represents being hungry in any language, what is it if not a semantic representation of hunger?
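To make the question concrete, here's a toy sketch with hypothetical vectors (not from any real model): if the encodings of "I'm hungry" in three languages sit near each other and an unrelated sentence sits far away, then the "concept" is just a region of vector space -- and I'm asking what about that format disqualifies it from being semantic.

```python
import numpy as np

# Hypothetical sentence vectors, purely for illustration: suppose the encoder
# maps "I'm hungry" in three languages to nearby points, and an unrelated
# sentence to a distant one. The "concept" is then a region of vector space.

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

hungry_en  = np.array([0.90, 0.10, 0.20])
hungry_es  = np.array([0.80, 0.20, 0.10])   # "tengo hambre"
hungry_de  = np.array([0.85, 0.15, 0.25])   # "ich habe Hunger"
raining_en = np.array([0.10, 0.90, 0.70])

print(cosine(hungry_en, hungry_es))   # high: same "concept", different syntax
print(cosine(hungry_en, hungry_de))   # high
print(cosine(hungry_en, raining_en))  # low: different concept
```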
There's no need to imagine a huge magical network that implements a grandiose vision of intelligence. We're talking about a small, non-magical network that implements a very modest vision of intelligence. Bacteria are still alive even though they're a lot less complex than we are. What would you expect the single-celled equivalent of intelligence to look like? Something with a very minor capacity for inference? Something with rudimentary abstraction across different representations of the same underlying idea?