I'm gonna put forward the very view that gwern repeatedly argues against: "but... it's not understanding."
So far I see no evidence that this thing or anything else like it has any actual understanding, any model of the world. Indeed it can't as it possesses no sensory apparatus. It's not embodied. It doesn't experience anything.
I'm not sure the OpenAI folks would argue with me, but it seems Gwern asserts that this sort of thing indicates that general AI or even sentient AI is on the doorstep. I don't think it does, and I still maintain as I always have that CS people systematically underestimate and trivialize biology.
Well, there is no formal definition of "understanding" in the context of CS, AI, or machine learning, so anyone can claim anything they like with respect to the term.
For example, I have a thermos that keeps my coffee cold in the summer and hot in the winter. It u n d e r s t a n d s.
There are a number of NLP tasks that aim to quantify understanding, e.g. textual entailment. No currently published model is even remotely close to human-level performance on all of these tasks.
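For anyone who hasn't seen the task: a rough sketch of what textual entailment (NLI) looks like in practice, assuming the HuggingFace transformers library and the publicly available roberta-large-mnli checkpoint (both are just illustrative choices; any NLI model would do):

    # Toy illustration of the textual-entailment (NLI) task.
    # Assumes: `pip install torch transformers` and the public roberta-large-mnli model.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
    model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

    premise = "A man is playing a guitar on stage."
    hypothesis = "Someone is making music."

    # The model scores the (premise, hypothesis) pair as contradiction / neutral / entailment.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1).squeeze()

    # Label names come from the model's own config, so nothing is hard-coded here.
    for i, p in enumerate(probs):
        print(f"{model.config.id2label[i]}: {p.item():.3f}")

Whether a high entailment score reflects "understanding" is exactly the question under dispute, of course; the benchmark only measures agreement with human judgements.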
As long as there are no ways to properly query models, it's hard to assess their level of understanding. It would help immensely if we could ask models for rules, as in "why was the object labelled 'a car'?" (in the case of image recognition), or directly query any grammatical rules discovered during the processing of language.
Especially in classification tasks, knowledge extraction (e.g. by outputting rules) would be so much more helpful than simply having an AI look at a CT image and spit out "yep - that's a tumour, alright", while radiologists scratch their heads as to why...
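One common partial workaround, not a real solution, is to fit an interpretable surrogate (e.g. a shallow decision tree) to the black box's predictions and read rules off it. A toy sketch with scikit-learn on synthetic data (nothing here is from any real radiology system; it's purely illustrative):

    # Surrogate-model sketch: approximate a black-box classifier with a shallow
    # decision tree and print the tree's if/else rules. The "black box" here is
    # just a random forest trained on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
    feature_names = [f"feature_{i}" for i in range(X.shape[1])]

    black_box = RandomForestClassifier(random_state=0).fit(X, y)

    # Train the surrogate on the black box's *predictions*, not the true labels,
    # so the extracted rules describe what the black box does.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))

    # Human-readable rules approximating the black box's behaviour.
    print(export_text(surrogate, feature_names=feature_names))

The rules only approximate the black box, of course, which is part of the problem: you get an explanation of a simplified stand-in, not of the model itself.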
I had to look up textual entailment (on wikipedia) because I wasn't sure of its formal definition. It turns out, it doesn't have one:
>> "t entails h" (t ⇒ h) if, typically, a human reading t would infer that h is most likely true"
So in other words it's down to good old eyeballing. I'm not impressed, but not surprised either; it's just one of the many poorly defined tasks in machine learning, particularly in NLP, which has turned into a quagmire of shoddy work ever since people started firing linguists to improve their systems' performance.
Anyway, since logical entailment is central to my field of study, I can tell you that if textual entailment is less strictly defined than logical entailment (as per the wikipedia article), then it doesn't require anything that we could recognise as "understanding". Because logical entailment certainly doesn't require understanding, and its definition is as strict as a very strict thing [1]. I mean, I can see how loosening the requirement for precision in any justification of a decision that "A means B" can improve performance, but I can't see how it can improve understanding.
Edit: I'm not sure we disagree, btw, sorry for the grumpy tone. I fully agree with your gist about explainability etc.
______________
[1] Roughly, "A |= B iff for each model M, of A, M is a model of B", where A and B are sets of first order logic formulae and a "model" in this context is a logical interpretation under which a set of formulae is true. A "logical interpretation" is a partition of a predicate's atoms to true and false.
Both papers provide promising first steps in the right direction but are by no means solutions to the problem at hand. I mean, the second paper is even based on the premise that classification has already been done by human experts as a preparation step...
When I say "I have a laptop in front of me," I am describing an understanding of something that is being experienced (sensed). If a Markov text generator outputs this text, it's just rearranging bits. I don't see any evidence that GPT-3 is doing anything more than rearranging bits in a much more elaborate way than a Markov text generator. The results kind of dazzle us, but being dazzled doesn't indicate anything in particular. I see something akin to a textual kaleidoscope toy, a generator of novel text that is syntactically valid and that produces odd cognitive sensations when read.
I maybe should have said sensed, not experienced, since experience also leads into much deeper philosophical discussions around the nature of mind and consciousness. I wasn't really going there, since I don't see anything in GPT-3 or any similar system that merits going there.
I also don't see any evidence that it is drawing any new conclusions or constructing any novel thoughts about anything. It's regurgitating results similar to pre-existing textual examples, re-arranging existing ideas in new ways. If you don't think actual new ideas exist then this may be compelling, but if that's the case I have to ask: where did all the existing ideas come from, then? Some creative mechanism must exist or nothing would exist, including this text.
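To be clear about the comparison: by "Markov text generator" I mean the textbook bigram kind, something like this toy sketch (the one-line corpus is made up):

    # Minimal bigram Markov text generator: it only ever re-emits word transitions
    # it has already seen in the corpus, which is the sense in which it "rearranges bits".
    import random
    from collections import defaultdict

    def train(text):
        chain = defaultdict(list)
        words = text.split()
        for w1, w2 in zip(words, words[1:]):
            chain[w1].append(w2)
        return chain

    def generate(chain, start, length=20):
        out = [start]
        for _ in range(length - 1):
            options = chain.get(out[-1])
            if not options:
                break
            out.append(random.choice(options))
        return " ".join(out)

    corpus = "I have a laptop in front of me and I have a coffee next to the laptop"
    print(generate(train(corpus), "I"))

GPT-3 is obviously doing something far more elaborate than this, but the question is whether the difference is one of kind or only of scale.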
The fact that the output often resembles pop Internet discourse says more about the mindlessness of "meme-think" than the GPT-3 model.
As for real world uses, social media spam and mass propaganda seems like the most obvious one. This thing seems like it would be a fantastic automated "meme warrior." Train it on a corpus of Qanon and set it to work "pilling" people.
> When I say "I have a laptop in front of me," I am describing an understanding of something that is being experienced (sensed).
I would ascribe that to two factors: a) you have a more immediate, interactive interface to the physical world than GPT does, which is limited to a textual proxy, and b) GPT is naturally not a human-level intelligence; it is still of very limited complexity, so its understanding is more akin to that of a parrot trying to understand its owner's speech patterns. It can infer a tiny bit of semantics and mimic the rest. The ratio is a continuum.
> As for real world uses, social media spam and mass propaganda seems like the most obvious one.
Take active learning versus ordinary passive learning. Often with active learning you can learn much faster. That's a kind of "experience." Out-of-distribution problems, where a model fails to generalize, could be dealt with much more efficiently when the model can ask "hey, what's f(x = something really weird and specific that would never come up in an entire internet's worth of training data)?" Experience isn't passive, and that makes a whole world of difference. And that's not even touching on the difficulty of "tell me all about elephants" versus "let me interact with an elephant and see it and touch it and physically study it."
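A toy sketch of what I mean by active learning, using scikit-learn with synthetic data and plain uncertainty sampling (all specifics here are illustrative assumptions, not any particular system): the model itself picks the next point to be labelled, instead of passively consuming a fixed training set.

    # Pool-based active learning via uncertainty sampling.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
    X_pool, y_pool, X_test, y_test = X[:2000], y[:2000], X[2000:], y[2000:]

    # Start with a small random labelled set.
    labeled = list(rng.choice(len(X_pool), size=20, replace=False))

    for round_ in range(10):
        clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
        probs = clf.predict_proba(X_pool)[:, 1]
        uncertainty = -np.abs(probs - 0.5)           # points nearest the decision boundary
        uncertainty[labeled] = -np.inf               # don't re-query what we already have
        labeled.append(int(np.argmax(uncertainty)))  # "hey, what's f(x) for this one?"
        print(f"round {round_}: test accuracy {clf.score(X_test, y_test):.3f}")

The querying strategy is the whole point: the learner spends its label budget on the cases it is most confused about, rather than on whatever happens to be in the pile.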
Yesterday I watched a YouTube video about GPT-3 (https://www.youtube.com/watch?v=_8yVOC4ciXc), and it showed two poems. One was human-made, the other was from an AI trained on that human's poems.
Both poems were pretty good. But one of them had a metaphor about the moon reflecting in ocean waves, being distorted and taking on monstrous forms.
I figured this had to be the human one: it was a novel description (because it's a metaphor) of a very real experience (how the moon appears in reflection on the ocean).