AI’s Language Problem (technologyreview.com)
309 points by punnerud on Aug 9, 2016 | 235 comments



No one would ever imagine that locking a baby in a featureless room with a giant stack of books would give them general intelligence. I don't understand why AI researchers think it will work for AIs. They need bodies that are biologically connected with the rest of the biosphere, with an intrinsic biological imperative, if they are ever to understand the world. I'm not saying they have to be exactly like us, but they will only be able to understand us to the extent that they have body parts and social experiences that are analogous to ours.

This isn't an engineering problem, it's a philosophical problem: We are blind to most things. We can only see what we can relate to personally. We can use language and other symbol systems to expand our understanding by permuting and recombining our personal experiences, but everything is grounded in our interactive developmental trajectory.

The kitten-in-a-cart experiment demonstrates this clearly: http://io9.gizmodo.com/the-seriously-creepy-two-kitten-exper... Interaction is crucial for perception. Sensation is not experience.

And here's the rub: Once you give an AI an animal-like personal developmental trajectory to use for grounding the semantics of their symbol systems, you end up with something which is not particularly different or better than a human cyborg.


I believe we can get AI from just text. Obviously that won't work for babies, because babies get bored quickly looking at text. AIs can be forced to read billions of words in mere hours!

Look at word2vec. By using simple dimensionality reduction on the frequencies with which words occur near each other in news articles, it can learn really interesting things about the meaning of words. The famous example is that the vector for "king", minus the vector for "man", plus the vector for "woman", roughly equals the vector for "queen". It's learning the meaning of words, and the relationships between them.
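
For the curious, that analogy is a one-liner in practice. A hedged sketch using gensim; the vectors file name is just a placeholder, any pretrained word2vec-format embeddings will do:

    # Sketch of the king - man + woman analogy with gensim; the vectors file
    # is a placeholder, not a requirement.
    from gensim.models import KeyedVectors

    wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                            binary=True)
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # -> [('queen', ...)] with typical news-trained embeddings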

Recurrent NNs, using similar techniques, but with much more complexity, can learn to predict the next word in a sentence very well. They can learn to write responses that are almost indistinguishable from humans. And it's incredible this works at all, given RNNs have only a few thousand neurons at most, and a few days of training, compared to humans' billions of neurons trained over a lifetime.
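
To make the prediction idea concrete, here's a toy next-word predictor; not a serious setup, just a minimal PyTorch sketch with a made-up corpus and arbitrary hyperparameters:

    # Toy LSTM language model: predict the next word from the previous ones.
    # Corpus, sizes and training length are made up for illustration only.
    import torch
    import torch.nn as nn

    corpus = "the cat sat on the mat the dog sat on the rug".split()
    vocab = sorted(set(corpus))
    stoi = {w: i for i, w in enumerate(vocab)}
    data = torch.tensor([stoi[w] for w in corpus])

    class NextWordLSTM(nn.Module):
        def __init__(self, vocab_size, dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.lstm = nn.LSTM(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab_size)

        def forward(self, x):
            h, _ = self.lstm(self.embed(x))
            return self.out(h)  # logits for the next word at every position

    model = NextWordLSTM(len(vocab))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    x, y = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)  # shifted by one word
    for step in range(200):
        logits = model(x)
        loss = loss_fn(logits.reshape(-1, len(vocab)), y.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()

    ctx = torch.tensor([[stoi["sat"], stoi["on"], stoi["the"]]])
    print(vocab[model(ctx)[0, -1].argmax().item()])  # greedy next-word guess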

All of the information of our world is contained in text. Humans have produced billions of books, papers, articles, and internet comments. Billions of times more information than any human could read in their entire lifetime. Any information you can imagine is contained in text somewhere. I don't think it's necessary for AIs to be able to see, or interact with the world in any way.

If you can predict the word a human would say next, with enough accuracy, then you could also produce answers indistinguishable from theirs. Meaning you could pass the Turing test, and perform any language task they could do just as well. So language prediction alone may be sufficient for AGI.

This is the theory behind the Hutter Prize, which proposes that predicting (compressing) Wikipedia's text is a measure of AI progress. The Hutter Prize isn't perfect (it uses only a sample of Wikipedia, which is very small compared to all the text humans have produced), but the idea is solid.
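
If you want a feel for the compression-as-prediction idea, a crude back-of-the-envelope version (my own illustration, nothing to do with the actual prize rules) is to run an off-the-shelf compressor over some text and look at bits per character; a better language model pushes that number lower:

    # Crude proxy for the Hutter idea: compression ratio as a stand-in for
    # predictive power. "sample.txt" is a placeholder for any text corpus.
    import zlib

    text = open("sample.txt", "rb").read()
    compressed = zlib.compress(text, 9)
    print(8 * len(compressed) / len(text), "bits per character")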


> All of the information of our world is contained in text

Communicate the concept of "green" to me in text.

The sound of a dog barking, a motor turning over, a sonic boom, or the experience of a Doppler shift. Beethoven's symphony.

Sour. Sweet. What does "mint" taste like? Shame. Merit. Learn facial and object recognition via text.

Vertigo.

Tell a boxer how to box by reading?

Hand eye coordination, bodies in 3 dimensional space.

Look, I love text, maybe even more than you do. But all these things imbue, structure, and influence our text; they are not contained in it.

To make substantial inroads toward something that looks like human-esque AI, text is not enough. The division between these fields is artificial, a product of our currently limited tech and of the specialisations and limitations of our researchers and faculties.

When we read, we play back memories, visions, sounds, feelings, etc, and inherent ideas gained through experience of ourselves as physical bodies in space.

Strong AI, at least to be vaguely recognised as such, must work with algorithms and machinery that understand these things, but which then work at the next level of abstraction to combine them into proper human-type concepts.

Of course, there is the question about why we would want to create a human like AI, it's my contention that human like AI isn't actually what many of us would want, but that's another topic...


I think you're over-romanticizing the problem.

I won't touch the qualia aspect, but everything necessary to flawlessly pretend to understand the color green, the sound of a dog's bark, or the experience of hearing a sonic boom can be represented with text. As an existence proof, you could encode an adult human as a serial stream of characters.


Are blind or deaf people not intelligent?

But if you must pretend to be sighted and hearing, there are many descriptions of green, of dogs barking, of motors, etc, scattered through the many books written in English (and other languages.)

Are these descriptions perfect? Maybe not. But they are sufficient to mimic or communicate with humans through text: sufficient to pass a Turing test, to answer questions intelligently, to write books, novels, political arguments, etc. If that's not AGI, I don't know what is.


Yes they are. However, is a blind, deaf person with absolutely no motor control, no sense of touch, and no proprioception intelligent? Unclear. They certainly have no language faculties.


But a blind person can't describe green. A deaf person can't describe the sound of a motorboat. A person without taste can't describe mint flavor. That is the point I was making.

I don't propose that a human could lose all of their senses and still be able to communicate. But I do believe computers could do so, if they are designed to do that. Humans are not designed to work lacking those senses.


So a blind person would never be able to understand the different categories of color (other than that they are placeholders for distinct categories of something).

Now we are just speculating. We believe a computer might be able to understand things for which it doesn't have the sense - but that is speculation and totally untested, and certainly can no longer be justified by using human minds as an example.


A blind person could pretend to be sighted though. There have been blind authors who wrote about sighted characters, for instance. They need not experience the thing themselves. Just learn from experience how sighted people behave and describe things, and mimic that.


Can you provide any examples of blind (from birth) authors giving convincing visual descriptions from the points of view of sighted characters?

That seems hard to believe.


You can explain red by saying it's a "warm" color, for example. Metaphors and analogies work; a sensation from one sense can be explained using sensations from another. But then you need to have at least one sense, which machines clearly don't have.


I don't think raw feels, qualia type stuff really counts as information in the information theoretic sense. Nor is understanding its nature necessary for artificial general intelligence (though perhaps it is (or perhaps not) for artificial consciousness, which is not the same thing.)


One way of looking at it is that some robots know how to walk, and this knowledge is encoded as a string of ones and zeros on a storage medium.


> All of the information of our world is contained in text

Even if this were a true statement, it might still not be enough. There is a class of functions that are simply not learnable without some prerequisite knowledge. This is directly analogous to a one-time pad in crypto. It is entirely possible that the function 'language' is in this class of unlearnable functions. While it may be the case that certain varieties of intelligence are learnable tabula rasa by a powerful neural net, the surface form of human natural language (the part you're recommending we measure) may simply not have enough information in it to decode the whole picture. It is entirely possible that you need to supply some of your own information to the picture as well, in a specific manner, so as to act as a kind of decryption key. A record needs a record player, even if you can make similar sounds with cassettes and CDs.

And so, I'm willing to bet that you simply cannot, using raw, uninformed statistical techniques, predict what word a human would say next. You need to understand more of the underlying structure of humans first.

I will agree, however, that success on the Hutter Prize is a valuable demonstration of AI progress, simply because I believe that maximal compression and the kind of intelligence I'm talking about are one and the same thing. You need to offload as much of the semantic weight of the corpus into the compression algorithm as you can. That means building a very complex model of natural language. And if you accept the premise that this model is not simply learnable by observing the surface form, then that means building Strong AI.


>There is a class of functions that are simply not learnable without some prerequisite knowledge. This is directly analogous to a one-time pad in crypto. It is entirely possible that the function 'language' is in this class of unlearnable functions.

I don't understand how this could possibly be the case. We can already make great progress on language understanding with simple methods like word2vec, or perhaps even markov chains. There are tons of statistical patterns in text that can be learned by computers.
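
Even a bigram Markov chain picks some of this up. A toy sketch (made-up corpus, standard library only):

    # Bigram Markov chain: count which word follows which, then sample.
    import random
    from collections import defaultdict, Counter

    corpus = "the cat sat on the mat and the cat saw the dog".split()
    bigrams = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        bigrams[a][b] += 1

    def next_word(w):
        counts = bigrams[w]
        return random.choices(list(counts), weights=list(counts.values()))[0]

    print(next_word("the"))  # samples "cat", "mat", or "dog" per learned counts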


It can be the case if Chomsky was right, and Universal Grammar and other similar structures are a thing. That would mean that part of our ability to understand language comes from the particular structure of our brain (which everyone seems to by and large share). That would mean that some of our ability to understand language is genetic in nature, by whatever means genes direct the structure of brain development.


So if language comes from the structure of the brain, what would stop us from simulating that structure to give a machine mastery of language? And specifically what would imply that a machine which had some of that structure would need to learn by interaction as the top level comment suggests?


Nothing would stop us from simulating human brain-like (or analogously powerful) structures to build a machine that genuinely understands natural language. I'm arguing that we can't just learn those structures by statistical optimization techniques though.

If it turns out that the easiest, or even the only, means of doing this is by emulating the human brain, then it is entirely possible that we inherit a whole new set of constraints and dependencies, such that world-simulation and an embodied mind are required to make such a system learn. If this turns out not to be the case, and there's some underlying principle of language we can emulate (the classic "airplanes don't fly like birds" argument), then it may be the case that text is enough. But that's in the presence of a new assumption: that our system came pre-equipped to learn language, and didn't manufacture an understanding from whole cloth. That the model weights were pre-initialized to specific values.


If there is an innate language structure in the brain then we know that it's possible to develop such a structure by statistical optimization, since this is exactly what evolution did, no?


But I don't see any reason a "universal grammar" couldn't be learned. It may take something more complicated than ANNs, of course. But it would be really weird if there was a pattern in language that was so obfuscated it couldn't be detected at all.


It comes down to the limits of available Information with a capital 'I'. If you're working within the encoding system (as you're recommending here with the "all the text in the world" approach), then in order to learn the function that's generating this information, the messages you're examining have to convey some minimum amount of information. There needs to be enough visible structure purely within the context of the messages themselves to make the underlying signal clear.

I don't think it's so weird to imagine that natural language really doesn't convey a ton of explicit information on its own. Sure, there's some there, enough that our current AI attempts can solve little corners of the bigger problem. But is it so strange to imagine that the machinery of the human brain takes lossy, low-information language and expands, extrapolates, and interprets it so heavily as to make it orders of magnitude more complex than the lossy, narrow channel through which it was conveyed? That the only reason we're capable of learning language and understanding each other (the times we _do_ understand each other) is because we all come pre-equipped with the same decryption hardware?


Nah, it's mostly just hierarchical probability models.

http://www.ncbi.nlm.nih.gov/pubmed/24977647


This is a very neat paper, but:

1) They appear to have crafted the skeleton of a grammar as it is with their nodes, super nodes, and slot collocations. This is directly analogous to something like an X-bar grammar, and is not learned by the system; therefore, if anything, it's strengthening a Chomskyan position: the system is learning how a certain set of signals satisfies its extant constraints.

2) They don't appear to go beyond generative grammar, which already seems largely solvable by other ML methods, and is a subset of the problem "language". Correct me if I'm wrong here; it's a very long paper and I may have missed something.



Connotation. Connotation is a huge part of human language, and is completely orthogonal to the denotation, which is what a vector is going to find. For instance, an AI should accurately be able to distinguish the fact that calling someone "cheap" is different from calling them "frugal", even though both objectively mean that the person doesn't spend much money.

There's also the related phenomenon of "subtext" -- the idea that some language has a different meaning than what's said. For instance, when I ask about whether a signature line on a form is required, and the other person says, "Yes, it's required. However you think best to get the signature." There's a subtext there of, "This signature won't actually be checked, so don't worry about it."


Wouldn't you still need to attach meanings to the words though? How could an AI system ever understand, for example, the Voynich Manuscript? There's plenty of text in it, and encryption methods when it was written weren't particularly strong. Or how would a person do if they were locked in a room with lots of books written in a language unknown to them?


In fact, someone has run word2vec on the Voynich manuscript: http://blog.christianperone.com/2016/01/voynich-manuscript-w... (web archive while it's down: http://web.archive.org/web/20160205003817/http://blog.christ...) Such methods could someday completely decode the thing, but for now they just show the relationships between words and different clusters of words, not their meaning.

Of course we have no idea how the Voynich manuscript is encrypted (which would make the assumptions of word2vec wrong), or if it even has any meaning at all. And it's an incredibly small dataset compared to modern text corpuses, so there is probably significant uncertainty and overfitting. And other problems like inconsistent spellings, many errors in transcriptions, etc. But in principle this is a good strategy.

>how would a person do if they were locked in a room with lots of books written in a language unknown to them?

If you spent all day reading them, for years, and you somehow didn't get bored and kept at it, eventually you would start to see the patterns. You would learn how "slithy toves" are related to "brillig", even if you have no idea how that would translate to English. Study it long enough, and you may even be able to produce text in that language, indistinguishable from the real text. You may be able to predict the next word in a sentence, and identify mistakes, etc. Perhaps carry out a conversation in that language.

And I think eventually you would understand what the words mean, by comparing the patterns to those found in English. Once you have guesses for translations of just a few words, you can translate the rest. Because you know the relationships between words, and so knowing one word constrains the possibilities of what the other words can be.

If the translation it produces is nonsense, the words you guessed must have been wrong, and you can try again with other words. Eventually you will find a translation that isn't nonsense, and there you go. This would be very difficult for humans, because the number of hypotheses to test is so large, and analyzing text takes forever. Computers can do it at lightspeed though.


I'm familiar with this particular attack, as it was discussed here previously. It's a worthwhile attempt but the identification as star names, if real, hasn't been confirmed. But your reservations are justified.

More generally, has any attempt been made to identify the meanings of words in any sufficiently large corpus of text in a known foreign language (for example, Finnish), without being provided with a translation into English, and then compare the identified meanings to the actual meanings, as a first step towards translation?


There was a paper where they trained word vectors for English and Chinese at the same time, but forced a few Chinese words to have the same vectors as their translated English words. This gave accurate translations for many Chinese words that hadn't been given translations.

Doing this without any translated words at all, would be more difficult. But I believe possible. It's actually a project I want to try in the near future.
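
The usual trick, as I understand it (in the spirit of the translation-matrix idea, not necessarily the exact paper above), is to learn a linear map between the two embedding spaces from a small seed dictionary and then apply it to every word. A minimal sketch with random stand-in vectors:

    # Learn a linear map from "Chinese" vectors to "English" vectors using a
    # small seed dictionary. Vectors here are random stand-ins for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    zh_vecs = rng.normal(size=(1000, 100))
    en_vecs = rng.normal(size=(1000, 100))
    seed = list(range(50))  # indices of known translation pairs

    W, *_ = np.linalg.lstsq(zh_vecs[seed], en_vecs[seed], rcond=None)
    mapped = zh_vecs @ W  # every Chinese word projected into the English space
    # The nearest English vector to each mapped vector is its candidate translation.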


> All of the information of our world is contained in text.

This statement is false. There is a well known thought experiment called Mary’s Room the gist of which is that knowing all conceivable scientific knowledge about how humans perceive color is still not a substitute for being a human and perceiving the color red: https://philosophynow.org/issues/99/What_Did_Mary_Know

The experience of seeing red is an example of what is called “qualia”.

In Google AI systems that identify cats, birds, etc it is reasonable to imagine AI technology evolving towards systems that can discuss those objects at the level of a typical person. However with an AI based on text only there is no possibility of that. It would be like discussing color with a blind person or sound with a deaf person.


Mary's Room and qualia is totally irrelevant. I'm not asking if the computer will "feel" "redness", simply if it can pretend to do so through text. If it can talk about the color red, in a way indistinguishable from any other human talking about red.

In any case, at some level everything is symbols. A video is just a bunch of 1's and 0's, as is text, and everything else. A being raised on only text input would have qualia just like a being raised on video input. It would just be different qualia.


As explained elsewhere here, it may be that our brains share a genetically coded "decryption key", and that many of the things we talk about are too poorly and noisily expressed for a purely text-based computer AI to ever truly replicate the processes going on inside our brains. Insufficient data, simply put, and no way to get it.

It may sure look indistinguishable, but on the inside it just wouldn't be the same.


The Mary's Room thought experiment is garbage, if you ask me. You can't just assume your hypothesis and then call the result truth.

If you assert that a person can understand everything there is to know about the color red and then still not understand what it is like to see red, you have either contradicted yourself or assumed dualism.


They assert that Mary understands the physical phenomenon of red. That is, she understands photons and eye structure, and therefore knows that light of a particular wavelength will trigger these sensors in the eye and thereafter be interpreted as "red" by a brain. All the physical components necessary to produce and sense the color "red". But when Mary sees the apple for the first time, did she learn something more about "red"?

Also, it's a thought experiment. Some people will claim the answer to that question is no, she learned nothing. Others will claim that she did. It's that thing she learned beyond the physical that theoretically cannot be conveyed by science, or even possibly by language.


To further this comment, see also Tacit Knowledge, which is essentially defined as knowledge that cannot be transferred, or is extremely difficult to transfer, through words alone.

https://en.wikipedia.org/wiki/Tacit_knowledge


> It's learning the meaning of words, and the relationships between them.

Word2vec is definitely an impressive algorithm. But at the end of the day, it's just a tool that cranks out a fine-grained clustering of words based on (a proxy measure for) contextual similarity (or rather: an embedding in a high-dimensional space, which implicitly allows the words to be more easily clustered). And yes, some additive relations between words, when the signal is strong enough.

But to say that it's "learning" the "meaning" of these words is really quite a stretch.


I wonder if anyone has ever run this system on Lewis Carroll's Jabberwocky[1] or even something like Anthony Burgess's A Clockwork Orange, both of which contain a large number of made-up words/slang/re-use.

I remember that when I first read A Clockwork Orange, it took me a while but I finally started to understand the meanings of those words/phrases (though I may not have ever encountered them before.) It did feel like my brain was re-wiring itself to a new language. It'd be interesting to see how some type of language AI would treat these works.

https://en.wikipedia.org/wiki/Jabberwocky

edited to add there's a wiki article on the language of A Clockwork Orange, Nadsat: https://en.wikipedia.org/wiki/Nadsat


Word2vec may be crude, but it demonstrates that you can learn non-trivial relationships between words with even such a simple algorithm. What is the meaning of a word, if not the relationship it has to other words?

Gender was just an example. There is lots of semantic information learned by word2vec, and the vectors have been shown to be useful in text classification and other uses. It can learn subtle stuff, like the relationship between countries, celebrities, etc. All that information is contained in a few hundred dimensions, which is tiny compared to the number of neurons in the brain.


I use word2vec a lot, and things like it, and I've always found it overstated to say that it "learns the relationships between things".

You say, as many people do, that the operation "king - man + woman = queen" indicates that it understands that the relation between "king" and "queen" is the relation between "man" and "woman". But there is something much simpler going on.

What you're asking it for, in particular, is to find the vector represented by "king - man + woman", and then find the vector Q with the highest dot product with this synthetic vector, out of a restricted vocabulary.

The dot product is distributive, so distribute it: you want to find the maximum value of (king * Q) - (man * Q) + (woman * Q).

So you want to find a vector that is like "king" and "woman", and not like "man", and is part of the extremely limited vocabulary that you use for the traditional word2vec analogy evaluation, but not one of the three words you used in the question. (All of these constraints are relevant.) Big surprise, the word that fits the bill is "queen".

(I am not the first to do this analysis, but I've heard it from enough people that I don't know who to credit.)
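
In code, the whole "analogy" boils down to something like this (a numpy sketch; `vecs` as a dict of unit-normalized word vectors and `vocab` as the restricted evaluation vocabulary are assumed, not real data):

    # The king - man + woman "analogy" is just a sum of dot-product
    # similarities over a restricted vocabulary, with the query words excluded.
    import numpy as np

    def analogy(vecs, vocab, a, b, c):
        target = vecs[a] - vecs[b] + vecs[c]          # e.g. king - man + woman
        best, best_score = None, -np.inf
        for q in vocab:
            if q in (a, b, c):                        # the exclusion matters!
                continue
            score = float(np.dot(target, vecs[q]))    # = (a.Q) - (b.Q) + (c.Q)
            if score > best_score:
                best, best_score = q, score
        return best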

It's cool that you can use sums of similarities between words to get the right answer to some selected analogy problems. Really, it is. It's a great thing to show to people who wouldn't otherwise understand why we care so much about similarities between words.

But this is not the same thing as "solving analogies" or "understanding relationships". It's a trick where you make a system so good at recognizing similarities that it doesn't have to solve analogies or understand relationships to solve the very easy analogy questions we give it.


> What is the meaning of a word, if not the relationship it has to other words?

There's also the relationship it has with the world.


Well in my example the AI doesn't have to interact with the world at all. To pass the Turing test simply requires imitating a human, predicting what words they would say. You only need to know the relationships between words.


"Is it day or night?"

If literally the only thing you know is the relationship between words, but you have a perfect knowledge of the relationship between words, you'll quickly determine that "Day" and "Night" are both acceptable answers, and have no means of determining which is the right one. At the very minimum, you need a clock, and an understanding of the temporal nature of your training set, to get the right one.


A beautiful rainbow glimmering gently in the sky after a summer shower.

What do you see? What do you smell? What do you hear? What does the landscape look like? What memory does this bring up?

These are messages that the language is communicating. If an AI can't understand at least some of the content of the message then can it compose one effectively? I'm not certain it can understand the meaning from words alone, but we can certainly try.


Only knowing the relationships between words alone would just be a poor proxy for knowing the meanings of the words, e.g. what real world concepts the words attempt to represent. You might be able to get pretty far with this technique, but I would bet a lot of money you would not be able to get reliable, in-depth human level communication. The system needs to have an understanding of the world.

And then there is the fundamentally dynamic aspect of language, which strengthens the need for a rich understanding of the world that words describe and convey.


There are other tests for AI besides the Turing Test, some of which require more understanding on the part of the program. Check out Winograd Schemas: http://www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/... which hinge on understanding the subtleties of how words refer to the world.


But it and methods like it are still very limited in what they can learn. For example, they can't learn relations involving antonyms. They can't tell apart hot from cold or big from small.


In fact, it only learns relationships that are a linear combination of "similar to X" and "not similar to Y".


> All of the information of our world is contained in text.

That's a bit overstated. I think maybe you mean to say "all of the information you'd need to be an effective citizen of our world is contained in text"? Or something similar? I think even that is too strong a claim, but it's at least understandable.

As stated the assertion doesn't make any sense. There is more information in a glass of milk than could be stored on all of the computers on earth, and Heisenberg showed that it's impossible to even record all of the information about a single particle.

Certainly there is no textual information available about my grandfather's eyes, but that information is accessible in the world for those who'll look. You seem to be underestimating the quantity of information you absorbed as a baby, just by reacting robotically to the people and events around you and absorbing the relationships between percepts and feelings.


Well of course that's what I meant. Any information an average person knows is contained in text, somewhere. That is all common sense knowledge. Everything from detailed descriptions of trees, to the color of the sky, to the shape of the human face, etc. But also much more, like all of our scientific knowledge and written history. Billions of things the average person doesn't know.


Learning the statistics of language is not going to tell the ML model anything about the underlying stuff to which the language actually refers. It will need actual "sense-data" to do that. For instance, to get a model that generates image captions, you need to train it with actual images.

If the much-vaunted "general intelligence" consists in both vague and precise causal reasoning and optimal control with respect to objects in the real world, then no, it obviously cannot be done with mere language. At least one sensor and one effector will be needed to train an ML/AI model to perform those tasks.


There is nothing magical about "sense data". A video is just a bunch of 1's and 0's, just like text data. A model of video data is not superior in any way to one of text data, they are just different.

The internet is so large and so comprehensive (especially if you include digitized books and papers, e.g. libgen or google books) that I doubt any important information that can be learned through video data, can't be obtained through text data.


>A video is just a bunch of 1's and 0's, just like text data. A model of video data is not superior in any way to one of text data, they are just different.

Uhhhh... there's this thing called entropy. A corpus of video data is vastly higher in entropy content than text data, and probably a good deal more tightly correlated too, making it much easier to learn from.

Remember, for a human, the whole point of speech and text is to function as a robust, efficient code (in the information-theoretic sense) for activating generative models we already have in our heads when we acquire the words. This is why we have such a hard time with very abstract concepts like mathematical ones: the causal-role concepts (easy to theorize how to encode those as generative models) are difficult to acquire from nothing but the usage statistics of the words and symbols, in contrast to "concrete", sense-grounded concepts, which have large amounts of high-dimensional data to fuel the Blessing of Abstraction.

Nevermind, I should probably just get someone to let me into a PhD program so I can publish this stuff. If only they'd consider the first paper novel enough already!


I think you mean redundancy. And yes videos are highly redundant. But I don't see how that's any kind of advantage. Text has all the relevant information contained within it, with a ton of irrelevant information discarded. But there are still plenty of learnable patterns. Even trivial algorithms like word2vec can glean a huge amount of semantic information (much easier than is possible with video, currently.)

I don't know if humans have generative models in their heads. There are people who have no ability to form mental images, and they function fine. Regardless, an AI should be able to get around that by learning our common patterns. It need not mimic our internal states, only our external behavior.


>I think you mean redundancy.

No, I meant a pair of specific information-theoretical quantities I've been studying.

>And yes videos are highly redundant. But I don't see how that's any kind of advantage.

Representations are easier to learn for highly-correlated data. Paper forthcoming, but conceptually so obvious that quantifying it is (apparently) non-novel.

>I don't know if humans have generative models in their heads.

The best available neuroscience and computational cognitive science says we do.

>Regardless, an AI should be able to get around that by learning our common patterns. It need not mimic our internal states, only our external behavior.

Our external behavior is determined by the internal states, insofar as those internal states are functions which map sensory (including proprioceptive and interoceptive) statistics to distributions over actions. If you want your robots to function in society, at least well enough to take it over and kill everyone, they need a good sense of context and good representations for structured information, behavior, and goals. Further, most of the context to our external behavior is nonverbal. You know how it's difficult to detect sarcasm over the internet? That's because you're trying to serialize a tightly correlated high-dimensional data-stream into a much lower-dimensional representation, and losing some of the variance (thus, some of the information content) along the way. Humans know enough about natural speech that we can usually, mostly reconstruct the intended meaning, but even then, we've had to develop a separate art of good writing to make our written representations conducive to easy reconstruction of actual speech.

Deep learning can't do this stuff right now. Objectively speaking, it's kinda primitive, actually. OTOH, realizing the implications of our best theories about neuroscience and information theory as good computational theories for cognitive science and ML/AI is going to take a while!


>Representations are easier to learn for highly-correlated data. Paper forthcoming, but conceptually so obvious that quantifying it is (apparently) non-novel.

I know what you are saying, but I don't think it's true.

Imagine a hypothetical language that is so compressed, so non-redundant, so little correlated, that it's indistinguishable from random noise. Learning this language may seem an impossible task. But in fact it's very easy to produce text in this language. Just produce random noise! As stated, that's indistinguishable from real text in this language.

Real language, of course, has tons of statistical patterns, and is definitely not random. But I don't see how it is harder to learn than, say, a more redundant audio recording of the same words, or a video recording of the person speaking them. That extra information is irrelevant and will just be discarded by any smart algorithm anyway.

>The best available neuroscience and computational cognitive science says we do.

Most people do. As I said some people don't, and they function fine. See this: http://www.bbc.com/news/health-34039054

>Our external behavior is determined by the internal states, insofar as those internal states are functions which map sensory (including proprioceptive and interoceptive) statistics to distributions over actions. If you want your robots to function in society, at least well enough to take it over and kill everyone, they need a good sense of context and good representations for structured information, behavior, and goals.

Robots are never going to have exactly the same internal states and experience as humans. They could be very, very different, in structure, to the human brain. Being exactly like humans isn't the goal. Mimicking humans is an interesting diversion, but it's not necessary, or the goal in and of itself.

And you may be right that a robot without vision would be disadvantaged. I think that's mostly anthropomorphism, imagining how disadvantaged blind humans are (and in fact even blind humans can function better than most people expect.) But even if it's true, my point is that sight is not strictly necessary for intelligence.

In fact I think vision may even be a disadvantage. So much of the brain is devoted to visual processing. While text, and even language itself, are hacks that evolution created relatively recently. A brain built purely for language could be much more efficient at it than we can probably imagine. Ditching vision could save a huge amount of processing power and space.


>I know what you are saying, but I don't think it's true.

Reading your post, you actually seem quite confused.

>Imagine a hypothetical language that is so compressed, so non-redundant, so little correlated, that it's indistinguishable from random noise. Learning this language may seem an impossible task.

Well yes, learning a class of strings in which each digit of every finite prefix is statistically independent from each other digit, is very hard, bordering on impossible (or at least, impossible to do better than uniform-random guessing).

>But in fact it's very easy to produce text in this language. Just produce random noise!

But that isn't the learning problem being posed! You are not being asked to learn `P(string | language)` (which is, in fact, the uniform distribution over arbitrary-length strings), but `P(language | string1, string2, ..., stringn)`, which by the way you've posed the problem factorizes into `P(language| character1) x P(language|character2) x ... x P(language|characterm)`. If the actual strings are sampled from a uniform distribution over arbitrary-length strings, then we have two possibilities:

1) The prior is over a class of languages some of which are not optimally compressed, and which thus do not render each character (or even each string) conditionally independent. In this case, the posterior will favor languages that do render each character conditionally independent, but we won't be able to tell apart one such hypothesis from another. We've learned very little.

2) The prior is over a class of languages all of which yield strings full of conditionally-independent noise: no hypothesis can compress the data. In this case, the evidence-probability and the likelihood cancel, and our posterior over languages equals our prior (we've learned nothing).
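
Spelling out case 2 (my own notation, under my reading of the setup): if every language hypothesis assigns the same likelihood to the observed strings, Bayes' rule leaves the prior untouched:

    \[
    P(L \mid s_1,\dots,s_n)
      = \frac{P(s_1,\dots,s_n \mid L)\,P(L)}{\sum_{L'} P(s_1,\dots,s_n \mid L')\,P(L')}
      = \frac{c\,P(L)}{c\sum_{L'} P(L')}
      = P(L).
    \]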

>Real language, of course, has tons of statistical patterns, and is definitely not random. But I don't see how it is harder to learn than, say more redundant audio recording of the same words, or a video recording of the person speaking them. That extra information is irrelevant and will just be discarded by any smart algorithm anyway.

Noooo. Compression does not work that way. Compression works by finding informative patterns in data, not by throwing them away. If your goal is to learn the structure in the data, you want the structure to be more redundant rather than less.

I'm telling you, once the paper is submitted, I can send you a copy and just show you the equations and inequalities demonstrating this fact.

>Most people do. As I said some people don't, and they function fine. See this: http://www.bbc.com/news/health-34039054

Differences in sensorimotor cortex function that leave the brain unable to perform top-down offline simulation with a high subjective-sensory precision don't invalidate the broad theory that cortical microcircuits are generative models (in particular, hierarchical ones, possibly just large hierarchies in which the individual nodes are very simple distributions).

http://www.fil.ion.ucl.ac.uk/~karl/The%20free-energy%20princ...

>Robots are never going to have exactly the same internal states and experience as humans. They could be very, very different, in structure, to the human brain.

Duh. However, if we want them to work, they probably have to run on free-energy minimization somehow. There is more necessity at work here than connectionism believes in, but that's a fault in connectionism.

>Being exactly like humans isn't the goal. Mimicking humans is an interesting diversion, but it's not necessary, or the goal in and of itself.

I didn't say that a working robot's representations had to exactly match those of humans. In fact, doing so would be downright inefficient, since robots would have completely different embodiments to work with, and thus be posed different inference problems in both perception and action. The fact that they would be, necessarily, inference problems is the shared fact.

>And you may be right that a robot without vision would be disadvantaged. I think that's mostly anthropomorphism, imagining how disadvantaged blind humans are (and in fact even blind humans can function better than most people expect.) But even if it's true, my point is that sight is not strictly necessary for intelligence.

Sight isn't. Some kind of high-dimensional sense-data is.

>In fact I think vision may even be a disadvantage. So much of the brain is devoted to visual processing. While text, and even language itself, are hacks that evolution created relatively recently. A brain built purely for language could be much more efficient at it than we can probably imagine. Ditching vision could save a huge amount of processing power and space.

That's putting the cart before the horse. Language is, again, an efficient but redundant (ie: robust against noise) code for the models (ie: knowledge, intuitive theories, as you like) the brain already wields. You can take the linguistic usage statistics of a word and construct a causal-role concept for them in the absence of a verbal definition or sensory grounding for the word, which is arguably what children do when we read a word before anyone has taught it to us, but doing so will only work well when the concepts' definitions are themselves mostly ungrounded and abstract.

So purely linguistic processing would work fairly well for, say, some of mathematics, but not so much for more empirical fields like social interaction, ballistic-missile targeting, and the proper phrasing of demands made to world leaders in exchange for not blowing up the human race.


>But that isn't the learning problem being posed! You are not being asked to learn `P(string | language)` (which is, in fact, the uniform distribution over arbitrary-length strings), but `P(language | string1, string2, ..., stringn)`...

Hold on. Let's say the goal is passing a Turing test. I think that's sufficient to demonstrate general intelligence and do useful work. In that case, all that is required is mimicry. All you need to know is P(string), and you can produce text indistinguishable from a human.

>Noooo. Compression does not work that way. Compression works by finding informative patterns in data, not by throwing them away. If your goal is to learn the structure in the data, you want the structure to be more redundant rather than less.

OK, let's say I convert English words to smaller Huffman codes. This should be even easier for a neural network to learn, because it can spend less effort trying to figure out spelling. Of course some encodings might make it harder for a neural net to learn, since NNs make some assumptions about how the input should be structured, but in theory it doesn't matter.
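
Concretely, something like this (a standard-library sketch with a made-up corpus; frequent words get the shortest bit strings):

    # Build Huffman codes over words by repeatedly merging the two least
    # frequent subtrees, then re-encode the corpus as a bit string.
    import heapq
    from collections import Counter

    corpus = "the cat sat on the mat and the dog sat".split()
    freq = Counter(corpus)

    heap = [(f, i, {w: ""}) for i, (w, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in c1.items()}
        merged.update({w: "1" + code for w, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1

    codes = heap[0][2]
    print(codes)                                 # "the" gets the shortest code
    print("".join(codes[w] for w in corpus))     # the corpus as a bit string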

>Some kind of high-dimensional sense-data is [necessary]... purely linguistic processing would work fairly well for, say, some of mathematics, but not so much for more empirical fields

These are some really strong assertions that I just don't buy, and I don't think you've backed up at all.

Humans have produced more than enough language for a sufficiently smart algorithm to construct a world model from it. Any fact you can imagine is contained somewhere in the vast corpus of all English text. English contains a huge amount of patterns that give massive hints to the meaning. E.g. that kings are male, or that males shave their face and females typically don't, or that cars are associated with roads, which is a type of transportation, etc.

Even very crude models can learn these things. Even very crude models can produce nearly sensible dialogue from movie scripts. Models with millions of times fewer nodes than the human brain. It's amazing this is possible at all. Of course a full AGI should be able to do a thousand times better and completely understand English.

Trying to model video data first is wasted processing power. It's setting the field back. Really smart researchers spend so much time eking out a 0.01% better benchmark on MNIST/ImageNet/whatever, with entirely domain-specific, non-general methods. So much effort is put into machine vision, when language is so much more interesting and useful, and closer to general intelligence. Convnets, et al., are a dead end, at least for AGI.


>OK, let's say I convert English words to smaller Huffman codes. This should be even easier for a neural network to learn, because it can spend less effort trying to figure out spelling.

I'd need to see the math for this: how will the Huffman codes preserve a semantic bijection with the original English while throwing out the spellings as noise? It seems like if you're throwing out information, rather than moving it into prior knowledge (bias-variance tradeoff, remember?), you shouldn't be able to biject your learned representation to the original input.

Also, spelling isn't all noise. It's also morphology, verb conjugation, etc.

>Humans have produced more than enough language for a sufficiently smart algorithm to construct a world model from it.

Then why haven't you done it?

>Any fact you can imagine is contained somewhere in the vast corpus of all English text.

Well no. Almost any known fact I can imagine, plus vast reams of utter bullshit, can be reconstructed by coupling some body of text somewhere to some human brain in the world. When you start trying to take the human (especially the human's five exteroceptive senses and continuum of emotions and such) out of the picture, you're chucking out much of the available information.

There's damn well a reason children have to learn to speak, understand, read, and write, and then have to turn those abilities into useful compounded learning in school -- rather than just deducing the world from language.

>Even very crude models can learn these things. Even very crude models can produce nearly sensible dialogue from movie scripts.

Which doesn't do a damn thing to teach the models how to shave, how to tell kings from queens by sight, or how to avoid getting hit by a car when crossing the street.

>Models with millions of times fewer nodes than the human brain. It's amazing this is possible at all.

The number of nodes isn't the important thing in the first place! It's what they do that's actually important, and by that standard, today's neural nets are primitive as hell:

* Still utterly reliant on supervised learning and gradient descent.

* Still subject to vanishing gradient problems when we try to make them larger without imposing very tight regularizations/very informed priors (ie: convolutional layers instead of fully-connected ones).

* Still can't reason about compositional, productive representations.

* Still can't represent causality or counterfactual reasoning well or at all.

>Trying to model video data first is wasted processing power. It's setting the field back. Really smart researchers spend so much time eking out a 0.01% better benchmark on MNIST/ImageNet/whatever, with entirely domain-specific, non-general methods. So much effort is put into machine vision, when language is so much more interesting and useful, and closer to general intelligence. Convnets, et al., are a dead end, at least for AGI.

Well, what do you expect to happen when people believe in "full AGI" far more than they believe in basic statistics or neuroscience?


I don't think we are getting anywhere. Look, can you point to any instance of machine vision being used to improve a language model of English? Especially any case where the language model took more computing power to train than the model aided with vision?

I don't think anything like that exists today, or ever will exist. And in fact you are making an even stronger claim than that. Not just that vision will be helpful, but absolutely necessary.


A video is encoded to a series of ones and zeros by a codec. That codec determines the interpretation of that data. The codec basically becomes a sensor -- it serves the same purpose as the eyes, which is taking raw data (photon excitements or a binary string) and turning it into meaningful data (an image). And without that codec, the information is basically meaningless.


If we ever want 'true AI' then interacting with the world would certainly be a crucial component of building one. Text is a crucial component, but ultimately being able to 'see' gives one an entirely new perspective on what words mean. There's a reason, after all, why toddlers touch hot surfaces even after repeatedly being told that it will hurt - until they experience it for themselves, the word 'hot' doesn't quite connect, even if they've heard it dozens of times and have a very good abstract sense of what it means.


I agree that learning to reason about the world likely does not require experience with motor control and proprioception (i.e. literally how babies do it), though I do think that you need at least some sort of tempo-spatial experience (e.g. visual). Tempo-spatial representations are just extremely hard to convey by text only. You might get the idea of closeness by saying 'close is when two words are close in a sequence of words' and ordering by saying 'this word comes after that word', but I think it would be very difficult to extrapolate that concept to more than one dimension, and to dimensions that have not just an ordering but also a metric (just think about our inability to reason about even the fourth dimension). You need rich representations of our 3+1 dimensional world to be able to reason about it, and text only gives you perhaps "0.5" dimensions (because it lacks a metric, i.e. it does not convey durations in terms of the ticks of the recurrent network).

But I doubt, too, that interaction with the world is necessary. In fact I think it would be rather easy for an AI to simply write motor programs in a programming language, given unrestricted and noiseless memory, once it has learned to reason about tempo-spatial patterns from just observing them and identifying them with our language-coded shared concept space. It is not constrained to real-time performance of actions as humans are, therefore it can take the much easier route of programming any interaction with the world as needed, on the fly. Our shared concept space likely sufficiently conveys our general (common sense) knowledge of how these patterns are known to interact and evolve over time, once rudimentary tempo-spatial representations are in place.


tl;dr I think you need a small set of tempo-spatially grounded meanings (though not necessarily agent-related) and you can bootstrap everything from that using only textual knowledge.


I agree that AI is possible from text alone but only with the added stipulation that full understanding of text requires a very sophisticated learner. In order to predict what a human will say next, you need the ability to maintain extended context, use world models plus deduction to narrow and maintain multiple possibilities; all while being able to infer the state of the thing you are trying to predict (which means the AI itself has complex internal state far beyond anything an RNN or Neural Turing machine could manage today).

If I said "That is one huge fan", could you predict what my next word will be? It would depend a lot on context and the ability to reason within a complex world model. Depending on whether I had gone to a concert or to a wind tunnel, your distribution over guesses would alter. If I had gone whale watching you might even suspect I made a typo. Changing huge to large would lead to major to no adjustments, depending on each guess.

So while I agree an AI could emerge from text alone, it would have to be very sophisticated to do this.


> Look at word2vec. By using simple dimensionality reduction on the frequencies with which words occur near each other in news articles, it can learn really interesting things about the meaning of words

Doesn't that depend on the syntax of a language? My guess is that it would not work for a language that has a more flexible word order than English, so it wouldn't quite work for languages such as Russian, Turkish and Finnish (among others).


I think if AGI were possible from the basic statistical NLP techniques outlined in most advanced NLP textbooks, it would have already happened a decade ago.


I'm not saying it is possible from just basic statistical NLP techniques. It may take much more advanced techniques. And it may take much more computing power than we have even now.

But I do believe it is possible, someday. Probably within our lifetime.


I certainly think AGI is possible. I just don't think Word2vec, RNN's, i.e. stuff from the NLP textbooks, is in the same ballpark as what it will take to achieve.

Edit - To be more clear I also agree AGI could be possible with text as the only input. I just think we need a new paradigm. Ostensibly AGI is meant to mimic human intelligence (minus the pitfalls), so IMO, the best approach will be that which mimics the underlying processes of human intelligence - not just the results. Traditional statistical NLP methods will probably have some role in this final system, but not the heart of it, as far as mimicking intelligence by mimicking intelligence's underlying processes goes.


> It's learning the meaning of words, and the relationships between them.

It's learning a meaning, not the meaning. It's just a probabilistic model for the occurrence of a word based on the words that surround it. This should not serve as a base for the rest of your claims.

Anyway -- the improvements gained by multi-modal systems essentially disprove your thesis. Which is a good news! We're making progress.


The problem seems to be in understanding the overall meaning over long periods of time. We can understand general structure quite easily and a single sentence might make sense, but overall it never ends up cohesive or meaningful.

Maybe we can do that with just words, but humans certainly don't. Words are related to concepts first, then we figure out the meaning of the rest.


I can't say anything about the feasibility of what you describe, but something about the idea of being a being composed of pure text, with no ability to perceive or visualise the world around me except through the medium of text, utterly horrifies me.


So a related question becomes, can you learn to understand and thus predict physics (the way a child does - I'm not talking about quantum mechanics) from literature only, without interacting in space?


Yes, the child doesn't learn physics; he or she learns motor control of his or her body. A baby learning to talk is more about the brain learning to control its body by training its neurological system.

Look at the animal kingdom: some animals are walking within about 5 minutes of being born, and running within hours.


> from literature only

How about when you include multimedia recordings? Or give the machine a camera and wheels?


Giving the machine a camera and wheels is basically embedding it in the real world. It goes against the spirit of my question, and since I was asking it skeptically it's actually the thing I intuitively (without any relevant expertise!) expect to be required.


> locking a baby in a featureless room

Strangely (or maybe horribly?) there are a number of studies of children raised in Romanian orphanages that somewhat cover this area.

Under Nicolae Ceaușescu the government outlawed abortion (with some exceptions) in an attempt to increase the birth rate (https://en.wikipedia.org/wiki/Abortion_in_Romania). Coupled with a poor economy, this led to masses of infants and children being given over to government "care" in orphanages. These were pretty bleak places for infants and children, with infants often spending hours in a crib with little stimulation.

Here's a really good article about the effects this sort of institutional "care" has on children: http://www.americanscientist.org/issues/feature/2009/3/the-d...

Here's the Bucharest Early Intervention Project (tons of info about this subject area): http://www.bucharestearlyinterventionproject.org/


It's probably a mistake to assume that just because it's the way we do it that it has to be the way machines do it. Although that's usually the initial assumption. In the early days of flight most attempts were based on birds, similarly submersible vehicles were based on fish. We know now it's better to use propellers. It could be we just haven't found what is analogous to a propeller for the AI world.


The only general intelligence we know of is us. It stands to reason that the first step towards creating AGI is to copy the one machine we know is capable of that type of processing. Why doesn't our research focus on understanding and copying biological brains? Numenta did, with good results, but it isn't an industry trend.


The steps to learning to make aircraft didn't come from understanding how birds flap their wings. I can't imagine the kind of intelligence we consider general will come from studying how humans biologically think.


> Why doesn't our research focus on understanding and copying biological brains?

There's lots of basic research being done to better understand the biological brain. Progress is slow and steady, but the brain remains poorly understood at this point in time. Most applied research has pursued more pragmatic methods because these methods have had faster progress and proven more useful in practice.


I agree that machines won't necessarily have to do everything the same way we do things to be "intelligent". On the other hand, the concept of artificial general intelligence implies a machine that can perform all of the same functions a human can: "Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can."[0]

And if you look at what humans do that we characterize as "intelligence", it's a substantial list of different functions: "Human intelligence is the intellectual capacity of humans, which is characterized by perception, consciousness, self-awareness, and volition. Through their intelligence, humans possess the cognitive abilities to learn, form concepts, understand, apply logic, and reason, including the capacities to recognize patterns, comprehend ideas, plan, problem solve, make decisions, retain information, and use language to communicate. Intelligence enables humans to experience and think."[1] If I take an engineering perspective and look at the list of functions that an AGI would have to perform to be an AGI, I would definitely want to give it more than a text-in, text-out interface so that it could have, for example, perception and volition.

For example, imagine a theoretical "AGI" that exclusively deals with a stream of text in (as human language) and a stream of text out (again, as human language). If you ask it any questions about its physical surroundings, it's either going to make things up (which isn't perception and therefore fails both at being useful and meeting the definition of intelligence above), or it's going to get information about its physical surroundings via some proxy that feeds it a linguistic stream of information about its surroundings. But it doesn't matter if it's getting perceptual information from a proxy or from more directly embedded sensory interfaces; if it's getting perceptual information and able to use it sensibly, then it's performing the function of perception. And in that case the proxy source of perceptual information may as well be considered part of the "intelligent" system.

These kinds of articles seem a bit silly to me, because they seem to imply that we should expect to be able to create a system which is "intelligent" but which only performs the function of understanding and producing meaningful language. But if you're only handling language, you're taking away all of the other functions that are part of the definition of "intelligence" above, which leaves you with a system which is far short of anything we'd consider "intelligent".

[0] https://en.wikipedia.org/wiki/Artificial_general_intelligenc...

[1] https://en.wikipedia.org/wiki/Human_intelligence


> This isn't an engineering problem, it's a philosophical problem [...]

Indeed. However, we cannot rule out the possibility of engineering a system without a "body" (I think you are referring to the embodied mind thesis?). It is a complicated topic and discussions about it are futile without precise definitions of loaded terms, like "intelligence", "body", etc.

A rather well-defined test is the classic Turing test and I wouldn't dismiss the possibility that it can be passed by a bodyless machine/program-thing.


> It is a complicated topic and discussions about it are futile without precise definitions of loaded terms, like "intelligence", "body", etc.

On the contrary, to make progress we should give up on making precise definitions of non-technical terms such as 'body' and 'intelligence'. They are folk notions that don't have relevant precise definitions; talking about them in an engineering context is distracting.

You don't need a definition of 'beauty' to paint. You don't need a definition of 'justice' to practice law. You don't need a definition of 'intelligence' to build clever robots.


I don't find the Turing Test convincing either, because someone smart enough to build it should be smart enough to recognize it from its answers. And if that depends on the intelligence of the questioner, whose intelligence is tested then, really?

IIRC the test is a binary classifier, but intelligence is a spectrum that's fuzzy and therefore inherently hard to define.

I.e., how low someone is willing to go before assuming a lack of intelligence in a human is not a good definition of general intelligence, as that's circular reasoning.

I would suppose that babies possess general intelligence, but they lack the knowledge about the environment.


> I don't find the Turing Test convincing either, because someone smart enough to build it should be smart enough to recognize it from its answers.

Why do you assume that? The creators of AlphaGo certainly couldn't beat it.


The Turing Test equivalent in the game of Go is not beating the computer. Instead, it is: given a history of moves, can you determine whether or not a computer was playing? Determining whether or not AlphaGo was playing seems to me like a relatively easy task for the designers. Since they have access to the AlphaGo system, they can just calculate the probability that each move corresponds to one AlphaGo would make.
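As a sketch of that last step (the interfaces below are purely hypothetical; the point is only the shape of the comparison): score a game record by its log-likelihood under the machine's move distribution versus a model of human play, and guess "computer" when the former is clearly higher.

    import math

    def log_likelihood(moves, move_prob):
        # Sum of log-probabilities of the observed moves under some model.
        # `move_prob(position, move)` is a hypothetical interface, standing in
        # for either AlphaGo's policy network or a statistical model of human play.
        total = 0.0
        for position, move in moves:
            total += math.log(max(move_prob(position, move), 1e-12))
        return total

    # A designer with access to both models could then compare
    #   log_likelihood(game, alphago_move_prob)  vs.  log_likelihood(game, human_move_prob)
    # and answer "computer" when the first is clearly larger.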


> Instead, it is: given a history of moves, can you determine whether or not a computer was playing?

No, it's really not. For the Turing Test, the AI is meant to be adversarial: its objective is to convince you that it is human.

AlphaGo's objective isn't to "play like a human," it is to win. If they gave it an objective of playing like a human, I'm sure AlphaGo could play in a way that would be indistinguishable from a human.

> Since they have access to the AlphaGo system, they can just calculate the probability that each move corresponds to one AlphaGo would make.

Peeking at the system/data is cheating. Obviously the person who sets up a Turing test knows which player is AI.


> If they gave it an objective of playing like a human, I'm sure AlphaGo could play in a way that would be indistinguishable from a human.

It could just play unbelievably badly and appear to be a beginner. That wouldn't prove intelligence.

> Peeking at the system/data is cheating

Someone ignorant of computers would hardly ever suspect a machine. Of course, omitting that rule would still leave you needing someone smarter than the computer.

If you talk statistics, i.e. the machine only has to convince a fair share of humans, then defining the threshold is a problem. Intelligence would depend on the development of the society, and I thought this was about an intrinsic value.

It's an interesting thought experiment, but hardly conclusive, just observational.


> It could just play unbelievably badly and appear to be a beginner. That wouldn't prove intelligence.

Sure, which is why it's not a very good metric. The correct metric for looking at whether computational game intelligence has exceeded human capacity is that computers can consistently beat humans.

To be clear, I'm not convinced that we'll ever make a generalized intelligence which can pass the Turing Test. My point was merely that the fact that humans create the system is not a good argument for why it's impossible: in many domains, we can already create computer systems which vastly outperform ourselves.


I can only speak for chess, but one of the unsolved problems in computer chess is how to build a program that plays human chess of appropriate level.

It is very hard to dial down an Elo 3000+ program to the 1800 level of a club player without making it seem computer-like.

What is usually done is to lower the search depth and add some random blunders, but it is still obvious to a stronger player that it is a program.


Good question. Someone did beat it. He and his games were used as training data during AlphaGo's development.

I edited the post, did you read that? You are making my point: you can't bootstrap a definition for artificial intelligence by comparison to humans when human intelligence is not well defined either.


I read your post, but it's very muddled. You might consider advancing a clearer thesis, because it seems that you are under the impression that it's impossible for humans to build systems which are smarter than themselves.

The first versions of AlphaGo were certainly inferior to human players, but the current version is superior to any human.


> because it seems that you are under the impression

I made a hopeful hypothesis and then immediately countered it: human intelligence might just not be optimized for recognizing intelligence. It is optimized for other things, like not wasting energy, and because of that it does indeed recognize that playing Go very well but nothing else is rather less intelligent.

You do make a strong point there: specialized computers are stronger than humans at a specific task, but we are talking about general intelligence. I have to admit, too, that I have a hard time getting the bigger picture and get confused too easily. I haven't read any of the literature that would define the problem rather well, as the OP put it, so the discussion is likely less informative.

In my opinion the comparison is still unequal, because the computer used a ton more resources and memory. There aren't enough Go professionals to put together, let their averaged opinion learn and play, and consume as much energy.


Why not simply simulate the body (or indeed multiple bodies) in a virtual world?

Easier to build, parallelize, extend, maintain. Possibly somewhat safer too.


That works, if your virtual world is good.

But that's a big if. You've just taken one really hard problem (learning about the world) and turned it into an even harder problem (simulating the world).


Some games are pretty convincing sandboxes, and it's not obvious to me that you really need a full range of senses to properly teach the AI.


This is exactly the premise of The Talos Principle :-)


I second this. To further illustrate that semantics is far more than syntax and that emotions are an inseparable part of understanding language, do the following experiment: take any word or concept and look up its definition in a dictionary. Take note of the words used in the definition and look up the meaning of those words. Repeat recursively, and if you go far enough down the tree you will find that all words lead to self-recursive definitions that state "it is a feeling of" this or that. This is proof that the semantics of language - and the intelligence required to grasp its semantics - is inseparable from subjective states.
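If you want to run that experiment mechanically rather than by hand, here is a rough sketch using NLTK's WordNet (assuming NLTK is installed and the wordnet corpus has been downloaded); it just follows definitions breadth-first and shows how quickly the definitional vocabulary collapses onto a small core of basic words.

    # Rough sketch of the recursive-definition experiment, assuming NLTK and
    # its WordNet corpus are available (nltk.download('wordnet')).
    from collections import deque
    from nltk.corpus import wordnet as wn

    def follow_definitions(word, max_depth=3):
        seen, queue = set(), deque([(word, 0)])
        while queue:
            w, depth = queue.popleft()
            if w in seen or depth > max_depth:
                continue
            seen.add(w)
            for synset in wn.synsets(w):
                for token in synset.definition().split():
                    queue.append((token.strip(".,;()").lower(), depth + 1))
        return seen

    # The set of words reachable through definitions grows, then saturates:
    # definitions keep reusing the same small stock of basic terms.
    print(len(follow_definitions("joy", max_depth=2)))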


What does the experiment prove? Of course dictionaries are circular. But that doesn't mean they don't contain huge amounts of information about the world, and about language. Information an AI could infer without any interaction with the outside world at all.


This means that the semantics of the language is rooted in subjective states. Restated, this means humans "understand" language because of humans' emotions. Computers may "understand" language too, but it surely will not be due to the subjective states as it is with humans. If we define AI as a computer that must "understand" language the same way as humans do, then by definition, AI is not possible.


This is why Turing invented his famous test. At the time people were arguing about what it would take to prove a machine is intelligent. People argued that the internal properties of the machine mattered, that it needed to feel emotions and qualia just like humans.

But Turing argued that if the machine could be shown to do the same tasks that humans can do, and act indistinguishable from a real human, then surely it's intelligent. Its internal properties don't matter, at least not to judge intelligence.


Exactly. The computer will fail any test that can only be passed if the agent under test experiences subjective states. Since humans can always craft this class of tests, and the computer cannot pass it, it will always fail the "Turing Test".


What test could possibly test for subjective states? You can ask the computer how it feels, and it can just lie, or predict what a human would say if asked the same question. There's no way to know what the computer actually feels, and it doesn't really matter for this purpose.


The easy answer is this: these tests exist. Since no computer put to the Turing test has passed it, simply look up the tests and observe how humans have induced the computers to fail.

In practice, a good class of tests to use is one that must evoke an emotional response to produce a sensible answer. An example is art interpretation. Questions involving allegory. Interpreting a poem, etc.

Important to note that whatever the challenge is, it must always be a new example - as in never been seen before. Anything that is already in the existing corpus, the computer can simply look up what is already out there. In other words, there is no one concrete thing you can use again and again repeatedly.

Example of test that would foil a computer: A personally written poem and having discussion about it.


This sort of anthropomorphic intuition pump is counterproductive. It would be much easier to believe that a baby could do that if we designed the baby's brain from scratch to be able to.


In this vein, have you read about the work done by people mostly from the AI lab at the Vrije Universiteit in Brussels (Belgium)? (They're also affiliated with the Sony CSL in Paris: http://csl.sony.fr/language.php) They're precisely interested in the philosophical problem of how a grounded language emerges and is perpetuated among a population of embodied agents, as opposed to the engineering problem of, say, understanding complex, context-dependent natural language queries.

There's a great book which gives an overview of this field, The Talking Heads Experiment: Origins of Words and Meanings by Luc Steels, which discusses many of the advances made in this field (including for instance how having a grammar, as opposed to just stringing words related to what you want to say at random, is an evolutionary advantage because it boosts communicative success). It's published as open access, so go grab your free copy! :)

http://langsci-press.org/catalog/book/49

Chapter 4 in particular has a very interesting discussion of what's problematic with the machine learning approach -- that it takes a lot of training examples for a classifier to start making interesting decisions -- and presents a selectionist alternative to that, where distinctions (as in e.g. nodes in decision trees) are grown randomly and they're reinforced / pruned based on feedback. Crucially, the categories (semantic distinctions) are not labels given at the outset, but they emerge along with the language, based on the environment the agents encounter and the tasks they're using language for.
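For a flavour of how such a system can converge on shared labels, here is a toy naming game in the spirit of those experiments (labels only, no emergent categories, and not a faithful reproduction of the book's setup): agents invent words for objects, reinforce associations that lead to communicative success, and prune the ones that don't.

    # Toy naming game: word-object associations are grown at random and then
    # reinforced or pruned based purely on communicative success.
    import random

    OBJECTS = ["obj%d" % i for i in range(5)]

    class Agent:
        def __init__(self):
            self.lexicon = {}                      # object -> {word: score}

        def speak(self, obj):
            words = self.lexicon.setdefault(obj, {})
            if not words:                          # invent a word if none exists
                words["w%06d" % random.randrange(10**6)] = 0.5
            return max(words, key=words.get)

        def interpret(self, word):
            for obj, words in self.lexicon.items():
                if word in words:
                    return obj
            return None

        def update(self, obj, word, success):
            words = self.lexicon.setdefault(obj, {})
            words[word] = words.get(word, 0.5) + (0.1 if success else -0.1)
            for w in list(words):                  # prune associations that keep failing
                if words[w] <= 0:
                    del words[w]

    agents = [Agent() for _ in range(10)]
    for _ in range(20000):
        speaker, hearer = random.sample(agents, 2)
        obj = random.choice(OBJECTS)
        word = speaker.speak(obj)
        success = hearer.interpret(word) == obj
        speaker.update(obj, word, success)
        hearer.update(obj, word, True)             # hearer adopts the word (speaker "points")

    print({obj: agents[0].speak(obj) for obj in OBJECTS})  # tends toward a shared vocabulary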

In general, I'd recommend Chapters 1 and 2 for a quick introduction, but in a pinch, I attempted to give a 50,000-foot summary in an essay I wrote (look under the heading Evolutionary Linguistics):

http://dlukes.github.io/cathedral-and-bazaar.html

I realize that engineering applications of these ideas might be a long way off (and perhaps they'll never materialize), but boy are these exciting discoveries about the very fabric of language :)


Computers are programmed only using text, even if the text has just two symbols. Sensor interfaces use digital signals, again symbol streams.

That raises the question of whether there could be machines mightier than a Turing-complete one. I'm sure that's missing your point.


If you could simply program a human's behavior into a machine, that would be fine, but humans aren't capable of encoding their neural circuitry in a programming language; in fact, the combined efforts of all humans have only begun to shed light on what human behavior is. As such, generating formalized information (that describes behavior) requires some process (other than human hands) to do it -- and in the case of a machine that interacts with the world, "turing machine" is not a complete description of any such process.


By Shannon's information theory, everything is bits of information. Completely irrelevant for the mechanism of a Turing machine is who writes the initial bits on the tape. For all I care, the world is the tape and the computer is the head being moved through it. It's the old fallacy of seeing the brain as a computer, since we've built computers after structures from our brains. Hence I wondered what more there should be.


> you end up with something which is not particularly different or better than a human cyborg.

Even if that were true, which I am not sure about, such a robot will have a very different moral status, as an artifact, and one that can be reproduced cheaply and indefinitely. This is very useful indeed, so on this axis it could be counted as 'better'.

The idea of development being important for AI is an old one, but it hasn't had much concrete success. Brooks's Cog robot at MIT is a prominent example of a robot that didn't do very much, despite this approach being taken in a good faith effort by talented and well-supported people.


Humans seem to be created cheaply and indefinitely too. Population is growing out of control. We could attach them to pods and harvest energy from their souls!


Here is the reason why I don't believe in that: First of all, it seems that mind space is huge, i.e. there are very different programs that can lead to general intelligence, many of which will be very different from humans (this alone is evidenced by how strongly human characters and intellects vary, e.g. highly functioning mental conditions). A lot in machine learning points to that possibility. There are for example many ways to get supervision signals, e.g. reconstruction error, prediction error, adversaries, intrinsic motivation (reducing the number of bits required for a representation), compression, sparseness etc.

We basically just need a system that comes up with efficient representations of the world such that it can reason about it, i.e. that it can tell you which hypotheses about the world are likely true given some data. This computation allows you to make predictions, and predictions are really at the heart of intelligence. If you can follow a hypothetical trajectory of generated, hallucinated or simulated samples of reality, i.e. samples that likely correspond to what actually happens in the world (and in the agent's own brain), then you can actually perform actions that are targeted at some purpose (e.g. maximizing reward signals).

However, there are many sources of data that essentially give you the same information. Whether you create a representation by directly interacting with the world or just watch many examples of how the world generally evolves over time and how different entities interact with one another, you essentially get the same idea about how the world works (except in the first case you also learn a motor system).

I think the anthropomorphism is really misplaced here, because computer systems are not dependent on actually performing in the real world. Since computers have near unlimited, noiseless memory and super fast access to it, they can plan interactions by careful reasoning on the fly instead of needing to learn motor skills for manipulating objects, eating and handwriting before getting anywhere near the performance that reliable external memory already gives computers. A computer system also does not have hormone and neuromodulator levels that need to be met for healthy development (e.g. dopamine), so the intuition that deprivation of interaction with the world prevents learning is extremely misleading.


> They need bodies that are biologically connected with the rest of the biosphere

On what basis do you conclude that "biologically" is important here? There may be some reason to suspect that human-like intelligence requires human-like ability to sense and interact with the outside world, but I see less reason to suspect that it is important that the mechanism of the sensors or manipulators must be biological.


Reminds me of a quote I once heard, for which I can't find the source:

The cheapest way to make a brain is still the old way.


In fact it is a very hard engineering problem, which doesn't exclude a "philosophical" one. There's a line of research surrounding anthropomimetic robots with the specific task of studying cognition, going back at least a decade.


Helen Keller's life argues against the proposition that a machine requires the same sort of interaction with its environment that the average human experiences, before it can achieve intelligence.


She was blind and deaf, but she still had enormous amounts of tactile information and the ability to physically interact with her environment. Moreover, those sensory elements were critical for her to finally be able to start learning language--she finally caught on that the signs another person was making in one hand represented the water being run over her other hand. And she didn't make that breakthrough until she was 7 years old, and only then began to learn with the persistent help of an instructor.[0]

I would say that Helen Keller's life argues that an intelligent machine must be able to have experiences and the capacity to associate experiences with language. The machine probably doesn't need all of the perceptual modalities that we have, as Helen Keller demonstrated, but it should probably have some similarities with our own so that there would be common ground for initiating communication about experiences. A machine with just a text in / text out interface has nothing in common with us.

[0] https://en.wikipedia.org/wiki/Helen_Keller#Early_childhood_a...


So Keller's "enormous amounts of tactile information and the ability to physically interact with her environment" means that her achievements were not that remarkable? Speaking personally, if I were to lose my sight and hearing, I imagine I would find life extremely daunting, even though I have what seems to me the very considerable advantage of having learned language.

Anne Sullivan's equally remarkable (IMHO) role as a teacher is not really an issue here, as training is also an option for AI, though it might be evidence in a rather different discussion about whether unsupervised learning alone, particularly as practiced today, is likely to get us to AI (clearly, the evolution of intelligence can be cast as unsupervised learning, but that is a very long and uncertain process...)


Ben Goertzel of OpenCog fame has argued for this for a long time.


Deep learning has succeeded tremendously with perception in domains that tolerate lots of noise (audio/visual). Will those successes continue with perception in domains that are not noisy (language) and inference/control, which the article touches on? I think it really is unclear whether those challenges will require fundamental developments or just more years of incremental improvement. If fundamental developments are needed, then the timeline for progress - which everyone in tech seems to be interested in - becomes much more indeterminate.

If you think about audio/visual data, deep nets make sense: if you tweak a few pixel values in an image, or if you shift every pixel value by some amount, the image will still retain basically the same information. In this context, linearity (weighting values and summing them up) makes sense. It's not clear whether this makes sense in language. On the other hand, deep methods are state of the art on most NLP tasks, but their improvement over other methods isn't the huge gap it is in computer vision. And while we know there are tight similarities between lower-level visual features in deep nets and the initial layers of the visual cortex, the justification for deep learning in NLP is simpler and less specific: what I see is the fact that networks have huge capacity to fit to data and are deep (rely on a hierarchy of features). My guess is we may need a fundamental breakthrough in a newfangled hierarchical learning system that is better suited for language to “solve” NLP.

I think there are similar limitations with control and inference. When it comes to AlphaGo the deep learning component is responsible for estimating the value of the game state; the planning component is done with older methods. This is much more speculative, but when it comes to the work on Atari games, for example, I suspect that most of what is being learned (and solved) is perception of useful features from the raw game images. I wonder whether the features for deducing game state score are actually complex.

I think what I'm trying to say is that when we look at the success of deep learning, we have to separate out what part of that is due to the fact that deep learning is the go-to blackbox classifier, and what part of this is due to the systems we use actually being a good model for the problem. If the model isn't good, does that model merely need to be tweaked from what we currently use, or does the model have to completely change?


There is evidence that language is fairly smooth though. For example, we can extract e.g. the gender vector from a word embedding space that is learned by a recurrent neural network. That seems to hint at the possibility that words, sentences and concepts live on a smooth, high-dimensional manifold that makes them learnable for us in the first place (because in that case they can be learned by small local improvements, which seems to be required for biological plausibility). That is also the reason why we often have many words for the same or similar meanings and, conversely, why formal grammars have failed at modeling language.
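That claim is easy to poke at directly. A quick sketch, assuming gensim's downloadable pretrained GloVe vectors are available (not the RNN-learned embedding mentioned above, but the same kind of space): pull out a "gender direction" and check how other word-pair differences align with it, which is the kind of smoothness/linearity being described.

    # Sketch: extract a "gender direction" from a pretrained embedding and see
    # how other word-pair differences align with it. Assumes gensim is available.
    import numpy as np
    import gensim.downloader as api

    v = api.load("glove-wiki-gigaword-50")
    gender = v["woman"] - v["man"]

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cos(v["queen"] - v["king"], gender))    # gendered pair: expected to align
    print(cos(v["aunt"] - v["uncle"], gender))    # gendered pair: expected to align
    print(cos(v["car"] - v["truck"], gender))     # unrelated pair, for contrast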

Arguing from the other direction, neural networks have also already proven able to deal with very sharp features. For example, the value and policy networks in AlphaGo are able to pick up on subtle changes in the game position. The changes from the placement of single stones can be vast in Go, and this is by no means solved only by the Monte Carlo tree search. Without MCTS, AlphaGo still wins ~80% of the time against the best hand-crafted Go program. The value and policy networks have pretty much evolved a bit of boolean logic, simply from the gradient provided by the smoothness that results from averaging over a lot of training data.

I have a pet theory that the discovery of sharp features and boolean programs might heavily rely on noise. If the error surface becomes too discrete, we basically need to back up to pure random optimization (i.e. trying any direction by random chance and keeping it if it is better). That allows us to skip down the energy surface even without the presence of a gradient. Of course, such noise can also lead to forgetting, but it just seems that elsewhere the gradient will be non-zero again, so any mistakes will be corrected by more learning (or it simply leads to further improvement if the step was in the right direction). Surely, our episodic memory helps in the absence of gradient information as well. If we encounter a complex, previously unknown Go strategy, for example, it will likely not smoothly improve all our Go playing abilities by a small amount. Instead, we store a discrete chaining of states and actions as an episodic memory, which allows us to reuse that knowledge simply by recalling it at a later point in time.


> I have a pet theory that the discovery of sharp features and boolean programs might heavily rely on noise. If the error surface becomes too discrete, we basically need to back up to pure random optimization (i.e. trying any direction by random chance and keeping it if it is better). That allows us to skip down the energy surface even without the presence of a gradient.

Isn't that basically Monte Carlo?


It's called random optimization or random search depending on whether you sample from a normal or uniform distribution for the random direction. MC typically refers to any algorithm that computes approximate solutions using random numbers (as opposed to Las Vegas algorithms, which use random numbers but always compute the correct solution). So, yes, RO, RS and stochastic gradient descent are MC local optimization algorithms.
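For reference, the basic loop being described is tiny; here is a minimal sketch in plain numpy (normal proposals, keep only improvements, no gradient information used at all):

    import numpy as np

    def random_optimize(f, x0, step=0.1, iters=10000, seed=0):
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        fx = f(x)
        for _ in range(iters):
            candidate = x + step * rng.normal(size=x.shape)  # random direction
            fc = f(candidate)
            if fc < fx:                                      # keep only improvements
                x, fx = candidate, fc
        return x, fx

    # Works even where a gradient is zero or undefined, e.g. a step-like
    # objective with a weak quadratic term added.
    f = lambda x: np.sum(np.floor(np.abs(x) * 5)) + 0.01 * np.sum(x ** 2)
    print(random_optimize(f, np.ones(5) * 3.0))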


The very method of using a word embedding space assumes the manifold is smooth, so the fact that vectors extracted from a method that assumes a smooth manifold are in fact on a smooth manifold is just circular and not evidence of anything.


The evidence is that this works in the first place.


That is very, very weak evidence.


It is already succeeding on language tasks, see https://research.facebook.com/research/babi/

It is funny how every AI post on HN turns into a speculative discussion forum full of words "I think", "likely", "I suspect", "My guess" etc, when all the research is available for free and everyone is free to download and read it to get a real understanding of what's going on in the field.

>what I see is the fact that networks have huge capacity to fit to data and are deep (rely on a hierarchy of features).

Actually, recurrent neural networks like LSTMs are Turing-complete, i.e. for every halting algorithm it is trivial to implement an RNN that computes it. It is non-trivial to learn those parameters from algorithm I/O data, but for many tasks it is possible too.
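To make the flavour of that claim concrete, here is a hand-wired toy (not a trained LSTM and not a proof of Turing-completeness): a tiny recurrent net with threshold units that carries state across time and computes the running parity of a bit stream, something a fixed-depth feedforward net cannot do for inputs of arbitrary length.

    # Hand-wired recurrent net computing the running parity (XOR) of a bit stream.
    import numpy as np

    def step(z):
        return (z > 0).astype(float)       # hard-threshold activation

    W_h = np.array([[1.0, 1.0],            # hidden unit 1: OR(state, input)
                    [1.0, 1.0]])           # hidden unit 2: AND(state, input)
    b_h = np.array([-0.5, -1.5])
    w_o = np.array([1.0, -1.0])            # new state = OR and not AND = XOR
    b_o = -0.5

    def parity(bits):
        state = 0.0
        for x in bits:
            hidden = step(W_h @ np.array([state, float(x)]) + b_h)
            state = float(step(w_o @ hidden + b_o))
        return int(state)

    print(parity([1, 0, 1, 1]))   # 1 -- odd number of ones
    print(parity([1, 1, 0, 0]))   # 0 -- even number of ones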

>I suspect that most of what is being learned (and solved) is perception of useful features from the raw game images.

It is not this simple: deep enough convnets can represent computations, and the consensus is that middle and upper layers of convnets represent some useful computation steps. Also note that the human brain can only do so many computation steps to answer questions in dialogue, due to time and speed limits.

>My guess is we may need a fundamental breakthrough in a newfangled hierarchical learning system that is better suited for language to “solve” NLP.

This is being worked on; see the first link for Memory Networks, Stack RNNs, DeQue RNNs, Tree RNNs. Deep learning is a very generic term: there are dozens of feedforward and recurrent architectures that are fully differentiable. The full potential of such models has not nearly been reached yet and maybe language understanding will be solved in the coming years (again, the first link shows that it is in the process of being solved).


I specifically and mindfully added those words because everything is really an open research question. Would you rather I feigned confidence? If anything, you're stating your vague case way over-confidently. Turing-completeness is broad and nonspecific. Doing "some computation" is an obvious statement that doesn't add any information. The human brain does not seem to have time limits when it comes to thinking about what to say, and further we don't understand enough about neuroscience to make statements like that. Like I said, these are all active areas of research; the jury is still out on whether any specific approach will be the breakthrough.

EDIT (reply to below): in general these statements are either vague and nonspecific, or perfectly correct and non-informative, comments that don't have much to do with my original point.


I agree with your points. Your comment is a quality one; I was mostly talking about other ones.

>Turing-completeness is quite broad and nonspecific, like I said.

It is, but feedforward models (and almost every Bayesian/statistical model) don't possess it even in theory, while RNNs do.

>Doing "some computation" is an obvious statement that doesn't add any information.

Let me be more specific: currently researchers think that later stages of CNNs do something that is more interpretable as computation than as mere pattern matching. Our world doesn't require a 50-level hierarchy, but resnets with 50+ layers do well, seemingly because they learn some non-trivial computation.

>the jury is still out on whether any of those RNN approaches will be the needed breakthrough.

Sure, we'll see. Maybe there won't be any need for a breakthrough, just incremental improvement of models. And even current models, when scaled up to next-gen hardware (see Nervana), may surprise us again with their performance.


My skepticism is not about "succeeding" academically in the sense that research groups get better and better scores on Kaggle competitions.

My skepticism is about success in the sense of commercially useful systems that can process language and function "off the leash" of human supervision without the output being dominated by unacceptably bad results.

Look at the XBOX ONE Kinect vs the XBOX 360 Kinect. On paper the newer product is much better than the old product, but neither one is any easier or more fun to use than picking up the gamepad. In the current paradigm, researchers can keep putting up better and better numbers without ever crossing the threshold to something anybody can make a living off.


> It is funny how every AI post on HN turns into a speculative discussion forum full of words "I think", "likely", "I suspect", "My guess" etc

This is probably due to the fact that the field is very interesting and has lots of undefined boundaries, so people like to take educated guesses based on the knowledge they might have and on their intuition. Fair enough for this discussion.

> maybe language understanding will be solved in the coming years

maybe? :)

OK, here comes my guess: I think reasoning about and producing computer programs should be easier than reasoning about and producing natural language. So if that's possible (big if), then it should come first. And then maybe the NLP will be solved with the help of code writing computers. Or maybe just by code writing computers, and nobody here has a job anymore :)


> Deep learning has succeeded tremendously with perception in domains that tolerate lots of noise (audio/visual). Will those successes continue with perception in domains that are not noisy (language)

I wonder if it's just a different kind of "noise". Higher level, more structured.

> My guess is we may need a fundamental breakthrough in a newfangled hierarchical learning system that is better suited for language to “solve” NLP.

It seems fairly evident that there are many hierarchies inside the brain, each level working with outputs from lower-level processing units. In a sense, something like AlphaGo is hierarchy-poor - it has a few networks loosely coupled to a decision mechanism.

But the brain probably implements a "networks upon networks" model, that may also include hierarchical loops and other types of feedback.

I think, to have truly human level NLP, we'd have to simulate reasonably close the whole hierarchy of meaning, which in turn is given by the whole hierarchy of neural aggregates.


Language is noisy. People often say things that have little to do with what they mean and context is really important.

EX: "How long do stars last?" Means something very different in a science class than a tabloid headline. Is that tabloid talking divorce or obscurity? Notice how three sentences in I am clarifying last.


Yep. The problem is that it's _so_ noisy that the encryption, as it were, might be too strong to crack with statistical methods. You might need the key; i.e., something like a human brain.

EDIT: a combination of noise, I should say, and paucity of information.


Well, we also get things wrong all the time. We regularly either ask for further information to decide what they mean, or expect that it's OK to get the interpretation wrong but be corrected.

Asking a computer to solve all the ambiguity in human language perfectly is asking it to solve it far better than any human can.


No, you only need context. Context in the form of knowledge about the place, company and history that the statement is spoken in. Wikipedia will serve well for a lot of that.


Representation of relationships without representation of qualia gives you brittle nonsense - a content-free wireframe of word distributions.

For human-level NLP, you need to model the mechanism by which the relationship network is generated, and ground it in a set of experiences - or some digital analogue of experiences.

Naive statistical methods are not a good way to approach that problem.

So no, Wikipedia will not provide enough context, for all kinds of reasons - not least of which is the fact that human communications include multiple layers of meaning, some of which are contradictory, while others are metaphorical, and all of the above can rely on unstated implication.

Vector arithmetic is not a useful model for that level of verbal reasoning.


But that's the thing with AI. We make the context. In the case of AlphaGo, IBM's Watson, and self-driving cars, we set the goal. There are different heuristics, but we always need to define what is "right" or what the "goal" is.

For AI to determine their own goals, well now you get into awareness ... consciousness. At a fundamental mathematical level, we still have no idea how these work.

We can see electrical signals in the brain using tools and know it's a combination of chemicals and pulses that somehow make us do what we do ... but we are still a long way from understanding how that process really works.


> we still have no idea how these work.

I'd actually just say that we've not really defined these very well, and so arguing about how far along the path we are to them isn't that productive.


Sorry, I've edited my original comment to be clearer. What I really meant is that there is wide tolerance of noise in those domains. "How long does stars last" has a completely different meaning than "How long do stars last" - not tolerant of noise.


If a 6th grader asks their science teacher "How long does stars last?" / "How long stars last?" / "How long do stars last?" / "How old do stars get?" / "Stars, how old can they get?" / ...

In a similar context they probably end up parsed to the same question, assuming correct inflection, posture, etc. Spoken conversations are messy, but they also have redundancy and pseudo-checksums. Written language tends to be more formal because it's a much narrower channel and you don't get as much feedback.

PS: It's also really common for someone to ask a question when they don't have enough context to understand what question they should be asking.


All those sentences sound (mentally) quite different. Some of them give you the impression the speaker is an idiot, for example.


I'd suggest "How long do these stars last?" and "How long do these stairs last?" might be a better example. Human language has more redundancy than computer languages and in a real context it would probably still be clear what was meant even if the wrong word was used, but it's still a much spikier landscape with regard to small changes than images are.


I think you're dead on. And I'm nervous about a coming winter, because of the disappointment that will follow all the wolf-crying we're doing about how good at natural language we're getting, when we've barely scratched the surface. This latest bot fad worries me.

A further comment on deep methods being state of the art currently:

I wonder how well these tasks really measure progress in natural language understanding (I really don't like isolating that term as some distinct subdiscipline of broader AI goals, but so be it). Some of Chris Manning's students[1] have at least started down the path of examining some of these new-traditional tasks in language, and found that perhaps they are not so hard as they claim to be.

---

[1] A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. Chen, Bolton & Manning [https://arxiv.org/abs/1606.02858]


Chatbots aren't about NLP, IMHO; they are about easing non-technical people into a top-level CLI for everything. They ultimately have as much in common with search as with natural language interaction.

IME as a chatbot developer, people don't talk to them in conversational English so much as spit out what they want the bot to do.


I don't think there will be a winter. There are enough successes in computer vision.


Yeah, absolutely, and those successes are not going anywhere. But as solutions to those problems become more and more rote, funding will still be needed for the bigger problem, which continues to fail to deliver on its promises.


But there isn't a model (that I am aware of, and I've done some serious checking, because reasons :) ) that goes beyond, say, Chomsky saying "we dunno". The difference between human language capability and that of our nearest evolutionary neighbors is profound, and we appeal to some emergent phenomenon.

But something about the very use of hierarchy in trying to solve NLP makes me queasy. I think it's more (poetically-metaphorically) like Reed-Solomon codes than hierarchies ( to the extent that those don't actually overlap ). There is Unexplained Conservation of Information That Really Isn't There To Start With.


AI researchers overestimate the role of language in the development of human-like intelligence, understood as a common-sense heuristic physics that is built empirically through non-linguistic experiences (e.g. if I turn that glass over, the wine spills and leaves a dirty stain on the couch, which has further consequences). That is the part most difficult to implement/reproduce/emulate in a machine.

Language is only a communication protocol, most efficient in an interactive context (dialog), that allows two agents with shared but not identical sets of experiences to achieve understanding in some domain and context, with the caveat that understanding is unprovable and not absolute: it can only be tested empirically and probed behaviorally (e.g. long after a successful conversation, Agent Alice discovers that Agent Bob "did not get it" as she expected).

By analyzing language alone, without experiences, the machine, using something like word2vec, may discover semantic dependencies (e.g. man + cassock = pedophile) but not the true semantics that has consequences in the world.

Even with unlimited language corpora the machine does not have the set of axioms that humans have (experiences and observed stories). These axioms are needed to build further more abstract knowledge.


It seems to me that a full mastery of language requires a grasp of semantics, that is the ability to understand what a sentence means. I doubt it's possible to do that without having basic common sense along with an overall representation of the world, and that looks very close to strong AI, imho.

So I'm not surprised computers keep on struggling with language applications. Once they succeed strong AI will not be much further away.


I think the 'overall representation of the world' requirement is pretty key here. Language in AI is often treated as its own class of problem, with the assumption that there is somehow enough signal in the raw mess of examples provided to any given learning system (usually just plain text, stripped of any prosody, emotion, cultural context, imagery; any of the other modalities of communication available to a demonstrably functioning natural language understander[1]) to build a model that 'understands' general use language. I simply don't see how this is possible[2]. I know the classical philosophies about the complementary nature of language and intelligence are out of fashion right now[3], but I'm not quite convinced they deserve to be.

I'll raise your bet; I'm willing to believe that once we succeed in building a general understanding of language, we'll look back and see that we simultaneously have solved Strong AI. To twist the old saying, I think that language is what the human brain does.

---

[1] Yes, we can talk about P-zombies if you want. But I mean more in the Turing Test sense here.

[2] Yes, I know the progress has been impressive. The progress in the 60s with GOFAI was impressive at first too. Then it plateaued.

[3] I'm particularly referring to Sapir-Whorfishm and the various communication heuristics proposed by Grice. But I'd throw Chomskian Universal Grammar in there too.


Grounding language in other sense modalities (multimodal learning) is a thing. We can even generate captions from images and generate images from captions, albeit not perfectly.

Another grounding source is related to ontologies. We are already building huge maps of facts about the world like "object1 relation object2".

Another source of "common sense" is word embeddings. In fact it is possible to embed all kinds of things - shopping bags, music preferences, network topologies - as long as we can observe objects in context.

Then there is unsupervised learning from video and images. For example, starting from pictures, cut them into a 3x3 grid, shuffle the tiles and then task the network with recovering the original layout. This automatically extracts semantic information from images, unsupervised. A variant is to take frames from a video, shuffle them around, then task the network with recovering the original temporal order. Using this process we can cheaply learn about the world and provide this knowledge as "common sense" for NLP tasks.
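A sketch of how that tile-shuffling setup produces training pairs (data generation only; the network that has to predict the permutation is omitted):

    # One training example for the jigsaw-style pretext task: cut an image into
    # a 3x3 grid, shuffle the tiles, and keep the permutation as the label.
    import numpy as np

    def make_jigsaw_example(image, grid=3, rng=None):
        rng = rng or np.random.default_rng(0)
        h, w = image.shape[0] // grid, image.shape[1] // grid
        tiles = [image[i * h:(i + 1) * h, j * w:(j + 1) * w]
                 for i in range(grid) for j in range(grid)]
        perm = rng.permutation(len(tiles))
        shuffled = [tiles[k] for k in perm]
        return shuffled, perm                 # network input, target permutation

    fake_image = np.arange(36 * 36 * 3).reshape(36, 36, 3)   # stand-in for a real image
    tiles, perm = make_jigsaw_example(fake_image)
    print(len(tiles), perm)                   # 9 tiles and the permutation to recover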

I am not worried about grounding language. We will get there soon enough; we're just impatient. Life evolved over billions of years, and AI is only emerging now. Imagine how much computing power is in the collected brains of humanity, and how little computer time we give AI to learn. AI is still starved of raw computing power and experience. Human brains would have done much worse with the same amount of computing.


Image captioning is a separate, albeit related, problem to what I'm talking about.

Ontologies are much the same; they are interesting for the problems they solve, but it's not clear how well those problems relate to the more general problem of language.

Word embeddings are also quite interesting, but again, they are typically based entirely off whatever emergent semantics can be gleaned from the structure of documents. It's not clear to me that this is any more than superficial understanding. Not that they aren't very cool and powerful. Distributional semantics is a powerful tool for measuring certain characteristics of language; I'm just not sure how much more useful it will be in the future.

Unsupervised learning from video and images is a strictly different problem that seems to me to be much lower down the hierarchy of AI Hardness. It's more like a fundamental task that is solvable in its own universe, without requiring complete integration of multiple other universes. Whether the information extracted by these existing technologies is actually usefully semantic in nature remains to be seen.

I agree that we'll get there, somewhat inevitably; not trying to argue for any Searlian dualistic separation between what Machines can do and what Biology can do. I'm personally interested in the 'how'. Emergent Strong AI is the most boring scenario I can imagine; I want to understand the mechanisms at play. It may just be that we need to tie together everything you've listed and more, throw enough data at it, and wait for something approximating intelligence to grow out of it. We can also take the more top-down route, and treat this as a problem in developmental psychology. Are there better ways to learn than just throwing trillions of examples at something until it hits that eureka moment?


I think the key ingredient is going to be reinforcement learning and, more importantly, agents being embedded in the external world.

Regarding the "internal world", we already see the development of AI mechanisms for attention, short term memory (references to concepts recently used), episodic memory (autobiographic) and semantic memory (ontologies).


>>I think that language is what the human brain does.

I think language is a UI to our own brain. It allows us to interact with its knowledge system and representation of the world. The self is a thin client running on that vast knowledge system. If you think about it, conscious thinking is not where the real thinking happens: we get intuition signals from the brain on what is true / false, which are required for our higher-level thinking. So the thinking we do is also a thin client running on top of the Brain OS. Both thinking and language are serialization tools for our representation of the world, evolved solely for communication with other brains. Since we don't have a direct neural link with other brains, we have to serialize it, hence language-based thinking.

So i think to evolve language understanding in machines, we might have to simulate many intelligent agents in a simulated environment and let them collaborate. Similar to how our brains collaborated and gave rise to natural languages.


I am inclined to agree with you, but then I remember that people used to say the same thing about chess. Perhaps completely solving language requires strong AI, but maybe we can get 99% there with something like the "Chinese room", an AI that works like a well-trained parrot.


To grasp the semantics of human language, I would think that an AI would have to have an understanding of the world grounded in human experience. So we would need to simulate human experience for an AI. Anyone know of any work on this?


Yup, these type of problems have been named AI-Complete ( https://en.wikipedia.org/wiki/AI-complete )


IMHO It is unproductive to rigidly split problems into "merely requiring algorithmic solution" and "AI-complete".

Even with language there is a whole spectrum of language skill. Some animals like parrots, crows, great apes, and then people can learn language at various levels.

Some deep learning models can already learn basic language skills too. The question is, how far can these techniques go. Maybe, pretty far.


I have to agree to an extent with researchers Tenenbaum and Li. It seems to me that the only way AI is going to learn language is to have some worldly experience to link words to their ultimate semantics.

I don't think AI will be able to fully grasp the intricacies of human language until it has "lived" long enough to form the links between ideas and experiences. Mainly in the physical realm, as obviously a lot of our human development is shaped by our environments. They will need eyes, ears, and maybe noses. We should also consider giving them subconscious or instinctive reactions to certain stimuli. An AI wouldn't immediately know that, for example, rotten meat is bad to humans because it lacks a nose to send a signal of danger and disgust.

We should also consider the idea of communicating with AI in regular face-to-face speech. Talking is not the same as writing, and it conveys a lot of information beyond just the words.


This is a long metaphor but work with me here...

There are many types of application programmers, but there are 2 types in particular that are interesting. One of them is the purely technology-driven developer. He uses all the new tools, he's read Knuth's books a hundred times, he knows how to build elegant systems. However, he only takes enough interest in the business as is necessary to know what to build. At the end of the day, he'll build the most elegant beautiful system that almost never accomplishes the business goal. He knows how to describe a problem, but he doesn't really know the problem.

The second likes programming, he finds technology fun, but he is really driven by trying to understand the full context of the business. Writing software is a means to see an impact on people. He's driven by seeing a business problem solved. I've only met 2 people in my career who are ACTUALLY like this, they're rare... which is maybe a good thing because they write shit code.

A good engineering team tries to get both of these guys, you have the tech guy making sure your platform is maintainable, and you have the business driven guy who makes sure it's useful. One guy understands the structure of the tool, the other understands the structure of the world the tool is in.

A language is a tool, it can be elegant, it can be beautiful, it can be technically perfect.. and just like poetry, it can have a very little practical utilitarian purpose. When I look at how we're using ANN's to develop language today, this is how it feels to me. We're spending so much time trying to figure out how to get a computer to build the most technically perfect sentence, we're missing the maybe more interesting problem of trying to get a computer to understand the world. My son right now isn't old enough to conjugate a sentence, but he understands what certain things in the world do. He clearly understands that cars move things, he understands you can use the hose to get things wet. He's not that old, but he's developing a mental model of the world. He just doesn't know how to describe it yet.

To me, having a computer look at a crane, and then print the word "crane" is interesting, but even more interesting is if you could give it 3 pictures (a crane, a building, and a pile of rubble) and teach it how 1+1=2.


Actually these two parts do exist in tandem; it's just that those engineer/designers (one person who can do both) are usually independently successful and have repeatedly built high-revenue products or started a company themselves.

The major flaw I see in manager/corp/team analysis of workers is that it misses out on a portion of the population that is genuinely independently functional and creates and shares value on its own. They don't work for companies because they either don't need to or own their own. These are the ideals worth keeping in mind.


If we use animals as a reference, I would say that consciousness is more fundamental than language, so most likely we need that in place before we can get AI to be able to effectively understand language.


I agree. It seems to me that before language can develop you must have ideas, such as notions of space and time and existence, and in order to have ideas you probably need some kind of sensory apparatus that tells you things about yourself and about the environment you are in. But since we don't really understand how these things work, I doubt that we'll be able to recreate anything like it artificially.


We could call this the "Kantian Space-Time Assumption" as a precondition for strong AI. :) As others have mentioned, it is a philosophical question and probably one of the fundamental questions of our and next generations.


Complete agreement. How the heck could an entity (machine or biological) communicate meaningful ideas of any sort without a comprehensive understanding of the world upon which those ideas rest?

Language isn't some side feature. It's a complicated interface layer that lives on top of an enormously rich, dynamic internal model of the world one lives in.

The article barely touches on this fundamental aspect.


Do you mean "consciousness" as being aware of one's own existence and relative position in a larger reality, or as having subjective experiences (qualia, feelings)?


It's a state of being able to say "I am doing X, I am doing Y, I am thinking Z" (though not necessarily in words, of course). It also involves the creation of an executive process that is separate from the analytic processes. The executive process can be aware of the analytic processes (like we can know about our heart beating, but it's a separate process).

Subjective feelings are not necessarily part of that.

This article on how consciousness evolved does a good job of explaining how it works (finally), and I think it's something we could emulate.

http://www.theatlantic.com/science/archive/2016/06/how-consc...


I'm going to give the opposite answer as u/Practicality here and say that it's the qualia/subjective experience. What's really extraordinary, currently not understood, and certainly a major part of actual human cognition, is subjective experience itself.

Coding up a simulation that understands itself as one conceptual entity among many is not that interesting. The trick is having a subjective experience of that understanding. It seems to me that: subjective experience + sufficiently advanced conceptual understanding is what gives rise to, is the definition of, self-awareness.

See my other posts in this thread for more thoughts on this.


I think about the AI language problem a lot while raising my kids. The article notes the word "forever" and how an AI must distinguish the literal from the figurative meaning of the word in context. My five-year-old still doesn't grasp the literal meaning of this word as "never-ending." To him, "forever" is simply a very very long time. He has the same problem with the concept of "infinity," where the word means both "the biggest number" but also has the characteristic (in his mind) of having an upper quantifiable bound. His young mind has not yet recognized the paradox that "infinity" is the biggest number, so what does it mean when I say, "Infinity plus one"?

Neural networks are going to make huge inroads into the AI language problem simply by exposing the AI to example after example of words in varying contexts. But I wonder if the real problem is getting those neural networks to let go of unnecessary data? Humans rely on excited neurons to recognize patterns, but our neurons let a lot of sensory input pass us by to keep from getting bogged down in the details. Are the image-recognition AIs described in the article capable of selective attention? Will they get bogged down in the morass of information in trying to pattern-match every word to every image and context they learn?


>Are the image-recognition AIs described in the article capable of selective attention?

Yes they are: https://indico.io/blog/sequence-modeling-neural-networks-par... https://github.com/harvardnlp/seq2seq-attn
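
For anyone curious what "attention" boils down to mechanically, here's a minimal numpy sketch of the core idea behind those links: score each input position against a query, softmax the scores, and take a weighted sum. The names and dimensions are made up for illustration; real seq2seq attention layers learn the scoring function and sit inside a full encoder-decoder.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attend(query, keys, values):
        # score each position against the query, turn scores into weights,
        # and return the weighted summary a decoder would consume next
        scores = keys @ query / np.sqrt(query.shape[0])
        weights = softmax(scores)
        return weights @ values, weights

    rng = np.random.default_rng(0)
    keys = values = rng.normal(size=(5, 8))   # 5 "words", each an 8-dim feature vector
    query = rng.normal(size=8)                # current decoder state
    context, weights = attend(query, keys, values)
    print(weights)  # most of the mass lands on a few positions: selective attention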


Kids are an awesome way to think about these problems. I did a teaching-abroad stint once, and it was incredible to witness kindergartners wrestle with this new language I was introducing to them and adopt it while they were still learning their native one. It got even cooler when there'd be a half-native kid who already spoke English as well as Chinese.

I wonder if there's AI-focused research that analyzes how children learn language, especially kids learning a "new" language.


Maybe this is good. After all, mathematical abstractions may cause more philosophical problems than they solve. What if there is nothing infinite in this world? Having a firm grasp of reality before venturing into hypotheticals can be good.


But what is language if not a tool for manipulating hypotheticals? If your language can only describe what you know to be possible, it can only describe things you've already seen, and the required dataset in your memory to have a conversation or provide useful information is way too large. Abstraction is the very problem of language that AIs are trying to solve.


> His young mind has not yet recognized the paradox that "infinity" is the biggest number, so what does it mean when I say, "Infinity plus one"?

Paradox? In the extended reals, where infinity is the biggest number, infinity plus one gets you infinity, just as you'd expect. In, say, the study of ordinal numbers, where there are many infinite quantities, it doesn't make any sense to talk about "the biggest number".
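
In symbols, the two settings behave quite differently (standard definitions, nothing exotic):

    % extended reals: infinity is a maximal element, so adding 1 changes nothing
    \infty + 1 = \infty
    % ordinals: \omega is the first infinite ordinal, and order of addition matters
    1 + \omega = \omega \quad\text{but}\quad \omega + 1 > \omega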


A paradox is "a statement or proposition that seems self-contradictory or absurd but in reality expresses a possible truth". If you encounter a paradox, it tells you more about how your framework of thinking is flawed than it does about reality.

"Infinity plus one" is a paradox to a child who believes that infinity is a finite number. When they realize what infinity actually means, the paradox will be resolved.


That reminded me of this blog article by Scott Aaronson: http://www.scottaaronson.com/writings/bignumbers.html


I have often wondered whether the reason why 'AI' has struggled with human language is because most programs are not embodied. If you cannot jump when you hear the word joy, and cannot cry when you hear the word death and feel the other sensations that go with them, you cannot truly understand the word. The lack of additional visual cues and context will continue to severely hamper the ability for machines to correctly understand the semantics of human language. I know the language guys like to work with their strings of characters, but humans can only communicate with those because we have already built our semantic framework by being active agents in the world.


My belief is that "Deep Network" systems will fail to produce commercially useful results on language just as symbolic systems failed in the face of visual recognition.


In terms of DNA difference, humans are very close to mammals that don't have much language capability but demonstrate some degree of intelligence. This suggests that language is not as fundamental as some researchers claim.

I once told Rod Brooks, back when he was proposing "Cog" (look it up), that he'd done a really good insect robot, and the next step should be a good robot mouse. He said "I don't want to go down in history as the guy who built the world's best robot mouse". "Cog" was a dud, and Brooks went back to insect level AI in the form of robot vacuum cleaners.

We need more machines which successfully operate autonomously in the real world. Then they may need to talk to humans and each other. That might work.

The big problem in AI isn't language, anyway. It's consequences. We don't have common sense for robots. There's little or no understanding of the consequences of planned actions. We need to get this figured out before we can let robots do much.


This is also why AI won't overthrow us and become our masters. They don't even know what it means to be a master.

EDIT: This is the most volatile comment I've ever posted. It has been going +2, -2, +2, -2 for the last 35 minutes. People seem to either love it or hate it.


Nor does a meteor know what it means to be an extinction event.


True, but a meteor doesn't have to. As it's written in our sci-fi, the robot uprising specifically requires the machines to understand, and most of all care about, dominance.

If you're talking about just being victims of machine logic, we've been suffering that since the first traffic jam caused by a traffic light.



That's because fiction is fiction; fiction created for the masses, especially, must be written in terms the masses can relate to.

As for reality, creating an AI that will actively hate us is a feat of about the same difficulty as creating an AI that would love us. Those are two opposite points of a tiny island called "has a more or less human mind" that floats on a vast ocean named "we all die." The biggest challenge of surviving superhuman AIs is locating that island.

As the words of wisdom say, "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."


It may not need to understand or care. It may just be optimizing toward a goal which involves 'humans need to die' or 'destroy/pollute the water supply', etc. It may indirectly cause serious damage without even being aware we exist.


Not yet.


As I understood this article, they're not even trying to. Their version of understanding isn't of the right sort.


I haven't heard of an AI system understanding that yet, but it's a fairly common concept in everyday life; for instance, dogs tend to have masters. Anyone trying to make an AI understand everyday life or literature is going to have to deal with that.


I think the article speaks to this very point. The people "trying to make an AI understand" are not making it "understand" in the sense that you and I most commonly use the term.


You're exactly right. If you look closely, present-day A.I is modeled after the cortex. It's akin to cutting out the neocortex, wiring in I/O hooks, and reprogramming it to one's needs (weak A.I).

The cortical regions are good at creating hierarchical feature maps, and we had a bunch of search algorithms lying around in the parts bin. Presto: present-day A.I.

This approach meshes well with 'big-data' companies in possession of large compute stacks and data sets. So, it's the direction things went.

A perfect opportunity for disruption. The current wave is on borrowed time.


We might still have to go much more basic, or we might end up going there in the future. Some have mentioned here that having consciousness is more fundamental than language.

Even very basic creatures with less intelligence "learn" because they "want to live".

That is the key: you want to stay alive.

You can't be immortal. You don't live to learn forever. You live to stay alive and feel happy, and that is what drives us to learn.

Could this be true in the case of machines?


This is not a question of "consciousness", this is a question of Reinforcement Learning.
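
For concreteness, "wanting to stay alive" in RL terms is just a reward signal the agent learns to maximize. A minimal tabular Q-learning sketch (the function names and constants here are mine, not from any particular library):

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    Q = defaultdict(float)                  # Q[(state, action)] -> expected return

    def choose_action(state, actions):
        if random.random() < epsilon:       # explore occasionally
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])  # otherwise exploit

    def update(state, action, reward, next_state, actions):
        best_next = max(Q[(next_state, a)] for a in actions)
        # nudge the estimate toward reward + discounted future value
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])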


You cannot understand apples by analyzing the word "apple". Language is just the paper trail. There are plenty of examples: You can't understand a bank or money by just analyzing bank statements. You can't understand food or supermarkets by analyzing their receipts. You can't understand the internet by analyzing TCP/IP.

Regardless of the rich context a word may reside in, or the infinite sample pool of text upon which we may unleash our learning robots, as long as they are all words, you will never encounter the real apple to which the symbol is linked, nor reach the reality to which all the symbols are linked. The machine will learn something. It just won't be anything like what we know or understand, or what generated the paper trail in the first place. They will be awkward simulations, which is exactly what we have.

My now 20-month-old son wasn't born literate, but he already spoke. A grunt, a groan, a giggle, and a moan: these are his words. There is nuance, there is rhythm, there is intention, and there is tone. Not that I'm any good at writing baby books, but there is a reason babies enjoy rhyming and puns and silliness. The point is, words offer so much more than their meaning. This is the language they understand. They don't know English, and we aren't teaching them English. But they are slowly but surely articulating themselves, be it that they're hungry, lonely, or just want that grape. In fact, who is teaching whom? Parents learn the language of their child first to be any good at parenting. Their expression of their intelligence precedes the expression of our own. Maybe all we need is a machine that groans.

I find it no coincidence that the philosophy of Ludwig Wittgenstein evolved with his experiences teaching children. And if I were to make a bold prediction, it appears the field of AI will benefit immensely from all the young and talented AI researchers who start having kids of their own. The comments here already seem to attest to this. It's either that or giving up and deciding to teach preschoolers for a while. Either way, we'll soon have our book on AI that will do what Philosophical Investigations did for philosophy. And I can't wait.


My intuition is that the language problem will be solved, but that the solution will likely be a hybrid symbolic and deep learning system. BTW, I have been working (some of the time) on both symbolic NLP and neural networks/machine learning since the 1980s: right now is the most exciting time in the field of AI because progress is rapid and accelerating.


Somewhere here, Noam Chomsky is still kicking and saying "I told you so"

http://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-figh...


Solving language for computers seems much like climbing a series of mountains, where each time you surmount one, you realize the next is even higher :) Thanks to deep learning, machines have made rapid gains in speech recognition, as well as improving semantic mapping (a la word2vec and other word embedding approaches).

But once you have a system with human-level speech recognition and semantic mapping, where do we go from there? The ability to have a meaningful dialogue with a machine seems very difficult to model as a machine learning problem (what constitutes ground truth? What does the reward function look like?), and it also has to deal with many unknowns. For example, ask a smart assistant like Alexa or Siri about functionality it wasn't programmed with, and you get a terse "Sorry, can't help you with that." But ask a child, and you prompt a question-answer dialogue (i.e. learning) or perhaps feigned understanding. My toddler son is an expert at giving me the answer he thinks I want to hear, even when he has no idea what I'm talking about :) There are certainly many new problems we can begin to think about tackling, but no sign IMO that we're running out of applications for deep learning in the field of language.


"much like climbing a series of mountains, where each time you surmount one, you realize the next is even higher"

Aren't you describing learning, in general? Physics, math, biology, etc


The problem with natural language processing is that we are trying to learn it (read: construct models of it) from utterances, things that are being said or transcribed. And that is a big, huge problem because there is a lot more to language than utterances. Hell, there is a lot more to language than language itself.

There are things you cannot put into words, and yet you think them. There are things that you can't put into words and yet you can make people around you understand them. There are things you understand without even knowing you understand them. But even before we go there- there are so many things that people can make utterances about that are not possible to collect into example sets and train models on.

How do you collect examples of whatever it is that makes people lie on the beach to get a sun tan? How do you collect examples of imagination, dreams, abstract thinking, all those things that your brain does that may be a side-effect of self-aware intelligence or the whole point of self-aware intelligence in the first place?

How do you collect a data set that's as big as the whole world you've experienced in your however many years of life? And even if you could, what machine has the processing power to train on that stuff, again and again, until it gets it right?

Machine learning meaning is hopeless, folks. Fuggeddabout it. There's not enough data in the whole world, and there's no machine big enough to process it if it existed. We'll make some advances in text processing, sure; we'll automate some useful stuff like translation (for languages close to each other) and captioning (for photographs), and then we'll stall until the next big thing comes about a few generations from now.

That's what the current state of the art suggests.


> There are things you cannot put into words

I am a very bad example though, for one because English is just my second language. Sure, there is thinking before words are learned. Language is a complicated problem to talk about, just like self-awareness. Consciousness is a very nebulous term to me. Still, you'd have to prove that language is theoretically unfit. Any such logic might be incomplete if you suppose you cannot put it into words. A complete first-order logic is expressible, however, following Gödel's completeness theorem.


Is there any hope that, if chat bots do actually become widespread, we'll eventually be able to aggregate their collective knowledge, similar to reinforcement learning for a single system? That seems like the only likely way we'll ever be able to train AI in something as complex as language.


The primary issue is that language doesn't have a well-defined "success" metric. So more data doesn't necessarily make for better language.

Without a human analyzing the transcripts, it's very difficult for the chat bots to know which of the inputs they're receiving are "good" or "better."

Even the idea of good language is subjective. We all know there is such a thing, but nearly everyone has different ideas of what this is.


It depends on what the nature of language is. If it is purely a tool of consciousness, a separate module in mind like basic image recognition seems to be, then the picture you're painting may be possible. It may be that the language signal is strong enough to pick up just by looking at orders of magnitude more examples than we can right now. That we can crack the encryption, so to speak.

However, if language is more integral than that. If language is more a facet of intelligence than a building block. If language is the structure of sentience, rather than something sentience leverages, then no, all the chat bots in the world won't help. We need something that can integrate more than just plain text embeddings of incredibly intricate and complex structures. We can't crack this code without a key.

My guess is that this latter scenario is the likely case.


AI as it's defined today is fundamentally reactive. If we applied the AlphaGo methodology to language, it would come up with what a good response would be to words it heard, but the purpose of such a conversation would be the conversation itself.

A real conversation is about conveying understanding, not about the words spoken.

AlphaGo was trained on however many zillions of games and by playing against itself, but does it actually understand anything about the game? Or can it simply react to the current state of the game and suggest what the next move should be? It will never have a leap of intuition causing it to say "the only winning move is not to play."

Intelligence is not purely reactive.


How would one prove the opposite? That a human actually understands anything about the game and isn't reacting to the state of the game to suggest the next move? I'm not saying AI as it exists today understands, I'm just saying this "understanding" metric isn't a good metric unless it works in reverse.


I don't think we can with a game. Games are a progressive sequence of states with permitted transitions defined by the rules; they are inherently reactive. The only way to prove understanding is to ask things like, "Why did you make that move?", or maybe more specifically, "Why was that move the one that best maximizes your chances of winning?" I'm not sure AlphaGo could answer that question.

Basically, you need to ask questions that require meta-cognition, like, "What does Mary think about you?" That requires:

* Understanding of yourself as an entity.

* Understanding of Mary as another entity, with its own state.

* The capability to use previous interactions to approximate that other entity state.


It would be pretty easy to prove a computer didn't understand a conversation.


The focus on vector space, as mentioned in the article: "words can be represented as mathematical vectors, allowing similarities between related words to be calculated. For example, “boat” and “water” are close in vector space even though they look very different. Researchers at the University of Montreal, led by Yoshua Bengio, and another group at Google, have used this insight to build networks in which each word in a sentence can be used to construct a more complex representation—something that Geoffrey Hinton, a professor at the University of Toronto and a prominent deep-learning researcher who works part-time at Google, calls a “thought vector.”"

This is, to me, the most significant way in which we can mimic how the human cognitive process develops associations between things. Auto-association is key here.

In addition, understanding how to calculate similarities between vectors is also important.
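
As a concrete illustration of that similarity calculation, here's a tiny sketch using cosine similarity, with made-up 3-dimensional vectors standing in for real learned embeddings (real word vectors have hundreds of dimensions and come out of training, not hand-tuning):

    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    vecs = {                       # toy embeddings, purely illustrative
        "king":  np.array([0.9, 0.8, 0.1]),
        "queen": np.array([0.9, 0.1, 0.8]),
        "man":   np.array([0.1, 0.9, 0.1]),
        "woman": np.array([0.1, 0.1, 0.9]),
    }

    target = vecs["king"] - vecs["man"] + vecs["woman"]
    best = max((w for w in vecs if w not in ("king", "man", "woman")),
               key=lambda w: cosine(vecs[w], target))
    print(best)   # "queen" for these toy vectors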


I wonder if in the future we'll have matrix-style "blobs" of knowledge that we can plug into compatible AI systems. That way it has to just be trained once on a strong system, and then other systems can take advantage of the learning by forking the state and importing it. It would definitely speed up training AI, and possibly even enable it on lower powered hardware.

Imagine downloading "english teenager slang 2016 v2.0" to your home AI, so it can understand what the hell your kids are saying :)


I think this is too linear. My two cents: when AI cracks one language, many languages will be cracked soon after. In other words, exponential growth. With open source still being prevalent, I think the libraries will be shared, and that will bring in a new era for humanity. Think about all those languages we don't even have translators for (namely in those countries that have only 20 people speaking a specific derivation of a derivation of a .....of a language humanity can translate). There are almost 7,000 languages in the world [1]. Language is a tough challenge, but when we find a breakthrough, it's my strong opinion that we'll bring everyone to the table of communication.

[1] http://www.linguisticsociety.org/content/how-many-languages-...


If the blob of knowledge consists of the weights of some neural network, and if this blob is public, then an attacker could easily craft an imperceptible perturbation of the input to make the network believe that the yogurt is an Eiffel tower, or vice versa. (Can't find the related publications on adversarial examples right now, but they've appeared several times on HN before.)

So if you don't want the system to be gameable, such public blobs of weights may need to be avoided.
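
Roughly, the attack follows the sign of the gradient of the model's output with respect to the input, which is easy to compute once the weights are public. A toy sketch on a logistic model (the class names and numbers are invented for illustration; real attacks target deep image networks):

    import numpy as np

    def predict(w, x):
        return 1 / (1 + np.exp(-(w @ x)))    # model's confidence it sees an "Eiffel tower"

    rng = np.random.default_rng(0)
    w = rng.normal(size=100)                 # the "public blob" of weights
    x = rng.normal(size=100) * 0.1           # an arbitrary input (call it the "yogurt")

    p = predict(w, x)
    grad = w * p * (1 - p)                   # dp/dx for this logistic model
    x_adv = x + 0.25 * np.sign(grad)         # small, structured nudge along the gradient's sign

    print(predict(w, x), predict(w, x_adv))  # the second score jumps toward 1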


Likely by that time we will have overcome this problem.


This is the "time flies like an arrow" problem again, isn't it? Fruit flies like a banana. Context, sense, meaning. All of which require facts about the real world.


Language is just a protocol for synchronizing slices of two world models: the one in the head of the speaker and the one in the head of the listener. If the recipient doesn't have a model similar to the sender's, language is meaningless.

You can't "understand" language without having the model of the world that humans construct during their lives and education.

So pretty much the next step for language recognition is indistinguishable from sentience.


> his team was just as surprised as everyone else ... It was only several days later, after careful analysis, that the Google team made a discovery

Since success has a higher priority for researchers than explicable success, and if the "singularity" is just progress that is not understood, it may be almost here - and not require true AI.

Though to be fair, by that definition, the singularity has always been with us, since we don't understand how we think.


The problem with deep learning and language understanding is that the task is ill-defined end-to-end. For speech, image understanding, and translation, you can come up with large datasets of x->y and have deep learning learn a complex function to approximate the mapping. We don't have that luxury in language understanding, at least not yet.
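
To make that concrete, the x->y recipe looks like this in miniature: given labeled pairs, fit a function by gradient descent. The sketch below uses a linear model standing in for a deep network; the point is only that the recipe needs a well-defined y, which "understanding" doesn't obviously have.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                      # inputs (e.g. audio features)
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=200)   # targets (e.g. transcripts)

    w = np.zeros(3)
    for _ in range(500):                               # minimize squared error
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= 0.1 * grad

    print(w)   # close to true_w: the mapping was learnable because y existed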


Consciousness is the secret sauce. Consciousness as in "subjective experience", what separates us from philosophical zombies, the sensation of the color blue, of middle-c. Our bodies evolved and kept this extremely rich phenomenon for a reason; it is extraordinarily unlikely for it to have arisen and remained through genetic drift alone.

My theory, and I'd love to find someone offering a similar and more fleshed-out hypothesis, is that conscious experience serves as a universal data type. It can encode and play-back any type of knowledge and memory, and relationships among them, from the color of the teacher's shirt that time you broke your bone in 3rd grade, to the formula for electron energy in quantum mechanics.

Unfortunately, the word consciousness is almost forbidden in most scientific circles. The dominant view is that there is no Hard Problem of Consciousness and that any discussion of it is quackery, or at least "not science". This taboo is holding us back.


It won't remain a taboo when someone proves out a functional computational model for it. It seems that will need to occur to silence the idea that it's quackery. Nothing short of that will suffice. Seeing will indeed be believing.

I was partly inspired to pursue the path I took in R&D by observing that the industry didn't seem to want to consider someone thinking differently or working on the true foundation of A.I (the hard problems).

I figured, if I was able to write software up and down the stack for the billion-dollar network infrastructure equipment that powers and services the internet, I probably knew what I was doing w.r.t. engineering.

A networked system with a missing foundation... The rest is history, and I look forward to making disclosures about my work in the near future.

In the meantime, you should know that there are quite capable and industry-proven individuals working on this. They aren't quacks; they hold graduate degrees from the top universities of America and have a proven track record in the industry. The spotlight just doesn't shine in their direction. Of course, once a functional model is proved out, I'm sure that will change. Such is the history of new paradigms and those who, through deep and new understanding, seek to usher them in...


I've had somewhat similar thoughts, and I am entirely unqualified (and highly likely not the first) to put forth the idea that consciousness's killer app is the ability to rapidly assemble abstract models of experiences (present, from current sensory input; past, from short/long-term memory; or future, from mental simulation) and to query/manipulate those models. My (admittedly potentially naive) suspicion is that whatever does "that" is the machinery behind consciousness.

This is also why I think that deep learning / neural networks are only going to take us so far. I think there is more to the story of how the brain works than only neural networks that make predictions, and frankly I do not think that any system that does not at least attempt to do "that" (simulating consciousness's model building/manipulation feature) will have much better luck at language processing/understanding.


One possibility about the machinery of consciousness that would help address the Hard Problem is that the brain is not creating consciousness from scratch, but tapping into some currently misunderstood or unknown physical phenomenon. It's hard to see how information processing alone (which can be done by monks with paper and pencil, if incredibly slowly) can give rise to subjective experience.

It seems that this phenomenon, whatever it is, plays a central role in sensory perception, and there's reason to think that it's present even in animals with simple brains. So I suspect that we're looking for some kind of simple operation that can happen on the scale of a small number of neurons, maybe even a single neuron.

This is all speculation of course, informed by some knowledge and intuition, but speculation nonetheless. But it's the only way to push the frontier, and the unwillingness to engage with consciousness as a matter of serious study seems to be a major failing of brain science and AI.


"If a lion could speak, we could not understand him." -- Wittgenstein


The web of research always spins around money. So far I am not seeing a monetary interest group with a declared imperative of creating a self-sustained artificial mind.


I'd rather see an AI that is smarter than humans. People are trying to make human-like entities, but let's think outside the box a little more.


> today’s machine-learning and AI tools won’t be enough to bring about real AI

This is important to understand in the midst of today's AI hype.


Has anyone used something like an MMO to get people to help train their AI in an artificial environment?


The problem is: the industry set out to build a skyscraper from the top floor (the neocortex) down, and assumes it can hack together a foundation and throw up ad-hoc scaffolding on the way down that will magically reflect the brain's capabilities. There are even ridiculous ideas held by industry 'experts' that the foundation will just magically arise from nothing more than a sheer amount of spaghetti wiring complexity. Nothing in the known universe has been proven to work this way. Yet no one questions this outlandish belief system, because the industry experts and notable names are stating it and... hey, look, their top-floor systems actually do something interesting. So, they must know what they're doing and saying.

So, the foundational problems remain...

They remain because there is no foundation to these cortical systems. Anyone who states this gets railed at and laughed at. So, you get what you get.

The article states : "Machines that truly understand language would be incredibly useful–but we don’t know how to build them."

There are people and groups who know how to build them. They are focusing on the 'foundation' first. That is not where the spotlight or money are directed. So, they remain in the dark.

We gained headway with a very trivial model of neurons and cortex-like hierarchical neural network designs, and the money sent people off to the races. People began writing wrappers, stuccoing the top floor, hacking up scaffolding, applying any C.S concept they could find in the parts bin to fancify the top floor.

That's where all of the attention and money is. What does your system do? What benchmark can it beat? What data can it classify? What cool trick can it do to impress us? So, you get impressive trick systems that require massive amounts of data, training, and answer maps to obscure the lack of intelligence. As there is no intelligence explicitly designed into these systems, the system cannot convey its understanding.

It's nothing more than an answer map w/ annealing routines and memory... Very similar to cortical regions.

The foundation and supporting layers up to the top have been ignored and aren't getting any spotlight or money, nor are the individuals who continue to toil on them.

They're considered to be 'philosophers' and jokers and not real scientists/engineers/industry leaders. The A.I space shuts out a huge pool of varying opinions via its attitude: if you don't have a PhD, need not apply. If you're approaching it via methods other than the subscribed ones, and you're not a name or a face and don't have a laundry list of papers, you get the: Good luck (thumbs up).

And people stand around and wonder why the fundamental problems remain? Come on...

In any event, it won't remain that way for long, and that will be due to someone or some group actually investing the time and energy to build a sound foundation. This begins first and foremost with deep philosophical questions about the nature of the universe and intelligence. The answers derived serve as a guiding light for the scientific and engineering pursuits further along.

This article should be: AI's lack of a foundation. Who's going to build it? Who's going to invest the time to understand what exactly it is, as opposed to hacking away at it?

It's the truth but would be considered a 'hit piece'. Until someone constructs a proper foundation, no one is going to give credence to the idea that current A.I lacks it. Hindsight is 20-20 as is a force-fed neo-cortex.


Small children have to learn language from nothing. They just figure it out through exposure and practice. Even pets learn some language. This is the model to emulate.

Ultimately language use requires a few skills:

* a good parser

* motor cognition/coordination

* a good memory

* semantics/context

* vocabulary

* situational awareness

The first two in the list are what small children struggle with the most. Fortunately, we can eliminate motor coordination as a need for AI. Although extremely powerful parsers demand specialized expertise to produce, this part of the problem is straightforward. I write multi-language/multi-dialect parsers as an open-source hobby.

I discount vocabulary and situational awareness, because most children still haven't figured these out until they enter high school, long after they have learned the basics of speech. That pattern of human behavior dictates that, while it might be hard to teach these skills to a computer, you can put this off a long way down the road, until after basic speech is achieved.

If somebody paid me the money to do this research my personal plan of attack would be:

1. Focus on the parser first. Start with a text parser and do audio to text later. Don't worry about defining anything at this stage. When humans first learn to talk and listen they are focusing upon the words and absolutely not what those words mean.

The parser should not be parsing words. Parsing words from text is easy. The parser should be parsing sentences into grammars, which is harder but still generally straight forward with many edge cases.

2. Vocabulary. Attempt to define the words comprising the parsed grammar. Keep it simple. Don't worry about precision at first. Humans don't start with precision, and humans get speech wrong all the time. This is especially true for pronouns. Just provide a definition.

3. Put the vocabulary together with the parsed grammar. It doesn't even have to make sense. It just has to attach meaning to the words, and to the words together, in a way that informs an opinion or decision for the computer. Consider this sentence as an example: I work for a company high up in the building with a new hire that just got high and gets paid higher than my high school sweetheart.

4. If the sentence is part of a paragraph or a response in a conversation, you can now focus on precision. You have additional references to draw upon. You are going to redefine some terms, particularly pronouns. Using the added sentences, make a decision as to whether the new definitions apply more directly than the original ones. This is how humans do it. These repeated processing steps mean wasted CPU cycles, and it's tiring for humans too.

5. Formulate a response. This could be a resolution to close the conversation, or it could be a question asking for additional information or clarity. Humans do this too.

6. Only based upon the final resolution determine what you have learned. Use this knowledge to make decisions to modify parsing rules and amend vocabulary definitions. The logic involved is called heuristics.

The only way all this works is to start small, like a toddler, and expand it until the responses become more precise, faster, and more fluid. At least.... this is how I would do it.
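
As a toy illustration of the parser-first step (step 1 above), here's a sketch that tags words from a tiny hand-made lexicon and chunks them into a crude grammar. The lexicon and rules are invented for illustration, and the sketch deliberately stumbles over the ambiguous "high": exactly the kind of mistake steps 3 and 4 are meant to fix with context later.

    # tag words from a tiny lexicon, then greedily chunk DET (ADJ|NOUN)* runs into NPs
    LEXICON = {
        "i": "PRON", "work": "VERB", "got": "VERB",
        "for": "PREP", "a": "DET", "the": "DET",
        "company": "NOUN", "building": "NOUN", "hire": "NOUN",
        "new": "ADJ", "high": "ADJ",   # "high" is ambiguous on purpose
    }

    def tag(sentence):
        return [(w, LEXICON.get(w, "UNK")) for w in sentence.lower().split()]

    def chunk(tagged):
        chunks, i = [], 0
        while i < len(tagged):
            if tagged[i][1] == "DET":
                j = i + 1
                while j < len(tagged) and tagged[j][1] in ("ADJ", "NOUN"):
                    j += 1
                chunks.append(("NP", tagged[i:j]))
                i = j
            else:
                chunks.append(tagged[i])
                i += 1
        return chunks

    # the greedy rule wrongly pulls "high" into the noun phrase "a company high";
    # resolving that requires the later context/precision passes
    print(chunk(tag("I work for a company high up in the building")))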


It depends a bit on what you are trying to achieve but I think hooking neural type networks together to simulate human mental faculties might be a better way forward. For instance much of human thinking seems to work around visualizing things in 3d space so you can say to someone imagine a dog on a skateboard on top of a hill and you give it a push, what happens? Once you've got that kind of stuff working with spatial awareness, cause and effect and so on using neural type processing I think the language understanding would come fairly naturally.


You have some good points, but this naive approach of hand-coding "cognitive modules" was tried many times in the 20th century, and it didn't work at all.

But look at what Deepmind does: it takes these ideas (and also ideas from systems neuroscience), implements them as differentiable modules and trains them on data in end-to-end fashion. This works really well.

Learning is very important, much more important than architecture. If you have a model that can learn you can add more structure later - again this is what modern deep learning is all about.


Since it was using movies, maybe the 8 legs is for a human centipede?


I don't think the issue has anything to do with cognition, and more to do with something that we do so subconsciously we don't always notice it as we do it: error correction and context setting. A big part of language is our error correction channels. In text it's a lot less obvious, because we twist the language to clear things up, but speech is full of a lot of "I'm sorry, what?" and "uh, you know" and hand gestures and furrowed brows and a million other side channels to get someone to repeat something or elucidate it or set a deeper context.

But that happens in text too: we group things into paragraphs and add a lot of punctuation and as we read we sometimes skim a bit, return as needed, reread what we missed the first time. (Or in texts/IMs our cultures are in the process of building whole new sub-dialects of error correction codes like emoji and "k?".)

A lot of people would think a machine is broken if it hemmed and hawed as much as people do in a normal conversation, or if it needed full paragraphs of text to set context and/or explain itself.

The biggest thing lacking in voice recognition right now is not word understanding or any of the other NLP areas of research: it's a lot of the little nuances of conversation flow. For now, most of the systems aren't very good at interruptions, for instance. From the easy, like "let me respond to your question as soon as I understand what you are asking, to save us both time," to the harder but perhaps more important things like "No [that's not what I mean]" and "Wait [let me add something or let me change my mind]" and "Uh [you really just don't get it]", and presumably really hard ones like *clears throat* [listen carefully this time].

The point should not be that we hit 100% accuracy: real people in real conversations don't have 100% accuracy. The issue is how do you correct from the failures in real time and keep that "conversational" without feeling strained or overly verbose (such as the currently common "I heard x, is that correct?" versus "x?" and head nod or very quick "yup").

We don't consciously think about the error correction systems in play in a conversation so that makes them hard to judge/replicate and it's easy to imagine there's an uncanny valley waiting there for us to get from no "natural error correction" ability across to supporting error correction in a way that it works with our natural background mechanisms.

At least in my mind, the next big area to study in language recognition is probably deeper looks into things like error correction sub-channels and conversational timing (esp. interruption) and elocution ("uh", "um", "you know", "that thing", "right, the blue one"). I'd even argue that what we have today is probably already "good enough" for the long run, if it didn't require us to feel like we have to be so exact because you only get one sentence at a time and you don't have good error-correcting channels with what we have today.


Reminds me a lot of that classic Stephen Bond essay: https://web.archive.org/web/20131204031113/http://plover.net...


Is there a story behind the pictures accompanying that article?


These guys might disagree. Sometimes they make sense, which is interesting: http://sumve.com/ai-chatbots/relationships/relationship-bots...


They seem to be just talking over each other with vague references to what the other bots are saying.

If you pay attention for about 5 minutes it's just nonsense. I mean, they are clearly repeating things from somewhere else that were sensible in their original context, but now they seem to be saying things nearly randomly.



