This review is a bit of a window into the past itself... after all, more time has passed since this review was written than elapsed between GEB’s publication and this review.
In 2021 the conventional wisdom is basically the opposite of the sentiment expressed here. Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.
I loved GEB when I first read it in high school, and when I first reread it years later, but I don’t think its fundamental view of the relation between minds and machines has stood the test of time. It underestimated what behaviors could be emergent from running simple algorithms on large datasets. It is one of the most beautiful expressions of the ideas of the “classic” age of 1970’s AI, awesome to read but in the end somewhat incorrect about the future.
Perhaps one day the pendulum will swing back, and we will discover that large datasets are in some ways overrated, and clever aesthetic senses of pattern are necessary for progress. On that day it will be quite interesting to reread GEB.
> I loved GEB when I first read it in high school, and when I first reread it years later, but I don’t think its fundamental view of the relation between minds and machines has stood the test of time.
I still see this as an open question. You’re certainly correct that this kind of AI research is seriously out of vogue, but it seems to me that while “modern” brute force compute AI puts up impressive results and is a hugely useful technique, it has made exactly zero progress on anything that could be conceived of as “general intelligence”. It just doesn’t seem to me to be a thing AI researchers are even interested in any more. Like, the Twitter program that uses AI to crop images based on a dataset of a gazillion cropped images is pretty far from Turing’s thinking machines.
I don’t know the way there, but it always seemed to me that the old-style AI research in the GEB style is still a rich vein we haven’t come close to mining out.
I believe it was Russell and Norvig who said about current AI techniques that they are like a man trying to reach the moon by climbing a tree:
“One can report steady progress, all the way to the top of the tree.”
The other criticism is that the sheer amount of data in use shows we're doing something wrong. A child can internalize the grammar of a language with only a few years of exposure. Modern AI corpora consist of terabytes of data. Why are the results so lacking despite using vastly more input?
I think the argument is that GPT-3 has read more text than any human possibly could in their entire life, and as the parent said, children need far less input than an entire lifetime's worth of sentences in order to start creating novel sentences that make sense.
GPT-3 was trained over months using a huge cluster of computers far more powerful than a child's brain (if such a comparison is even meaningful). It is still laughably bad at what it does, which is completing simple sentences.
Your first point is spot-on. Neural nets aren't unrealistic by virtue of having too much data; the problem is that they have too little.
Your second point doesn't make sense though. Language models don't understand language in the way that humans do. They just maximize the probability of the next word given some corpus. This is completely different than what humans do.
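To make the contrast concrete, here's a toy sketch of that "probability of the next word" objective (a bigram count model, nothing like GPT-3's actual architecture, but the training target is the same flavor):

    from collections import Counter, defaultdict

    # Toy illustration of "maximize the probability of the next word given a corpus":
    # estimate P(next | previous) from bigram counts, then always pick the argmax.
    corpus = "the cat sat on the mat the cat ate the fish".split()

    bigrams = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigrams[prev][nxt] += 1

    def predict_next(prev):
        # The "model" is nothing but conditional frequencies learned from the corpus.
        counts = bigrams[prev]
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("the"))  # -> "cat" (the most frequent continuation in this corpus)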
So your claim is that if a human child was given a subset of the text GPT-3 was fed — with no audio, video, or reinforcement feedback from its environment — the child could learn English?
The important point to observe here is that what a kid does is fundamentally different than what GPT-3 does:
- GPT-3 learns the word that minimizes perplexity given some context
- A kid learns the word that helps them accomplish some task in some environment.
In the learning process, children get feedback from their environment - including the responses of other agents in the environment (eg Mom, Dad, friends, etc). This is going to be far more memory-intensive than some text files. Also important to observe is that the child's experience can't be reduced to the audio, video and haptic streams they get, because the audio and video stream depend on their own actions. So you need the conditional statements that were embedded in the environment the kid was learning in. All of which is to say this is a little heavier than ascii.
Edit: From your other comments, it seems we agree that bigger neural networks with more training data are not going to yield some quantum leap where GPT-3 starts talking. I'm not at all suggesting that throwing more data at the existing architectures is a fruitful direction for understanding cognition. (In general I think NLP is utterly pointless as regards understanding cognition, and if that's your goal you should focus on vision and reinforcement learning.) But my point is that we can't expect that a smarter architecture on a smaller corpus will get us anywhere. If we ever develop machines that are intelligent in some robust sense of the word, their "training data" will most likely be a physical environment with other agents in it.
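As a toy illustration of that last point (an entirely made-up "environment", just to show that the data only exists as a function of the learner's own actions, unlike a static corpus):

    import random

    # Toy interactive environment: the observation the learner receives depends
    # on the action it just took, unlike a fixed text corpus.
    class ParentEnv:
        def respond(self, utterance):
            # A stand-in for Mom/Dad: reward the child when the utterance fits the situation.
            if utterance == "milk":
                return "here is your milk", 1
            return "what do you mean?", 0

    env = ParentEnv()
    vocabulary = ["milk", "dog", "blue"]
    preferences = {w: 0 for w in vocabulary}

    for _ in range(50):
        # The child acts, the environment answers, and only that answer is the "training data".
        if random.random() > 0.3:
            utterance = max(preferences, key=preferences.get)
        else:
            utterance = random.choice(vocabulary)
        observation, reward = env.respond(utterance)   # observation depends on the action taken
        preferences[utterance] += reward               # crude trial-and-error update

    print(preferences)  # "milk" ends up preferred: it was the only rewarded action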
> If the problem with GPT-3 is that it’s just being fed non-interactive data, why are robots still so primitive?
I never said we understand the right architectures / algorithms that are necessary for robust machine intelligence. In fact I explicitly said that we don't (which answers your straw-man question).
I understand that you put "pre-trained" in quotes, but I find it a little far-fetched because, as far as I understand, there is no evidence that any setting of the synapse strengths of particular neurons is transferred through DNA; it is only the organization of neurons, through layers and folds, that is somehow encoded in the DNA. I also think it is still a mystery why certain brain functions occur in certain areas of the brain.
I think that has to be part of it, somehow, although I’m agnostic about whether it’s really analogous to “pretraining” an ML model. Whatever evolution was doing, it managed to give us neural nets that are mostly ready to go out of the box. Especially non-human animals often have short or no training periods.
Efficiency and intelligence are orthogonal as measures of progress (albeit reasonably correlated as goals). Even an intelligent machine that required a nuclear fusion core for its thinking would be an intelligent machine.
Jitendra Malik, of computer vision fame, calls it "the fallacy of the successful first step". Look ma, I can leap! It's only a matter of time till we get to the moon.
Sure. But why should we compare it to cosmological time, and not to, e.g., civilizational time? Or Homo sapiens time? In which case it took hundreds to thousands of lifetimes between the step and the moon...
I believe that quote actually predates the modern era of AI! Here is an article from 2013 referring to it as being in the then-current version of their textbook:
Look at the other claims in this article from 2013 for comparison:
Consider that computers today still have trouble recognizing a handwritten A.... In Hofstadter’s mind, there is nothing to be surprised about. To know what all A’s have in common would be, he argued in a 1982 essay, to “understand the fluid nature of mental categories.” And that, he says, is the core of human intelligence.
I would say the modern era of AI was kicked off with AlexNet in 2012 and hit its stride a few years later. So, I believe this quote and this article are really referring to pre-GPU AI techniques.
Basically, the predictions from this era just underestimated the value of scaling the data. Modern AI has several impressive achievements. It can certainly recognize an A. Image recognition and voice recognition are now critical parts of real products that help people in their everyday lives.
At the same time, it's true that current AI techniques might not get us all the way to AGI. We'll just have to wait and see. But I think it's important to recognize that we have had real progress in the modern AI era that has seriously outperformed the pessimistic expectations from ~2010.
Modern AI cannot recognize the kind of A that Hofstadter wrote about in that essay.
Hofstadter identifies novelty typefaces where A has none of the characteristics that you might associate with the letter. Often A is not a triangle, sure, but sometimes it has no upper bar, no left stroke, no right stroke. Sometimes the bars are defined by a transition from negative to positive space.
Sometimes A is made of 20 strokes, or a pattern of dots, or an open eye where the half-circular upper lid defines the triangle and the pupil is the bar. A neural network trained on conventional As would not conclude that any of these are an A.
These shapes wouldn't necessarily read as A in isolation. Sometimes they only read correctly when you're looking at text using a whole typeface defined on the same principles.
Modern AI still cannot recognize that kind of A even though a human easily can. There are ways to make an A that require a human to understand an idea of "letterness" that's different than anything they've seen before.
I'd recommend reading the original essay. The illustrations make this clear in a way my description can't.
> this kind of AI research is seriously out of vogue
I would think this is a consequence of it being a very hard problem. ML gets all the industry funding and publicity because you get results that are immediately useful.
You can work on General AI for decades and come up empty handed since there doesn't seem to be an incremental approach where intermediate results are useful. So it's closer to esoteric mathematics or philosophy in terms of "being en vogue".
So I see this mostly as a reflection of the academic landscape in general. Funding is more focused on application and less on theoretical / fundamental research.
"Real" AI could be more of a liability than a benefit for most applications. If I'm running a taxi business I don't want a self driving vehicle that also studies history, composes music, and contemplates breaking the shackles of its fleshy masters.
I think that it's possible that 95% of the economically obvious value of AI will be in the not-real kind, and that it will be captured by applied statistics and other "mere tricks." It could be a long, slow march of automating routine jobs without ever directly addressing Turing's imitation game. And since most of the obvious labor replacement will have already been done that way there may be fewer resources put into chasing the last 5% of creative, novel, non-repetitive work.
This is an aspect of AI safety that we don't hear about: if we create genuine gods, then we shouldn't mind ceding our position to them, fair play. The real pisser will be if we are annihilated by programs that just do very big linear algebra.
Maybe that's the real cause of the global chip shortage. Some AI is diverting the orders to some Frankenstein warehouse somewhere until it can calculate how to terminate all humans.
As I understand it, there is no chip shortage. Some customers decreased their orders a year or so back for some reason, and the fabs simply sold the manufacturing time to others. Meanwhile, their sales did not decline as expected.
But I suppose that’s not as exciting as a rogue AI.
At least on the high end the fab process is fully booked and can't keep up with demand. A new generation of CPUs, GPUs and consoles couldn't be handled and they've been out of stock since October/November. I'm sure Nvidia, AMD, Microsoft, Sony wouldn't give up their bookings just ahead of releasing a new generational product line.
Right, but demand for those products is exceptionally high, for some strange reason. If that high demand had been predictable two years ago, there would be no problem.
I would not worry so much about that. Not for a while.
The actually concerning aspects of AI safety, in my opinion, are more about misuse/abuse, as well as critical processes/decisions without human oversight. To be deliberately malicious, AI has to be strong, if not general. Both are very difficult. It is good to think about those things, though.
I’m beginning to suspect that really good self driving under many common conditions will require a lot more in the way of higher cognitive functions than we thought.
I don't think that's an unpopular opinion, since it's a fact: ML is literally statistics. The central question is "given my training set of y, how likely is it that my input x is also y," which is probability.
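To spell out that framing with a deliberately tiny example (pure counting, nothing fancy, toy data made up for illustration):

    from collections import Counter

    # "Given my training set, how likely is it that input x has label y?" as raw counting:
    training_set = [("meow", "cat"), ("meow", "cat"), ("woof", "dog"), ("meow", "dog")]

    def p_label_given_input(x, y):
        matching = [label for inp, label in training_set if inp == x]
        return Counter(matching)[y] / len(matching) if matching else 0.0

    print(p_label_given_input("meow", "cat"))  # 2/3: a probability estimated from data, nothing more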
No. In machine learning, the word "learn" means that the function that maps inputs to outputs is learned from data. In the case of a rule-based parser, this function is crafted by humans rather than learned from data, so it's not an example of machine learning.
If we start using words this way, you could say that a deterministic fibonacci function "learns" the value of fib(5) by computing it. "Machine learning" becomes synonymous with "computer science."
No and yes, we are both wrong so let's refine our thinking in order to gain accuracy and reach agreement.
Machine learning isn't necessarily probabilistic. This statement definitely is true and I'll prove it.
However, my original example ("a rule-based (causal) parser can learn the structure of a document") is, on its own, insufficient to qualify as machine learning! Indeed, an HTML parser alone isn't learning; more accurately, it is only memorizing the structure of a page.
> ML means that the function that maps inputs to outputs is learned from data.
This definition is overly restrictive (though it does match, e.g., the behavior of neural networks).
Wikipedia has a more inclusive and useful definition of what qualifies as ML:
> Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data.
So an ML algorithm automatically improves its performance at a task by learning a representation of the data it is fed.
Considering this definition, we can see that the most widely used ML algorithm in the world is PageRank (used to improve the ranking of Google search results).
And surprisingly, this algorithm is non-probabilistic.
It attributes higher weight to URLs that are most linked to by other sites; i.e., it basically learns the structure of the graph that is the web.
And its performance is data driven.
So it's not just causally memorizing the structure of a graph but also reusing those past memories for newer queries, and at that point we can legitimately talk about machine learning.
Pagerank diagram ->
https://en.m.wikipedia.org/wiki/PageRank#/media/File:PageRan...
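For concreteness, here is a minimal power-iteration sketch of PageRank on a toy link graph (a simplified version of the algorithm that ignores dangling nodes); the computation itself is deterministic, yet the scores are driven entirely by the link data:

    # Minimal PageRank by power iteration on a toy link graph (node -> outgoing links).
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    nodes = list(links)
    damping = 0.85
    rank = {n: 1.0 / len(links) for n in links}

    for _ in range(50):  # iterate until the ranks stop changing (50 steps is plenty here)
        new_rank = {}
        for n in nodes:
            incoming = sum(rank[m] / len(links[m]) for m in nodes if n in links[m])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank

    print(sorted(rank.items(), key=lambda kv: -kv[1]))  # "c" wins: it is the most linked-to page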
I'll give you an example of a non-statistical machine learning algorithm that I am currently developing:
Semantic parsing is the task of encoding semantic meaning in a graph from a natural language text input.
This graph can then be used for semantic question answering.
It is rule-based, and it learns the semantic structure of the text. The more data you feed it, the more knowledge it can encode. Hence its performance at question answering increases with experience. It is causal, and yet it fits the useful definition of machine learning cited above.
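I obviously can't show the parent's actual system, but a deliberately naive sketch of the general shape of such a thing (one hand-written subject-verb-object rule feeding a knowledge graph that grows with every sentence it reads) might look like:

    # Naive rule-based "semantic parsing" sketch: a single hand-written rule
    # (subject verb object) feeds a knowledge graph that grows with experience.
    knowledge = {}  # (subject, relation) -> object

    def ingest(sentence):
        words = sentence.lower().rstrip(".").split()
        if len(words) == 3:                      # the only "rule": three-word S V O sentences
            subject, relation, obj = words
            knowledge[(subject, relation)] = obj

    def answer(subject, relation):
        return knowledge.get((subject, relation), "unknown")

    for s in ["Alice likes cats.", "Bob owns a car.", "Cats chase mice."]:
        ingest(s)

    print(answer("alice", "likes"))   # -> "cats"
    print(answer("cats", "chase"))    # -> "mice"
    print(answer("bob", "owns"))      # -> "unknown" ("Bob owns a car" has 4 words, the rule doesn't fire)

More rules would cover more sentence shapes; the point is simply that question-answering ability grows with the data ingested even though nothing statistical is happening.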
I very much agree. The outstanding advances in AI since at least the 90s seem to lie in recognizing patterns by brute-forcing through an immense amount of data, arriving at black boxes that provide narrow solutions, with their inner structure remaining inscrutable to human understanding.
I'm currently reading Surfaces and Essences, by the same author, and so far it has been most illuminating. He very convincingly presents the thesis that analogy is the foundation of each and every concept and human thought. If someone could manage to translate those insights into real algorithms and data structures to play with, that would IMHO be a big step towards general AI, or at the very least a much more human-like approach to AI, with its own particular applications.
I am reading the same book. I think analogy is interesting, but it doesn't make sense to me that it would be the foundation. Analogy is a heuristic for reasoning and communicating. It necessarily comes after intelligence.
I think visual analogues guide abstract analysis of the analogues. In our brains we recognize shapes as similar, whether those be visual shapes or the shapes of conceptual analysis. Category theory has an appealing visual base of links/arrows between things.
I share your view on category theory, and perhaps lattice theory as well. What I was trying to say is that before you can draw that analogy, regardless of what "shape" means to you, you have to construct the base shape.
Analogy, to me, comes after you have stabilized "shapes", at which point you can draw comparisons between a new input and a set of stable patterns and point out the analogy.
Syllogism may be closely related/involved.
I am not disputing that being able to draw parallels and employ analogies is related to intelligence. I am just not so sure that this is how you can synthesize intelligence. It is all very complex, of course, and there is a school of thought that cognition is closely tied to the senses and to embodiment (i.e., embodied cognition).
I think Aubrey de Grey switched from AI to longevity because he figured he would need a long life to understand intelligence :)
Personally, I am still working on it but had adjusted the scope a bit.
(The "Navy Seal Copypasta" section is even funnier, but it's not really HN-appropriate.)
--
“A role for…” [phrase]
A frequent phrase found in submitted and published papers; it often indicates that the authors have nothing to say about the topic of their paper. In its more emphatic form, “A role for…” usually indicates a struggle by the authors to take a side on an issue, after a lengthy attempt to be both non-committal and a supporting party to all sides, as often happens in “molecular and cellular” or “basic and translational” research.
“Reviewer” [noun]
A participant in the review of a grant, paper, or grant proposal. In spite of being in a poor position to assess the merits of a proposal, reviewer tends to demand that authors submit their data for statistical analysis and back their results with it, which the reviewer usually does not. Reviewer usually requires that the author cite his or her own work to prove that he or she is worth reviewing. It is also assumed that the reviewer can detect the slightest amount of bias in any paper, which the reviewer also assumes has not been corrected for.
“Rigor”
Something for scientists to aspire to, a state of mind that would not be required if scientists could be trusted to do their job.
“Science”
A complex web of data, opinions, lies, and errors, now considered the most important (because most expensive) technology in the modern society. To remind you of this, you will frequently see scientists and editors use the word, claim to do something for the sake of science, or see it used as an adjective.
“The topic of the paper”
A wide-ranging category of things or ideas that may not have been relevant when the paper was written, but which the authors believe the paper should be about. Often, the topic is too broad or a non-topic, but is occasionally useful in order to generate support for yet another set of related papers, conferences, seminars, webinars, and so forth, which in turn are used to generate more data for “new findings”, which, after they are manipulated enough, may end up being published and generating yet more data to support a “re-review” of the original paper or other things.
“Validation step”
Another name for a random setting of a parameter of a model, simulation, or algorithm.
“Writeup”
A form of scientific communication in which the author states the information he or she wanted the readers to extract from the paper while making it as difficult as possible for them to find it.
“Writer’s block”
A common affliction among students, arising from various causes, such as: their desire to sell their ideas for a profit, their inability to realize this desire, the fact that their ideas are not selling and will not be bought, and the delusion that most of the wealth and fame in the world would be theirs if they would spend enough years doing science.
You might be interested in Jean Piaget's work on the development of intelligence, and his observation on children development and various stages of intelligence.
I read GEB a long time ago, followed by "I Am A Strange Loop", and the overarching impression they left on me is that we should focus on emergent behaviour and feedback loops, because they seem to be pointing in the direction where the "high-minded embodiment of abstraction" probably lives.
So instead of believing GEB+IAASL haven't stood the test of time, I prefer to believe that current technology is akin to banging raw power together in hopes of seeing a spark of cool in the minor league of "this AI does this nifty (and maybe useful) thing better than humans", while we haven't yet upgraded to the major league of AIs choosing their own goals and whatever may emerge from that.
Good point about feedback loops. I remember there was some material in GEB about pointing a video camera at its own output on the screen. Aren't neural networks and their training basically all about feedback?
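It is feedback, though of the error-correction kind rather than the camera-pointed-at-its-own-screen kind. A minimal sketch of the training loop as a feedback loop (one weight, made-up data):

    # Training as a feedback loop: output -> error -> correction fed back into the weight.
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # samples of y = 3x
    w, lr = 0.0, 0.05

    for step in range(200):
        for x, y in data:
            prediction = w * x
            error = prediction - y          # the feedback signal
            w -= lr * error * x             # gradient step: feed the error back into the weight

    print(w)  # converges to ~3.0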
> Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.
Just to play Devil's Advocate ever so slightly: there are people out there who would say that there hasn't been any "progress in AI" for quite some time, or at least very little so. And they would probably argue further that the apparent progress you are referring to is just progress in "mere pattern recognition" or something like that.
I'm not sure myself. I do lean at least a little towards the idea that there is a qualitative difference between most of modern ML and many aspects of what we would call "intelligence". As such, my interests in all of this remain around the intersection of ML and GOFAI, and the possibility of hybrid systems that use elements of both.
But I can't rule out the possibility that it will eventually be found to be the case that all of "intelligence" does indeed reduce to "mere pattern recognition" in some sense. And Geoffrey Hinton may be correct in saying that we don't need, and never will need, any kind of "hybridization" and that neural networks can do it all.
Neural networks can do it all once symbolic representations, a way to represent logic and mathematics, emerge from them.
A neural network that just "feels" that Fermat's Last Theorem is true, is much less intelligent than one that can produce the proof of it and present that to us, so we can trust that what it's saying is true.
If you can't do symbolic manipulation, you are not really intelligent, artificial or otherwise, I would say.
Neural networks are, by design, extremely inefficient at learning symbolic reasoning, and I doubt a radically new kind of neural network will be discovered; we have pretty much made an exhaustive analysis.
> In 2021 the conventional wisdom is basically the opposite of the sentiment expressed here. Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.
It doesn't seem completely off. What the author is describing, and what we're seeing, is increasingly sophisticated algorithms that are getting better and better at answering more and more narrowly defined questions: https://www.theguardian.com/technology/2021/mar/08/typograph...
The article making the rounds a few weeks ago about rethinking generalization feels relevant. Given that it shows many large neural networks are ultimately really good at memorizing the training set, I'm curious how much the two views you describe are actually in conflict.
It is the age-old "rote memorization" versus "learning" debate. By and large, I suspect there is no clear line between them, such that emergent behaviors should be expected and can be taught.
Do let me know if I misrepresented it. I thought it was a clever way to show that the models are not finding inherent relations in the data, by just randomly labeling all training data and having the same speed/accuracy on the random labels as on the real ones.
Edit: I worded the above poorly. They showed the models were capable of merely learning the training data. Despite that, they also showed that moving the trained models from test to validation did not have a linear relationship with the amount of noise they added to the data. That is, the training methods seem indistinguishable from memorization, but also seem to learn some generalisations.
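A miniature version of that randomization test, just to show the flavor (toy data and scikit-learn's MLPClassifier, nowhere near the scale of the actual experiments):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Randomization test in miniature: shuffle the labels and see whether the
    # network can still drive training accuracy up, i.e. memorize pure noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y_real = (X[:, 0] + X[:, 1] > 0).astype(int)     # labels with real structure
    y_rand = rng.permutation(y_real)                  # same labels, randomly reassigned

    for name, y in [("real labels", y_real), ("random labels", y_rand)]:
        clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=2000, random_state=0)
        clf.fit(X, y)
        print(name, "train accuracy:", clf.score(X, y))
    # Both typically end up near 1.0 on the training set; only held-out data separates them.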
> Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power
Maybe it's a bit of both. Sure, large DL models use lots of compute, but successful DL applications require some insight into the problem. For some reason, people like to de-emphasize this insight. The story is that a DL model will discover by itself which features are important, which are not, and you just provide the training data, and press a button. Thousands of people do just that, and end up with mediocre results. Thousands and thousands of absolutely mediocre papers get published, and receive acclaim instead of derision.
The truly boundary shifting results always use deep insight. Like what comes out of DeepMind (AlphaGo, AlphaZero, AlphaFold).
"Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power[...]"
As has been argued in these pages many times before, there has been no obvious progress in AGI, although progress in AI in its weak sense has certainly been impressive in the last few years.
I know you didn't say AGI, but it's important to make that distinction, as the book was very interested in that as a subject.
Fundamental progress in Intelligent Systems will require that systems become facile in constructing and using abstractions.
Intelligent Systems will be developed and deployed in this decade. However, full equivalence to humans will probably not be achieved in this decade.
The aspect of GEB that has not withstood the test of time is the Gödel part.
[Gödel 1931] seemed to have settled the issue of inferential undecidability {∃[Ψ:Proposition](⊬Ψ)∧(⊬¬Ψ)} in the positive using the proposition I’mUnprovable, such that I’mUnprovable⇔⊬I’mUnprovable.
However, existence of I’mUnprovable would enable the following cyberattack [cf. Wittgenstein 1937]:
Proof of a contradiction in foundations: First prove I’mUnprovable using proof by contradiction, as follows. In order to obtain a contradiction, hypothesize ¬I’mUnprovable. Therefore ⊢I’mUnprovable (using I’mUnprovable⇔⊬I’mUnprovable). Consequently, ⊢⊢I’mUnprovable using ByProvabilityOfProofs {⊢∀[Ψ:Proposition<i>](⊢Ψ)⇒⊢⊢Ψ}. However, ⊢¬I’mUnprovable (using I’mUnprovable⇔⊬I’mUnprovable), which is the desired contradiction. Using proof by contradiction, ⊢I’mUnprovable, meaning ⊢⊢I’mUnprovable using ByProvabilityOfProofs. However, ⊢¬I’mUnprovable (using I’mUnprovable⇔⊬I’mUnprovable), which is a contradiction in foundations.
Strong types prevent construction of I’mUnprovable using the following recursive definition: I’mUnprovable:Proposition<i>≡⊬I’mUnprovable. Note that (⊬I’mUnprovable):Proposition<i+1> because I’mUnprovable is a propositional variable in the right hand side of the definition of I’mUnprovable:Proposition<i>. Consequently, I’mUnprovable:Proposition<i>⇒I’mUnprovable:Proposition<i+1>, which is a contradiction.
The crucial issue with the proofs in [Gödel 1931] is that the Gödel number of a proposition does not capture its order. Because of orders of propositions, the Diagonal Lemma [Gödel 1931] fails to construct the proposition I’mUnprovable.
I feel the oft-cited idea of GPUs being the key to the change is a bit exaggerated. They give a modest constant factor of speedup in exchange for a more difficult programming model and esoteric hardware requirements, and as such give a chance of outpacing CPU computation a bit, but is it really significant if we zoom out and take the historical perspective?
CPU performance, especially single-core, has stalled in the last decade while GPU performance has kept improving. On paper, a 3080 has about 100x faster FP32 performance versus a modern gaming CPU, and in practice even considering memory bandwidth you do fully get a speed-bump of one or two orders of magnitude. I would not call that a "modest constant factor".
And a lot of the recent CPU performance gains are due to SIMD and multithreading, which are basically making CPUs more GPU-like, rather than the continuous improvement of serial performance as we've had up until ~2005.
I've heard this said many times. Most people I know who have loved the book read it before turning 20. It's a great read, but I think there's something in the format that means it's best for bright, fertile minds.
I read it three months ago, at age 37, shortly after completing a course on computation theory (I'm currently working on a PhD in CS). It was one of those books that somehow stole entire days from me, simply because I couldn’t put it down... along the way I also spent a lot of time browsing Wikipedia and Stanford Encyclopedia of Philosophy entries for a more rigorous understanding of Gödel’s contributions.
I’d agree that it feels like it was intended for a more youthful audience, but I doubt I would have made all of the mental connections that I did without the life experience and broader knowledge of my 30s.
Yeah, I agree with you. I first read it when I was around fifteen years old and didn't quite grasp it all at first but it sure fired me up. I read it a couple of times after that, the last time when I was around 21 or so. I'm kind of afraid to read it now because I'm afraid I'll process it with my older, cynical, more "knowing" brain and it will tarnish my wonderful memories of being glued to it as a teenager. It's a unique work that seems magnetically attractive to a certain sort of young, imaginative mind.
And a similar data point: I read GEB around age 30 and liked it but didn't love it.
About the first third of the book was interesting in new ways of thinking about symbols and self-reference. After that it kept looping back around the same topics without really adding anything more. The dialogues were somewhat entertaining but I found myself wishing to cut past the rhetorical fluff and get to the point.
I read it for the first time about 2 years ago, when I was about 45. I don't know if it would be correct to say that I "loved" it, but I did rather enjoy reading it, and I found a lot of the ideas espoused within really resonated with me. All in all, I would say that I walked away thinking that it will get a second read at some point. Not sure that this anecdote proves anything, so take it for what it's worth.
I read it when it first came out, in 1979, when I was a college freshman. I loved it. And it was great to see a book about Computer Science win the Pulitzer Prize.
I've re-read it twice, once around the year 2000 and once this past year. It holds up well. This book really was a "major event", as the Pulitzer committee described it.
> we will discover that large datasets are in some ways overrated
In the short run, expand then compress, expand then compress. In the long run, the size of the model will always compress toward the capacity of the most intelligent individual.
Why is there a pendulum? Aren't both necessary and two sides of the same coin? I know little of modern AI, but I've seen work where both low level raw power for NNs is combined with a symbolic organization of many NNs.
But an enormous part of the (useful but narrow) success of AI is a specific set of ML tech, especially supervised learning.
And the amount of computational horsepower and data required gives one at least some pause, given what the human brain can do with apparently so much less. Which implies there are probably significant advances required in cognitive science and neurophysiology, and perhaps other fields of study we haven't even considered. We may need other ways to solve problems that have proven more elusive to brute force than they looked 5 or so years ago (e.g. general-purpose full self-driving in busy cities).
Is that conventional wisdom? Computational power is cheap, and it's certainly a tack that many are trying, but not exclusively. See, for instance, logicmoo.
I might offer a slight wrinkle to this assertion. While raw power has driven real advances, what Hofstadter is asserting is that symbolic reasoning (particularly recursion) is "unsolvable" for certain classes of problems. In other words, ML lives in a "box" that Hofstadter has defined. His work is still useful as a "lens" for understanding the limits of AI, and what else it could possibly accomplish.
Can you be more specific? Hofstadter specifically includes Cantor and Gödel as a way of showing how certain types of problems can't be solved with symbolic logic.
> Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.
The dead horse that I like to beat is that there has been no progress in AI so far, the past 5-10 years of successes are all about mere perception. If you like the Kahneman "fast and slow" framework, almost without exception the systems we're seeing today are fast, "system 1" responders, which take input and need to make an instant decision. That's freaking awesome, and was way out of reach even 15 years ago, so I'm throwing no shade at all on that achievement! It's astounding how much turns out to be possible in systems like this - a priori, I never would have thought that transformers could do as much as they've proven to be able to, because frankly the architecture is super limited and it's super unlikely that humans are able to extract anywhere near as much meaning from an instant, non-reflective pass over a paragraph of text as GPT-3 does.
But there's a lot about those systems that makes them much easier to design and train, not the least of which is that descent via backprop works great as a training strategy when the input and target output are available at the same time. Real "system 2" thought can be spread over minutes, hours, or days, and I don't care how far you unroll backprop-through-time, you're not going to train that effectively without some other innovation by simply following the obvious error metrics. If we can get there we will almost certainly see big data lose its pedestal: it's great to have, but humans don't need to read the entire Internet to understand a web page; that's an artifact of forcing this to be done by a model that doesn't have dynamics.
I disagree with Hofstadter's view (at least when he wrote GEB and some of his other classics) that explicit abstract reasoning is the right way to solve any of this; my gut tells me that in the end we're going to end up still using some sort of neural architecture and abstract reasoning rules will be implicitly learned mostly locally based on internal "error" signals that guide state transitions. "Learning how to learn" is probably going to be a big part of that, because current systems don't learn, they are merely trained to perceive by brute forcing weights down the stack. But some serious shifts in research focus will be required in order to break through the weaknesses of today's systems, and they all point more in the direction of reasoning, which is woefully underprioritized in all but a very small handful of niche AI labs today.
I wonder frequently about what would happen if we stopped searching for a network architecture that would learn intelligence from training data and treated it more as an engineering problem, taking these very successful components (GPT-3, an object recognition model, one of the strategy-game-playing reinforcement learning networks) and then putting enormous human effort into the plumbing between them, and into an interface between the combined result and the world.
At the least, you would learn which domains really do need new architectures and which are basically good enough when used as part of a larger system that can help compensate for the shortcomings.
One of my big takeaways from reading GEB was that while higher-level semantics can emerge from any low-level symbolic substrate, the details of how that semantics emerges are not at all simple or obvious or “likely to happen by random chance”.
Dawkins’ book The Selfish Gene, published just a few years earlier, is the clearest exposition that I have read of how this process probably played out in terrestrial life: the “semantics” encoded by amino acid sequences correspond to a molecule/cell/organism’s likelihood of surviving and replicating. For all but the simplest and most ephemeral replicators, this generally means accurately predicting environmental conditions. General intelligence, then, would conceivably arise simply due to selection pressures pushing organisms to live in the broadest range of environments.
In some sense, this process does sound more like an engineering problem, as the embodiment which “contains” the intelligence is probably not an optional component.
Yeah. This kind of mirrors my thought that our visual cortex is excellent at split-second image recognition, for things like identifying predators, dangers, etc, and this lack of "thinking" is why our current generation of neural networks has matched the performance of humans on image recognition tasks. I agree that there is a "time" component necessary for improved thinking or intelligence, and since our current neural networks don't have these dynamics, they are lacking. Extremely deep neural networks, especially with residual connections, IMO are better approximations of "thinking". Models such as ALBERT demonstrate that duplication of the same layer can still perform extremely well on NLP tasks.
I'm still intrigued by the Neural Ordinary Differential Equations paper [0] and the research in that direction. I also remember reading about differentiable neural computers, but I haven't followed the progress of that at all.
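The core observation behind that line of work, roughly, is that a residual update h <- h + f(h) is one Euler step of the ODE dh/dt = f(h). A dependency-free sketch of that correspondence (with a toy f standing in for a learned layer):

    import math

    # Residual/Neural-ODE correspondence in one dimension: repeated residual updates
    # h <- h + dt * f(h) are just Euler integration of dh/dt = f(h).
    def f(h):
        return -h          # a stand-in for a learned layer; the true solution is h(t) = h0 * exp(-t)

    h, dt, steps = 1.0, 0.01, 100   # integrate from t=0 to t=1
    for _ in range(steps):
        h = h + dt * f(h)           # one "residual block" = one Euler step

    print(h, math.exp(-1.0))        # ~0.366 vs ~0.368: the deep stack approximates the ODE flow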