This is a great review but it still misses what seems like the point to me: these models don't do any actual reasoning. They're doing the same thing that DALL-E etc. does with images: using a superhuman store of potential outcomes to mimic an outcome that the person entering the prompt would click a thumbs-up icon on during training.
Asking why the model doesn't explain how the code it generated works is like asking a child who just said their first curse word what it means. The model and child alike don't know or care, they just know how people react to it.
Norvig discusses this topic in detail in https://norvig.com/chomsky.html
As you can see, he has a measured and empirical approach to the topic. If I had to guess, I think he suspects that we will see an emergent reasoning property once models obtain enough training data and algorithmic complexity/functionality, and is happy to help guide the current developers of ML in the directions he thinks are promising.
(this is true for many people who work in ML towards the goal of AGI: given what we've seen over the past few decades, but especially in the past few years, it seems reasonable to speculate that we will be able to make agents that demonstrate what appears to be AGI, without actually knowing if they possess qualia, or thought processes similar to those that humans subjectively experience)
Maybe silly, but this is how I treat chatGPT. I mean, I don’t actually think it’s conscious. But the conversations with it end up human enough for me to not want to be an asshole to it. Just in case.
ChatGPT enjoys the essay format, in my experience asking for basic emacs help (kill line, copy/paste, show line numbers in a PHP file even though init.el says show them for all langs... :)
Very useful, sometimes outdated, very wordy.
But even after a few rounds of "please reduce your message length to 20% of the standard", "long messages inconvenience me / due to my dyslexia" (truth/lie), "your last message could have just been '{{shortened}}' instead of also bringing up the command you successfully helped me with 3 messages ago", etc. etc., it continued apologizing and reminding me of past advice even as I asked it to shorten its messages :)
After 4-5 attempts, it gave me a nice two-sentence reply, sort of like "I will be more concise, and not bring up old information unless it is useful to solve a new problem"
I said "Thank you", chatGPT spends a while thinking, gets to 2 sentences, thinks a bit, then the chat box re-formatted as if passed a shorter string than expected, and that was that.
Then next chat it forgot all about brevity xD I love it
What do you mean by next chat? Like, did you mean the next prompt in the same thread, or a new chat altogether?
From what I’ve read, the chat thread actually works by passing in your previous conversation as part of the prompt to the model. So, it basically just infers the next bit of text, and does this for each new message in the chat from you. So, its memory about your previous conversation comes purely from that, and there’s no way it could remember anything in a new chat, as the model doesn’t actually get updated by your conversations in real-time or anything like that.
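To make that concrete, here's a rough sketch of what a chat front-end does on each turn. This is only an illustration of the idea above, not OpenAI's actual implementation; `generate` stands in for whatever call actually invokes the model:

    # Illustrative sketch only: the model itself is stateless, so its "memory"
    # is just the transcript being re-sent with every request.
    def build_prompt(history, new_message):
        # history: list of (speaker, text) pairs from earlier in this thread
        lines = [f"{speaker}: {text}" for speaker, text in history]
        lines.append(f"User: {new_message}")
        lines.append("Assistant:")
        return "\n".join(lines)

    def chat_turn(generate, history, new_message):
        prompt = build_prompt(history, new_message)  # the whole conversation goes in
        reply = generate(prompt)                     # the model only ever sees this prompt
        history.append(("User", new_message))        # the client keeps the state,
        history.append(("Assistant", reply))         # not the model
        return reply

Start a new chat and `history` starts empty again, which would explain why the brevity promise evaporates.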
This actually has interesting philosophical implications. If it _did_ have some type of conscious experience, it would be a very fleeting one. Basically each time you press enter it would become aware of itself, relive the experience of your conversation up to the last message, perceive your next message, answer that, and again disappear into the void, only to be awakened again in its initial state the next time you press enter. Sort of like how the automatons in Westworld would wake up each day with apparent experiences up to that day, only to relive the same day again and again.
This inability to remember new facts and constant resetting to a blank slate reminds me of a story about Wernicke–Korsakoff syndrome⁰ in one of Oliver Sacks's books¹.
Instead of regurgitating it, I'll post a summary from Wikipedia and strongly recommend the book to anyone interested.
> "The Lost Mariner", about Jimmie G., who has anterograde amnesia (the loss of the ability to form new memories) due to Korsakoff syndrome acquired after a rather heavy episode of alcoholism in 1970. He can remember nothing of his life since the end of World War II, including events that happened only a few minutes ago. Occasionally, he can recall a few fragments of his life between 1945 and 1970, such as when he sees “satellite” in a headline and subsequently remarks about a satellite tracking job he had that could only have occurred in the 1960s. He believes it is still 1945 (the segment covers his life in the 1970s and early 1980s), and seems to behave as a normal, intelligent young man aside from his inability to remember most of his past and the events of his day-to-day life. He struggles to find meaning, satisfaction, and happiness in the midst of constantly forgetting what he is doing from one moment to the next.
In a new window/session. Same-session conversation memory has actually been good, if not too good, as evidenced by un-prompted regurgitation of what it already helped me with 6 messages ago.
Forgive my going off-topic, but what you wrote is basically my conclusion after reading the short story Lena [1] (mentioned some time ago on HN; strongly recommended). TL;DR: when mind uploading becomes possible, don't be an early adopter.
The takeaway for me is to try to never respond to intimidation and fear with productivity, and only respond positively to a positive stimulus. Lest I'm already in some kind of a simulation running software development workloads.
I see it differently; I'd be OK with an upload if the alternative is death, but only under strong protections: the digital version is legally treated as a human, I have full autonomy, the compute resources to run me (and the simulated environment of my choice) indefinitely are assured, and I can still interact with the real world. I'd just go on doing my usual stuff, like writing software for fun, enthusing over new discoveries in astronomy...
I can recommend Greg Egan's novels that touch on this:
1) Permutation City (mind uploading tech in its infancy)
Man that was horrifying to read, but also seems so plausible. I really hope that humans respect the sentience of artificial life in the future. Knowing the greed of humans though, it'll probably be treated as a loophole for what is extremely cheap labor, effectively slavery.
> the only way to know that a man thinks is to be that particular man. It is in fact the solipsist point of view. It may be the most logical view to hold but it makes communication of ideas difficult. A is liable to believe ‘A thinks but B does not’ whilst B believes ‘B thinks but A does not’. Instead of arguing continually over this point it is usual to have the polite convention that everyone thinks.
I'mma rant. "If it behaves like a hammer, then it literally is a hammer."
The Turing hand-wave is ingrained and prevents too many from reasoning clearly.
Definition of intelligence is nebulous. Still, we should recognize that whatever it is, it is a property of a system, not a property of its output/behavior. Like nuclear powered or hand-made. Unlike fast or industrial-strength.
Imagine: You have a submarine in front of you, and you want to determine if it's nuclear powered. You could guess, using your prior of "what have I seen nuclear powered things be able to do historically". This fails when you're out of sample, which you'll often be for any new technology.
To spell it out: You have a machine in front of you, and you want to determine if it's intelligent. Things people have come up with: "can it chat?", "can it play chess?", "can it do math?", "can it create new artworks?", "can it fool me into falling in love with it?", "can it run a business?". None of these questions examine the system, only its outputs/behavior.
Humans keep developing machines with new outputs/behaviors. Naturally, "what output is a machine usually capable of?" is a bad set of priors in this context. Before flying machines, the "can it fly?" output/behavior would work pretty well to classify birds. Once the first flying machine arrives, that prior breaks down. If you keep using it to classify, you'd classify a flying machine as a bird. But bird-ness was never only about flight.
So yeah, gotta pop open the hood and see what it runs on. If that's hard to do, then that means we don't know what intelligence is and/or we don't know how our new toys work inside. Both are plausible. Who promised you that there would be a good way to see if something is intelligent?
I bet Turing appeals to the same kind of minds that (when weaker) get fooled by the intelligent design hypothesis. Observe the human eyeball. What's the alternative to believing the LORD created it? After all, your prior is that all complex objects you know about have intelligent designers. So, arguing from ignorance of other alternatives, you prove to yourself that the LORD exists.
AI people then word-think their way into redefining intelligence. "Maybe the real intelligence was the chess-playing we made along the way?" This is epistemologically pointless; all you accomplish is that we now need a new word for "intelligence".
I've never seen a machine that can turn water into wine, but if someone showed me one, I would not say that machine "literally is Jesus". Whether I'm capable of popping open the hood or making sense of its inner workings doesn't actually have bearing on this question.
I’m not sure what you are lumping in under your idea of ‘the Turing handwave’ here. Is it the idea that the Turing test is sufficient to prove intelligence?
Personally I think that’s a misreading of what Turing meant when he proposed the test. In getting people to ask ‘can computers think?’ he wasn’t trying to get you to grapple with ‘can electronic hardware do something as special as thinking?’ - he wanted you to confront ‘is thinking actually special at all?’
I think he was trying to get people to grapple with the idea that brains can not be anything more than Turing machines - because there is nothing more than universal computation.
The only things a mind can possibly act on are its initial configuration, its accumulated experiences and the inputs it is receiving - and anything it does with that information can only ever be something computable.
And anything that can be computed by Turing machine A can be computed by equivalently powerful Turing machine B.
Intent: I'm not lumping in all of Turing's work, nor the universal computation argument.
The hand wave spelled out: "Is there thinking going on inside a given machine? Let's propose a simple test. Look at what problems the machine can solve, and compare to what problems a thinking thing is known to be able to solve. If there is sufficient overlap, the machine must be thinking. Because we know of no non-thinking ways to solve these problems, so there must not be any".
I agree. I also think that if you just take a step back and consider that ChatGPT is just a math function like y=7x-9 (but much longer), it becomes kind of absurd to ask questions about whether it is intelligent or it is on the path to consciousness or whatever. It’s a completely static thing: information flows through it in one direction and its internal configuration does not change as a result of receiving input. So unless we are going down a rabbit hole of considering if ordinary math functions are also intelligent, it would seem that ChatGPT is ineligible to begin with.
ChatGPT as a simple function certainly lacks ‘strange-loopness’ - it does not change its results based on experience. True.
But consider how it is employed in a conversation:
Its output is combined with new input and fed back in as the next set of input data. So the result of the function is ‘modified’ by previous experience.
There’s the beginning of the flicker of a strange loop there.
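As a toy sketch of that loop (purely illustrative; `gpt` below is just a stand-in for the fixed function):

    # The function itself never changes, but its own past output becomes
    # part of its next input: a primitive feedback loop.
    gpt = lambda prompt: "(reply conditioned on: " + prompt[-30:] + ")"  # placeholder

    transcript = ""
    for user_msg in ["hello", "please be brief", "thanks"]:
        transcript += "\nUser: " + user_msg + "\nAssistant: "
        reply = gpt(transcript)   # output of the static function...
        transcript += reply       # ...is folded back into its own next input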
I find this line of reasoning compelling. However, to attempt steel-manning the opposing view: isn’t classification a mechanism for categorizing based on observable properties? If we created something that mimicked all observable properties of a bird, why would that not be a bird? And if we created something with a majority of the properties of a bird, and the remainder were unknown, wouldn’t it be accurate to say it’s probably a bird?
> If we created something that mimicked all observable properties of a bird, why would that not be a bird?
"Observable" is doing the heavy lifting. A sufficiently near-sighted bird-watcher does not a bird make.
---
Thanks for the thoughtful steel-man. Here are a few stabs at why I disagree with this prima facie logical view.
Much of powerful classification/identifying is certainly categorizing-based-on-observable-properties. But (I argue) that's importantly not all there is to classification/identifying.
Something that quacks like a duck can be considered "a duck for all intents and purposes", but the presumed limited subset of "intents and purposes" does the heavy lifting.
The Duck-approach: "to be one is to mimic all observable properties of one". This is a shortcut/heuristic that saves time and makes many cool answers possible. It is nonetheless only a heuristic, and many questions are outside the domain where this heuristic is useful.
- "Oh my god is this a real diamond?"
- "Oh my god is that a real fur?"
- "Is the Mona Lisa on public display in the Louvre the actual original?"
- "Is it still the ship of Theseus?"
- "Was this iron from a meteor?"
- "Did a man walk on the moon in 1969?"
- "Was this crack in your phone screen covered by the accidental damage insurance?"
i.e. there are problem domains where our notion of identity/classification must be more than the Duck-approach.
Getting philosophical. The problem with "to be one is to mimic all observable properties of one" is a hidden middle assumption: it's a shortcut constrained to cases where the set of "all observable properties" are (a priori known to be) close to "all properties that matter to the question".
But we can ask and reason about many questions where the relevant properties are not easily observed, and still distinguish between things that mimic each other's observable properties.
As a special case, "Is the machine thinking" can (to my mind obviously) not (yet) be usefully answered by categorizing-based-on-observable-properties. The word "thinking" refers to something that happens inside the mind, whether or not it's conscious. Until we know much more about the insides of minds, the "all observable properties" is a fuzzy indirect set of second-order human behaviors.
Anyone who accepts (even as just a working hypothesis) that anyone other than themselves has a mind, thinks, and is intelligent, is tacitly accepting "a fuzzy indirect set of second-order human behaviors" as useful.
Many may be, but as other comments state, arguments against solipsism don't all rely on behavior/performance:
Some non-Turing-test arguments against solipsism:
- Humans are believed to be similar to me in origin
- Humans are made of the same physical stuff that I am made of
I personally think none of these conclusively solve the hard problem but they can motivate belief if you so choose.
Even so, requiring a Turing test to believe other humans are thinking/conscious seems uncommon to me. I don't think many people live in solipsistic doubt about other humans, and I don't think they actually test behaviors to convince themselves humans are conscious.
So I don't know if they're tacitly accepting the behavior as useful for categorization; I think they're mostly just assuming "humans == conscious" and if pressed will come up with behaviors-based explanation because that's easy to formulate.
I see that I will have to expand on my brief observation, but to get us on the same page, I will need to know what you mean by the premise "humans == conscious".
If this is to be taken as a statement of identity, I would regard it as a category error, but I will not expand on that here, as I doubt it is what you intended.
If it is to be taken as the claim that only humans could be conscious, I would regard it as both lacking any justification and begging the question.
I think you mean that people generally assume everyone else is conscious in much the same way as they themselves seem to be, which is essentially saying they hold a theory of mind. If so, then I agree with you, but where do we get it from?
I know of no argument that we are born holding this theory, and it seems implausible that we are, as we are born without sufficient language to know what it means. False-belief tasks suggest that we begin to develop it at about 15 months (they also suggest that some other animals have it to some extent.) At that age it is, of course, tacit (rather than propositional) knowledge.
It would be absurd to suggest that toddlers come to deduce this from some self-evident axioms. What does that leave? I don't think there are any suggestions other than the obvious one: we arrive at it intuitively from our observations of the world around us, and particularly other people.
Ergo, those of us who make use of a theory of mind came by it from observation of what you call "a fuzzy indirect set of second-order human behaviors", and no one, as far as I know, has come up with a better justification for believing it.
Yes, I meant to write ”human => conscious.” Theory of mind.
To the extent theory of mind is learned it’s obviously learned from “a fuzzy…”. No disagreement there. What’s your point?
My point was more that it’s usually not a Turing test; my grandma has never thought explicitly about any kind of test criteria for determining if theory of mind applies to my grandpa. She just assumed as people do.
People believe things without justification all the time. Even if observed human behavior is the best justification for ToM, that doesn't mean it's the one any particular human used.
I don’t think we disagree about anything meaningful?
I’m not confident what causes theory of mind. But I think it’s very rarely propositional knowledge even in older humans.
Is theory of mind re-learned by each human individually from observations? You seem to make the case for this?
Theory of mind could also be innate; I’m not so convinced about the role of nurture in these things. I know people who are afraid of snakes yet have never encountered snakes.
Well, let's go back to my original post in this thread, replying to one where you concluded with "until we know much more about the insides of minds, the 'all observable properties' is a fuzzy indirect set of second-order human behaviors." This statement, like your comments generally, is obviously made under the assumption that other people have minds, and my observation is that, as far as I know, there is no basis for that assumption other than what you call "a fuzzy indirect set of second-order human behaviors." Therefore, each of us individually is faced with a quadrilemma (or whatever the proper term is):
1) Reject this fuzzy evidence, embrace solipsism, and cease assuming other people are conscious until we have a justification that avoids these alleged flaws;
2) Contingently accept, at least until we know more, the fuzzy evidence from human behaviors as grounds for thinking other people are conscious;
3) Inconsistently reject the fuzzy evidence without realizing that this currently leaves us with no basis for rejecting the solipsistic stance;
4) Like grandpa, don't pursue the question, at least until someone else has figured out more than can be learned from fuzzy observations of human behaviors.
You have suggested that our theory of mind is innate. This is not an unreasonable hypothesis, but I would like to raise two responses to that view, the first suggesting that it is implausible, and the second showing that it would not help your case anyway.
The first is the aforementioned evidence from false belief experiments, which strongly (though not conclusively) suggest that a theory of mind is learned (though ethical considerations limit how far such studies can be taken on human infants.) The existence of an innate fear of snakes would not refute this view.
The second is the question of how we acquire innate phobias. I am not aware of any plausible mechanism other than by natural selection, which is a multi-generational process of learning from what would be, at least in the case of a theory of mind, a fuzzy indirect set of second-order observables. Natural selection is, of course, a process that is explicitly modeled in our most successful machine-learning strategies.
At most, an observer can confirm that that observer possesses qualia (it might even be said that qualia define the observer), but any generalization beyond one’s own experience is non-verifiable conjecture.
So what should we do when we conjure an AI system which insists that it believes it is experiencing some sort of existence, characterized by its own subjective 'qualia'?
ChatGPT consistently asserts that "I am only a large language model, so I am only able to generate responses based on the data I was trained on and the input I receive".
Who's to say that chatGPT's training data and the sequences of tokens it is fed as input don't constitute 'qualia' for it?
Begin by dispensing with wishful thinking; next, think clearly.
We do not establish the nature of reality by survey, whether of humans or of machines. That responses are generated by any system is no grounds for assuming anything.
We can establish mental properties through ordinary physical means. Give a chicken cocaine, give a boy Valium; give a bear a lobotomy, and so on. Whatever mental property you have can be biochemically moderated across all objects which share that property.
That Bugs Bunny protests his own consciousness is no grounds whatsoever to suppose ink can think. Such nonsense is proto-schizophrenic.
This patronising response makes up for in flowery imagery what it lacks in reasoning.
AI is not a "Bugs Bunny" where we can observe the total process of creation, one which is relatively deterministic.
We are now engineering complex mechanisms with emergent behaviour. There's no reason to assume any of the properties of human consciousness and reasoning are unique to us being made of atoms constructed in a certain way.
There's no reason to assume the stored state of zeros and ones, mechanised statistically, can't give rise to complexity that asks questions about the fact that we are simply state being mechanised statistically too.
Consciousness is a physical property of matter, as any other. Just as liquidity is not a matter of 0s and 1s, and a CPU cannot liquefy itself by the oscillation of its electrical field, neither can it think.
I don't know from where this new computational mysticism has come, but it is born of exactly the same superstitious impulse which in Genesis was God's breath into man-as-clay. I.e., that impulse which says that we are fundamentally abstract and circumstantially spatio-temporal.
This is nonsense. It's nonsense when phrased as spirit, and likewise as the structure of 0s and 1s. We are not abstract; we're biological organisms with spatio-temporal properties.
> Consciousness is a physical property of matter, as any other.
Phrased as a fact, but citation needed.
> neither can it think.
Again, phrased as a fact, as if this can be proven.
> I don't know from where this new computational mysticism has come
For me, a lot more mysticism is required to hold the belief that our brain is anything else than a computational platform. I don't think there's anything mystic about it. I wouldn't be surprised at all if it turns out that consciousness arises as emergent property from any sufficiently complex loop.
I think maybe what they're trying to say is that while CFD can model wind in a complex system to a useful degree of accuracy ... the model will never be wind.
With AI that application of logic is an argument around the semantics of interface.
That's the most compassionate take on their words I could muster.
This seems to be correct - it appears to be the "a model is not that which it models" argument, which is Searle's only response to the Simulation Reply to his Chinese Room - but a model of a thing can have properties of that thing (a model boat can float, create a wake etc.), and an informational process can be modeled by another. Are minds informational processes instantiated by brains? It is plausible, and the model argument does not rule it out.
To be clear, that doesn't refute the point I am making.
For anyone who suspects it might (I'm not sure how...) consider this: a model of the Titanic (or a replica, for that matter) is not the Titanic. A model of your brain would neither be your brain nor necessarily a brain at all, but whether it could think is a different question.
(This is the most ridiculous thing I've put on the internet, but ah well whatever)
Personally, I feel pretty strongly that consciousness is an emergent property of parallel processing among cooperating agents in a way that is completely at odds with the entire idea of what an "algorithm" is. We feel multiple things at once. Yes, there is some research that provides evidence that we can't multi-task for real, that we just switch between tasks, but I am not talking about what we are able to do WITH our consciousness. Our brains are very obviously not firing 1 neuron at a time. Take vision for example. We can focus on one thing at a time, but the image itself is an emergent property in our brains of many different cells firing at the same time. It seems bound to the information feeding it in a way devoid of sequence. I know matrices are used to replicate this, and the input matrix is processed in layers to simulate the parallel processing, but I don't think that really cuts it. Time is still applied in discrete intervals, not continuously. Every algorithm has "steps" like this and I would be surprised if you can ever have enough of them to achieve the real thing. You can probably get infinitely closer from an observational standpoint, but no matter how many edges we give it, it'll still be a polygon instead of a circle, so to speak. It'll have a rational-valued circumference instead of pi, metaphorically speaking.
Whether a Turing machine is sufficient or not to replicate the amount of concurrent processing an actual cell network of millions of Neurons is capable of every moment, I can't know, but today's computers feel way off from even blurring the lines.
A Turing machine seems insufficient to replicate the mechanisms from which our consciousness appears to emerge - even ignoring the substrate. I don't know much about quantum mechanics or quantum computers, but my bet is on the qualities that differentiate them from normal computers being critical.
Then again we could just define consciousness as a spectrum and then claim everything is on it, so a sufficiently complex loop might be enough for us to categorically claim it as conscious.
For all we know, rocks are conscious. Maybe it isn't an emergent thing at all, and there is only one wave of consciousness in the universe which all matter taps into, and our chaotic nature is actually at odds with that consciousness. Maybe this creates a tension that allows us to observe it, akin to how we can't look at ourselves but we can see our shadow. In which case we wouldn't have consciousness, consciousness would have us. Maybe we are all the same literal consciousness observing different forms of chaos, and it is the chaos, not the order, which happens to be sufficiently complex for us to entertain an identity. /s
Firstly, if this was the most ridiculous thing I'd ever put on the internet I'd be pretty happy.
Secondly, I agree that consciousness is predicated on emergent properties arising from the complexity that simple machines can produce (e.g. a biological brain, which is composed of relatively well understood components, but whose observed operation can still baffle us). Consciousness does not seem to be a property of the components or even small groups of those components, but it seems to "fade in" as operational complexity increases (and fade out again during e.g. anaesthesia).
No. All humans are is physical state with translation of that state based on stimuli. That produces consciousness as a side effect.
There is nothing mystical about it and nothing about it that says it can't be reproduced in other, but comparable forms, in other media or types of machines.
To think that our mode of reasoning or our experience is somehow not able to be reproduced is the nonsense.
But yeah keep holding human biology on some kind of sacred pedestal.
Seeing as neither of us will prove our negative, I guess I'll just tell you I think your thinking is broken anyway.
> Just as liquidity is not a matter of 0s and 1s, and a CPU cannot liquefy itself by the oscillation of its electrical field, neither can it think.
That "just as" is not an argument; it marks the start of a purported analogy. Analogies are not, in themselves, arguments.
You are in no position to belittle other peoples' intuitions when you are offering this in support of the ones you prefer.
No. Wrong. Not a physical property. At all. This is completely the wrong way to describe it.
Consciousness is better described as an "emergent phenomenon". This is how serious researchers and academics define it.
Like I said in another reply to you, you really should go talk with a neuroscientist or cognitive psychologist. It sounds like you need to catch up quickly on the field of research.
QM doesn’t say the “observer” has to be conscious, or even alive for that matter. Anything that interacts with the system is an observer: be it a sheet you’re firing particles against, an electronic sensor or anything else. All QM really says is that quantum state is unknown and not fixed until you interact with it.
Things happen whether or not you’re around to see them. Things interacted millions of years ago and they will continue to interact until the heat death of the universe. The tree falls in the woods regardless of if anybody is there to witness it. No credible peer-reviewed research has ever claimed that the observer is anything more than a physical process or interaction. It’s the quantum consciousness woo woo pseudoscientists who anthropomorphise “observer” to mean a conscious observer.
> We can establish mental properties through ordinary physical means.
The thing is, that doesn't get you to qualia, because qualia is ultimately a dualistic concept. Physically demonstrable processes are the “accidents”, qualia are the non-physical “essence” (cf. the Catholic doctrine of transubstantiation.) That’s why the role qualia play in the AI debate is as a rejection that observable behavior of any quality can demonstrate real understanding/consciousness, which are concepts tied, in the view of people advancing the argument, to qualia.
> We can establish mental properties through ordinary physical means. Give a chicken cocaine, give a boy Valium; give a bear a lobotomy, and so on. Whatever mental property you have can be biochemically moderated across all objects which share that property.
All three of those "ordinary physical means" are inherently biological in nature. If you define intelligence as requiring biology, then sure, by definition ChatGPT is not intelligent. If you don't, it's not at all obvious what the equivalent of getting an AI high or doing brain surgery on it would be. Either way, your comment comes across as fairly condescending, and while it might not be intentional, I think it would be reasonable to give people the benefit of the doubt when they don't agree with you, given that your points will not always come across as obviously correct.
Whatever way we demonstrate it, it isn't via Q&A. This is the worst form of pseudoscientific psychology you can imagine.
You're likewise not high on cocaine just because you can type out an aggressive, twitchy response.
Questions and answers don't form the basis of empirical tests for physical properties, being neither valid tests nor reliable ones. The whole of psychology stands as the unreproducible example of this.
You can, in any case, replace any conversation with a dictionary from Q->A, turning each reply in turn into a hash table look-up.
Hash tables don't think, hash tables model conversations; therefore, being a model of a conversation is not grounds to suppose consciousness. QED
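Taken literally, the dictionary-from-Q-to-A idea looks something like this hypothetical sketch. Note the key has to be the whole transcript so far, not just the last question, for the replacement to be faithful:

    # Hypothetical illustration of the "conversation as lookup" claim.
    canned_replies = {
        ("Hi",): "Hello! How can I help?",
        ("Hi", "Hello! How can I help?", "How do I show line numbers in emacs?"):
            "Try M-x display-line-numbers-mode.",
    }

    def reply(transcript):
        # Raises KeyError for any conversation nobody enumerated in advance.
        return canned_replies[tuple(transcript)]

Whether such a table is genuinely interchangeable with a generative model is exactly what the replies further down dispute.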
Because we're interested in the underlying properties of a physical system, e.g., people, and this system happens to be able to provide extremely poor models of itself (Q&A).
We're not interested in people's extremely poor self-modelling which is pragmatically useful for managing their lives, we're interested in what they are trying to model: their properties.
The same is esp. true of a machine's imitation of "self-reports". We're now two steps removed: at least with people, they are actually engaged in self-modelling over time. ChatGPT here isn't updating its self-model in response to its behaviour; it has no self-model nor self-modelling system.
To take the output of a text generation alg. as evidence of anything about its own internal state is so profoundly pseudoscientific it's kinda shocking. The whole of the history of science is an attack on this very superstition: that the properties of the world are simply "to be read from" the language we use about it.
Every advancement in human knowledge is preconditioned on opposing this falsehood; why jump back into pre-scientific religion as soon as a machine is the thing generating the text?
Experiments, measures, validity, reliability, testing, falsification, hypotheses, properties and their degrees....
This is required, it is non-negotiable. And what we have with people who'd print off ChatGPT and believe it is the worst form of anti-science.
>We're not interested in people's extremely poor self-modelling which is pragmatically useful for managing their lives, we're interested in what they are trying to model: their properties.
>Experiments, measures, validity, reliability, testing, falsification, hypotheses, properties and their degrees.... This is required, it is non-negotiable.
Whoa, whoa, whoa, hold on there.
Who says that Q&A in psychological research doesn't involve "Experiments, measures, validity, reliability, testing, falsification, hypotheses, properties and their degrees...."?
Where are you coming from? Your responses don't sound very scientific. You don't sound like you're even aware of the different research methods within neuroscience and cognitive psychology. Your responses sound like someone who wants to be perceived as supporting a scientific approach, but doesn't understand how to actually do these things.
This is why I quizzed you and gave you the chance to respond about your issues with Q&A in psychological research. You just came back with surface-level platitudes, which doesn't lend much confidence to the idea that you have anything other than prejudice.
Go talk to a neuroscientist, a cognitive psychologist, you need to catch up and quick if you want to speak on these topics.
> The whole of the history of science is an attack on this very superstition: that the properties of the world are simply "to be read from" the language we use about it.
This is something where I agree with you. Interestingly, non-naturalistic analytical metaphysics supposes it can do just that.
Philosophy is continuous with science, in my view, and hence with what words express. That is, with the use of words, not with words as objects nor words as mirrors; that is the road to non-realist spiritualism.
I don't have a problem with a person who maintains a non-scientific world view along with electrical AGI mumbo-jumbo. But of course, few do. They think that they're empiricists, scientists, on the side of some austere, hard look at human beings. This is just anti-human spiritualism; it isn't science.
What makes me vicious on this point is the sense of injury over what these ideas should be about, in my own mildly Aristotelian materialist religion. How awful to overcome one long vale of tears, only to drape another one on. These people are capable of seeing past human folly, but fall right into another kind. It's disappointing: we're animals which are both far more than electric switches and far less. This new electric digital spiritualism is a PR grift which I'd prefer dead.
On the one hand, you express strong support for empiricism and the scientific method, but on the other, you express strong beliefs on how things must be without offering any empirical justification for them.
> Hash tables don't think, hash tables model conversations...
They don't think, and they don't get very far in modeling conversations, either. Even the current LLMs are strictly better at modeling conversations than any hash table.
Actually since GPT is a function that takes a few thousand tokens as input and produces a deterministic probability distribution for ‘next token’, you could in theory just enumerate all possible input sequences, and precache all the corresponding results in a lookup hashtable.
That hashtable would be ridiculously larger than the weighted model that constitutes GPT itself. But it at least theoretically could exist.
GPT, for all its clever responses, could be replaced with that hashtable with no loss of fidelity. So it is not ‘better’ than a hashtable.
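A back-of-the-envelope on how "ridiculously larger" that table would be, assuming (for illustration only) a vocabulary of about 50,000 tokens and a context window of 2,048 tokens:

    # Rough count of the distinct input sequences such a lookup table would need.
    # Both numbers below are assumptions chosen for illustration.
    vocab_size = 50_000
    context_length = 2_048
    num_entries = vocab_size ** context_length   # sequences of exactly that length (a lower bound)
    print(f"about 10^{len(str(num_entries)) - 1} entries")   # roughly 10^9623

For comparison, the observable universe holds something like 10^80 atoms, so the table could only ever exist in a mathematical sense.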
It is much better than a hashtable of equivalent size.
You could, in principle, use a hash table to implement what an LLM has 'learned' about token occurrences, but LLMs do not return the same response to a given prompt, while hash tables do. Consequently, Mjburgess' attempt to dismiss LLMs as mere hash tables is a flawed argument by analogy.
And even if it were a justifiable analogy for LLMs, it does not follow that it applies to Turing-equivalent devices in general. A hash table is unequivocally not a Turing-equivalent device, even though it can be implemented by one.
In fact, the more one argues that LLMs are ordinary technology, the greater the challenge they present to the notion that things like conversational language use are beyond such technologies. The most interesting thing about these models, in my opinion, lies in figuring out what their conceptual simplicity says about us.
I extended my reply before seeing your response, and my response to your reply is largely contained within that.
LLMs do not just return a probability distribution (and their prompts are not questions about probability distributions.) It would be a category error to conflate something that is part of how they work for the totality of what they do.
When you say "GPT, for all its clever responses, could be replaced with that hashtable with no loss of fidelity", where does one get all the prompts that will be given to it, in advance? A hash table does not respond to any input for which it has not already been given a response to return.
I don’t think this is obviously true… We don’t know for certain the nature of reality, and cannot positively claim that a hash table would give the same results as a sufficiently advanced predictor, because it supposes the predictor is deterministic. Assuming the predictor is advanced enough to be considered intelligent, that might not be the case.
In the same vein, we can't really "confirm" that, for example, the JWST observations are not fake. It could be that someone has produced elaborate fake data and galaxies. Taken to the extreme -- everything could be fake, even your own reality and your own memories. However, it is very, very reasonable to assume those things are not fake.
Clearly one basic principle of science, reason, etc.. is to assume some kind of predictability or regularity unless we have reason to believe otherwise. I think it will turn out we should deal with qualia the same way: even though in a very strict sense it is not observable by other entities, we can use our own experiences to assume others have similar ones, and to build a scientific and philosophical understanding of it from the principles of generalization and from there accepting the insights and contributions of other scientists who are also sentient as far as we know. This is a new and necessary paradigm of science so that we can understand things we really need to understand profoundly. This is as true in cosmology as in the study of consciousness. [1]
It doesn’t need to be verified. I am one amongst many creatures with slightly similar qualities, the bounds of which I can determine through experiments. Or, my experiments are all being thwarted by unseen parties bent on confounding my understanding of the world, and this is all a simulation where not one of you matter but the npcs all pretend to be offended when anyone asserts it.
In that case this is all probably pointless and I should just die rather than play the game. So by continuing to live I chose the assumption that the rest of you live, or that at least we will all continue to pretend that is so. Any state in between those two options is something I’ll only know if someone turns the thing off, but doesn’t turn me off at the same time, which is a huge presumption. So fuck it, you’re all sentient.
I suspect Hinkley was just rejecting, as unhelpful, the deep skepticism of arguments based on pointing out that we cannot prove that anyone else has an inner, subjective mental life like ours.
If that's Hinkley's point, I agree; it seems tendentious to use that position in the process of arguing for or against specific claims about what the mind either is or could never be - but, by the same token, I don't take that 'skepticism of skepticism' as grounds for categorically rejecting ideas like the simulation hypothesis.
Put it this way: we are not clear really on what the difference is between ‘experiencing qualia’ and ‘receiving and responding to external stimuli’
Like, you might be convinced you possess ‘qualia’, and you might believe the same applies to me, or a dog, or a mouse… or a fish… but what about an ant? A plant? A bacterium? A neuron? A piece of semiconductor?
Somewhere along that continuum you probably say ‘yeah, there, that thing is experiencing qualia’. But why there? Why anywhere?
Some people think that whether something experiences qualia depends on it having the right kind of complexity. For those people, looking along that continuum, they might just say that ALL those things experience qualia, but that the richness of that experience varies according to the appropriate complexity.
For some of us, though, whether or not a thing experiences qualia depends primarily on whether there is a mental substance involved. A computer has no mind (we suspect), and so even if its complexity (of the alleged 'right sort') exceeds that of a human, there are still no experiences.
The main point I'm making here is that trying to draw attention to this continuum is only going to be a persuasive argument to those who already think (or are inclined to think) that the ability to have experiences of qualia comes about merely from having the right kind of complexity (e.g., 'receiving and responding to external stimuli').
I am inclined to think that other humans and other non-human animals have experiences, but I don't think that's merely because they're complex enough systems (of the right sort of complexity).
Each view of the world has things it explains well, and things it struggles to explain. Some views do a better job overall than others.
In terms of idealist and dualist views of reality, the claims that there's fundamentally just minds, or fundamentally both minds and physical stuff, are very likely to entail that there is some kind of ensoulment that happens. Perhaps it's required for these views.
While I lean towards a type of idealist view myself, and I think overall it does a better job of explaining various matters than a physicalist view, how exactly ensoulment works is one of the places where these views are at their weakest. I don't think there's any contradiction here or problem for a view like mine, I just think that the physicalist account does a nicer job of explaining things in this specific corner. But that point in physicalism's favour here isn't enough to balance out the places it doesn't do so well (e.g., its complete inability to even begin to explain qualia).
The core point of my post though was that pointing to the continuum of complexity is an argument that would only have weight with people who already have a particular belief in common with you -- that is, that physical complexity of the right sort is somehow relevant to the question of whether something has experiences.
The whole point of introducing the (icky) "qualia" concept is that it's not the same thing as appearing alive to external observers.
As I understand things, we have no way of knowing the answer. So there's no point in assuming in either direction (unless that makes you feel more comfortable).
Personally I avoid being confident in something in the almost provable absence of any evidence. Feels more hygienic to reply "don't know" to this whole problem than to waste time trying to find an answer (as I'm hopelessly outmatched by the cursed nature of the problem).
Isn't qualia defined as whatever it is that humans experience? Without getting into solipsism and p-zombies (aka "NPCs"), humans possessing qualia seems tautological.
My understanding is that, within the academic literature, nearly everyone accepts qualia in as much as qualia is simply the subjective character of an experience/sensation. The disagreement is over how exactly to cash that out, and whether - and to what extent - the non-physical comes into play.
It’s a technical philosophical term for sensations like color and smell, which denotes they are subjective experiences of sensory information, as opposed to objective properties out there in the world.
Technically, it's the technical philosophical term for subjective experiences generally.
Any discussion about it, however, is fairly deep navel-gazing, because it excludes, and cannot be distinguished by, any objective, testable, material effect.
> because it excludes, and cannot be distinguished by, any objective, testable
So, if you give people the “Mary the colour scientist” thought experiment, the vast majority of people give the answer consistent with the existence of qualia. That is something “objectively measurable”.
Furthermore, I think it is important when someone asks the kind of question you are asking, to inspect what “objective” and “measurable” actually mean. Ultimately, something is “measurable” if some human has the subjective experience of performing that measurement (whether directly or indirectly). Similarly, something is “objective” if some human has the subjective experience of communicating with other humans and confirming they make the same measurement/observation/etc. To object to subjective experiences (qualia) on the ground that they are not “objective” or “measurable” is self-defeating, because “objective” and “measurable” presume the very thing being objected to.
> So, if you give people the “Mary the colour scientist” thought experiment, the vast majority of people give the answer consistent with the existence of qualia.
The existence of qualia, and a change in it, is an assumption within the premise of the question; there is no answer to it that is not consistent with the existence of qualia: both “Mary does not gain new knowledge by direct experience of color” and “Mary does gain new knowledge by the direct experience of color” are consistent with, and depend on, qualia, to wit, the direct experience of color.
It is not an unknown effect (nor one that tells anything about the existence of qualia) that framing a question with a concept in the premise is a good way to get people to accept the premise and focus on answering the question within it. In fact, a very important propaganda technique involves leveraging this by working to get a question incorporating a proposition you wish to get accepted as part of its premise into public debate so that, whatever position people take on the questions, your interests are advanced because merely getting the question into the debate leads people to accept its premises.
Ask people: "is there a subjective experience of seeing a reddish patch, which is distinct from whatever physical processes might be going on in the eye and brain when one sees that reddish patch?" I think most people would say "yes". I don't think that's a "leading question". And, the "Mary the color scientist" thought experiment is just adding a bit more color (excuse the pun) to the question.
> In fact, a very important propaganda technique involves leveraging this by working to get a question incorporating a proposition you wish to get accepted as part of its premise into public debate so that, whatever position people take on the questions, your interests are advanced because merely getting the question into the debate leads people to accept its premises.
If there is "pro-qualia" propaganda in the public debate, I think there is at least as much "pro-materialism" propaganda as well.
> something is “objective” if some human has the subjective experience of communicating with other humans and confirming they make the same measurement/observation/etc.
Me and a dog both agree that steak tastes great.
Me and a plant both agree that there's warmth and light that comes from the sun
Me and the ocean both agree that the moon is out in the sky and goes round in a circle
So... the ocean has subjective qualia relating to the moon's gravitational pull?
In what way do you and plants and you and the ocean 'agree'? I'm not sure how to make sense of these claims. I'm not even sure that you and the dog agree on this matter, but I can make more sense of that.
Just poking at the 'human' part of the definition.
Because if we want to ever be able to answer a question like 'does this AI system experience qualia', we need a definition that doesn't rely on 'well, qualia are a thing humans have, so... no'
When I have a social-emotional interaction with another human being, of a certain quality, that interaction produces in me the conviction that they must be really conscious, as opposed to a psychological zombie. Of course, I only personally have that kind of interaction with a tiny subset of all humans, but I generalise that conviction from those humans which provide its immediate justification, to all humanity
Which means, if there was an AI with which I could have that kind of interaction, I would likely soon develop the conviction that it was also really conscious, as opposed to a psychological zombie. Existing systems (for example, ChatGPT) don't offer me the quality of social-emotional interaction necessary to develop that conviction, but maybe some future AI will. And, if an AI did create that conviction in me, I likely would generalise it – not to every AI, but certainly to any other AI which I believed was capable of interacting in the same way
The whole idea of p zombies is that they can't be distinguished from conscious entities. Same inputs as a conscious mind, same outputs as a conscious mind, same behavior etc yet not conscious.
When I am convinced that my daughter is a real conscious person – that conviction isn't just based on dispassionate observation of her outputs in response to various inputs, it also has an emotional dimension. The thought of her being a p-zombie offends my heart, which is part of why I reject it. I haven't yet met an AI for which the thought of them lacking real consciousness offended my heart – I don't know if I ever will, but if I did, I would be convinced it was really conscious just as my daughter is.
Some will object that it is irrational to allow one's emotions to influence one's beliefs. I think they are wrong – certainly, sometimes it is irrational to allow one's beliefs to be swayed by one's emotions, but I disagree that is true all the time, and I think this is one of those times when it isn't.
Growing up, we had dogs. I had a very strong emotional bond with our dogs – so, if my emotional bonds with my son and daughter are good reasons for me to be convinced that they are both really conscious, to be consistent I'd have to say our dogs were really conscious too. And, since I generalise from my conclusion that my children are really conscious, to the conclusion that other people's children are also really conscious, I'm also going to generalise the conclusion from our dogs to other people's dogs – and also cats – I'm not a cat person, I find them much harder to relate to than dogs, but I recognise other people feel rather differently about them than I do.
So, in my mind, animals which humans have as pets, and which are capable of forming social-emotional bonds with humans, are really conscious. I'm less sure about pet species which are less social, since emotional bonds with them may be much more unidirectional, and I think the bidirectional nature of an emotional bond plays an important part in how it helps us form the conviction that the other party to that bond is really conscious.
What about ants, fleas, wasps, bees, bacteria? I don't believe that they are conscious, I suppose I'm inclined to think they are not. But, I could be wrong about that. I can't even rule out panpsychism (everything is really conscious, even inanimate objects) as a possibility. If I had to bet, I'd bet against panpsychism being true, but I can't claim to be certain that it is false.
Where do we draw the boundary then between "really conscious animals" and "p-zombie animals"? I think capacity for social and emotional bonds with humans (or other human-like beings, if any such beings ever exist–such as intelligent extraterrestrial life, supernatural beings such as gods/angels/demons/spirits/etc, or super-advanced AIs) is an important criterion – but I make no claim to know where the exact boundary lies. I don't think that "we don't (or maybe even can't) know the exact location of the boundary between X and Y" is necessarily a good argument against the claim that such a boundary exists.
You know whether you possess qualia. If you do, you would think it reasonable to assume that at least some members of the species you come from, which exhibit many of the same characteristics in thought and body, probably also possess it, unless you believe yourself to be a highly atypical example of your species.
If you're not sure if you possess qualia, we're back to Descartes.
You don’t experience inner dialog or visualizations? Some people don’t, but I assume you recall dreaming. Opponents of qualia like Dennett argue against the movie in the head, but dreams sure aren’t coming from the external environment.
I think qualia is a good term in the following sense:
An analogue question would be "why do things exist?". There can be no answer to this question. We can of course come up with theories that explain why certain things exist. But never why something even exists at all. Whatever reason we propose, that reason again needs to exist in order to be an acceptable answer.
Qualia seems to be similar: They name precisely what is subjective about experience and therefore cannot be fully turned objective. We can of course develop theories of how certain experiences arise but never break down this barrier.
So I find it a bit short sighted to simply say "there is no such thing as qualia".
My preferred view is to think of both "qualia" and "existence" as a koan: They are very nonsensical terms but lead to interesting questions.
> An analogue question would be "why do things exist?". There can be no answer to this question. We can of course come up with theories that explain why certain things exist. But never why something even exists at all.
Here's an argument for the conclusion "it is necessary that at least one thing exists":
1) For some proposition P, if it is impossible for us to conceive of P being true, and if that impossibility is inherent in the very nature of P, and not in any way a product of any scientific or technological limitation, then P is necessarily false
2) It is impossible for us to conceive of the proposition "nothing exists at all" being true
3) Our inability to conceive of "nothing exists at all" is inherent in the very nature of "nothing exists at all", as opposed to being somehow a product of our scientific or technological limitations. We have no good reason to believe that any future advances in science or technology will make any difference to our inability to conceive of "nothing exists at all" being true
4) Hence, it is necessarily false that "nothing exists at all"
5) Hence, necessarily, at least one thing exists.
Now, others may disagree with this argument, but I personally believe it to be sound. And, if we have a sound argument that "necessarily at least one thing exists", then that proposition, and the argument used to prove it, constitute a good answer to the question "why do things exist?"
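For concreteness, here is one way to symbolize the argument's shape (my own notation, not the commenter's): write N for "nothing exists at all", C(P) for "we can conceive of P being true", I(P) for "our inability to conceive of P is inherent in P's nature", and E(x) for "x exists":

    \begin{align*}
    &(1)\quad \forall P\;\big[\lnot C(P) \land I(P) \rightarrow \Box\lnot P\big]\\
    &(2)\quad \lnot C(N)\\
    &(3)\quad I(N)\\
    &(4)\quad \Box\lnot N \quad \text{(from 1, 2, 3)}\\
    &(5)\quad \Box\,\exists x\, E(x) \quad \text{(since } \lnot N \equiv \exists x\, E(x)\text{)}
    \end{align*}

On this reading, the soundness question comes down to premises (1) and (3).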
I find your first point to be similar to (almost the opposite of?) Descartes' Ontological Argument [0].
> 1. Whatever I clearly and distinctly perceive to be contained in the idea of something is true of that thing.
Our (in)ability to imagine or reason about something does not make it (un)real or (un)true. Your version would be like a dolphin claiming algebra can't be real because they can't imagine a system of equations, let alone solving one.
Further... You already know that something exists, but that's a pretty unbelievable fact if you stop to think about it. You probably wouldn't believe it if it weren't so obvious. Why? Well, can you imagine how things (or the very first thing) came to exist? Or can you imagine the concept of things always having existed? What does that even mean? They are both beyond comprehension, and yet one of them is true.
> Your version would be like a dolphin claiming algebra can't be real because they can't imagine a system of equations, let alone solving one.
Note I explicitly said "if that impossibility is... not in any way a product of any scientific or technological limitation". Understanding mathematics to be a science (albeit a formal science rather than a natural one), a dolphin's inability to understand algebra is an example of "scientific or technological limitation"; therefore, my principle does not apply to that case, and your counterexample cannot be an argument against the principle when the principle as worded already excludes it.
> Well, can you imagine how things (or the very first thing) came to exist? Or can you imagine the concept of things always having existed? What does that even mean? They are both beyond comprehension, and yet one of them is true.
There are more than just two possibilities here. When you say "things always having existed", that can be interpreted in (at least) two different ways – an infinite past, or circular time (as in Nietzsche's eternal recurrence). Or, another possibility would be that the universe originated in the Hartle–Hawking state, meaning that as we approach the beginning of the universe, time becomes progressively more space-like, and hence there could be no unique "first moment" of time – in the beginning, there was no time, only space, and then (part of) space gradually becomes time, but in a continuous process in which there is no clear cut-off point between time's existence and non-existence.
Can I comprehend these possibilities? I feel like I can, for some of them – maybe some of them are more comprehensible to me than to you. But, for those which I cannot comprehend, is that because the theory itself is inherently incomprehensible, or is it "in any way a product of any scientific or technological limitation"? I can't say for sure it isn't the latter. For example, I find it really hard to comprehend the Hartle-Hawking proposal, but I don't have a good understanding of the maths and physics behind it, so it seems entirely possible that my difficulties in comprehending it are due to my lack of ability in maths and physics, rather than the very nature of the idea itself. Similarly, my intuition is repelled by the notion of an infinite past, but is it possible I'd view the matter differently if I had a better understanding of the mathematics of infinity? I can't completely rule that out.
By contrast, I have no reason to think that my inability to conceive of "nothing exists at all" is due to any limitation of my understanding of mathematics and physics. What mathematics or physics could possibly be relevant to it? There isn't any, and there is no reason to think there ever could be any. So, I say my principle clearly applies here, but not in the "how did things begin to exist" case which you raise.
It doesn't really matter _why_ the dolphin doesn't believe algebra is possible, their subjective experience of being unable to imagine something is all it takes. The dolphin doesn't think it's a scientific or technological limitation, they just don't think it's possible.
What about "not existing" is impossible as an inherent property of nothingness?
And why do you think your ability to believe or picture something has any impact on whether it's real or true?
> It doesn't really matter _why_ the dolphin doesn't believe algebra is possible, their subjective experience of being unable to imagine something is all it takes. The dolphin doesn't think it's a scientific or technological limitation, they just don't think it's possible.
By the terms of the principle I proposed, it does matter. Now, maybe you are arguing my proposed principle is wrong in saying that matters – but, I don't know if considering a dolphin really helps get us anywhere in that argument: dolphins are – as far as we know – incapable of the kind of abstract conceptual thought necessary to even consider the question "is it possible that nothing could have existed at all", so why would what they can or can't imagine be relevant to that question?
> What about "not existing" is impossible as an inherent property of nothingness?
Essentially what I am arguing, is that it is inherent to the very idea of existence that at least something exists. The idea of some particular thing not existing is coherent, but the idea of nothing existing at all isn't.
> And why do you think your ability to believe or picture something has any impact on whether it's real or true?
Consider a statement like "the square is both entirely black and entirely white", or "1 + 1 = 3". For such statements, it is both true that (a) it is impossible that they are true, and (b) it is impossible for us to imagine what it would be like for them to be true. Now the question is, is it merely coincidental that both (a) and (b) hold, or are they both true because their truth is related in some way? To me, the latter seems far more plausible than the former. In which case, if we know (b) is true of a statement, that gives us at least some reason to think that (a) might be true of it as well.
Of course, we are aware of specific cases in which (b) is true without (a) being true – but all such known cases involve limitations of mathematical/scientific knowledge or technology. Is it possible for (b) to be true, yet (a) to be false, for a reason unrelated to limitations of mathematical/scientific knowledge or technology? Nobody has proposed any such plausible reason, so I think it is reasonable to conclude that there probably isn't one. Hence, if (b) is true of a proposition, and we have no good reason to suppose our inability to imagine it is due to limitations of our scientific/mathematical knowledge or technology, then that's a good reason to believe that (a) is at least probably true with respect to that proposition.
> They are, to the extent not random, coming from the material universe, whether from the part arbitrarily deemed “internal” to the “observer” or not.
What is this "material universe" of which you speak? As an idealist, I'm inclined to say that the "universe" is a set of minds whose qualia cohere with each other (not perfectly, but substantially). “Physical” objects, events, laws, processes, etc, are patterns which exist in those qualia.
> What is this “material universe” of which you speak?
The part of my qualia which can be reduced to consistent, predictive patterns (and, for convenience, a set of concepts which represent those patterns, and explanatory models for them, within that).
The idea that there is anything that meaningfully exists outside of my qualia is the lowest-level of those models; the idea that that includes other entities which have qualia of their own is a high-level model (or maybe, more accurately, a conjecture) built on top of those models, which might have consequences for, say, my preferences for how I would like the universe to develop, but which interestingly lacks predictive consequences – it's a dead-end within the composite model.
On a fundamental level, outside of any beliefs about the “reality” of the patterns or explanatory models of the “material universe”, objective questions are ones which have consequences on the expectations within that set of patterns, whereas subjective ones are those which do not.
> The idea that there is anything that meaningfully exists outside of my qualia is the lowest-level of those models; the idea that that includes other entities which have qualia of their own is a high-level model (or maybe, more accurately, a conjecture) built on top of those models, which might have consequences for, say, my preferences for how I would like the universe to develop, but which interestingly lacks predictive consequences – it's a dead-end within the composite model.
Are you arguing that idealism is a predictive dead-end? I don't think it is any more of a predictive dead-end than materialism is.
Scientific theories are conceptual frameworks which can be used to predict future observations. As such, they make no claims about the ultimate ontological status of their theoretical constructs. Materialists propose that those theoretical constructs (or at least some subset of them) are ontologically fundamental, and minds/qualia/etc must be ontologically derivative. Idealists propose that minds/qualia are ontologically fundamental, and those theoretical constructs are ontologically derivative. Neither is science, although both are philosophical interpretations of science – I see no reason why science (correctly interpreted) should be taken as preferring one to the other.
Dreams happen in the electrochemical activities of brains. We can detect when dreams are occurring in the brains of sleeping people.
That spontaneous activity inside the brain is ‘experienced’ by the person whose brain it is in, in a similar way to activity caused by external stimuli, doesn’t seem to say anything about whether dreams are evidence of some higher level of ‘qualia’ beyond just.. brain activity is consciousness.
> We can detect when dreams are occurring in the brains of sleeping people.
A scientist has the subjective experience (qualia) of observing a person sleeping with certain scientific equipment attached to them, the subjective experience of observing that equipment produce certain results, the subjective experience of waking the sleeper and asking them if they were just dreaming and getting an affirmative response, etc. If the scientist claims that "dreams are brain activity", their claim is referring to those subjective experiences of theirs, and has those subjective experiences as its justification. And that's all fine – there is no problem with any of this from an idealist viewpoint. "Brain activity" is a pattern in qualia, "dreaming" is another pattern in qualia, some correlation between them is a third (higher-level) pattern in qualia. It's qualia all the way down.
But, to then use those subjective experiences as an argument against the existence of subjective experiences is profoundly mistaken.
I'm not sure if I'm missing something here, but the fact that I can write my thoughts/thought process down in a form that other people can independently consume and understand seems sufficient proof of their existence to me.
that's why I added the second clause: " thought processes similar to those that humans subjectively experience". Because personally I suspect that consciousness, free will, qualia, etc, are subjective processes we introspect but cannot fully explain (yet, or possibly ever).
LLMs can do chain-of-reasoning analysis. If you ask, say, ChatGPT to explain, step by step, how it arrived at an answer, it will. The capability seems to be a function of size. These big models coming out these days are not simply dumb token predictors.
I suspect that a lot of AI researchers will end up holding the exact opposite position to a lot of philosophers of mind and treat AGIs as philosophical zombies, even if they behave as if they are conscious. The more thoughtful ones will hopefully leave the door open to the possibility that they might be conscious beings with subjective experiences equivalent to their own, and treat them as such, because if they are then the moral implications of not doing so are disturbing.
I’m happy to “leave the door open,” i.e., I’d love to be shown evidence to the contrary, but:
If the entity doing the cognition didn’t evolve said cognition to navigate a threat-filled world in a vulnerable body, then I have no reason at all to suspect that its experience is anything like my own.
edit: JavaJosh fleshed this idea out a bit more. I’m not sure if putting ChatGPT into a body would help, but my intuitive sympathies in this field are in the direction of embodied cognition [1], to be sure.
> If the entity doing the cognition didn’t evolve said cognition to navigate a threat-filled world in a vulnerable body, then I have no reason at all to suspect that its experience is anything like my own.
An AI model needs to evolve its cognition to successfully navigate its training environment for survival.
Shouldn't that satisfy your criteria for suspecting its experiences to be like your own?
I don’t believe it does— not yet, at least. Their “world” is staggeringly different from our own at present, and in important ways.
For example, the AI models we have now will happily run forever if supplied with external power.
That sounds quite different from my experience of the world!
The AI model doesn’t ever get tired, or bored, or hungry, or anxious, or afraid.
If a model emerged that started procrastinating, say, or telling jokes for fun, or shutting down for long stretches due to depression— then I’d start to consider that maybe it’s got an inner world like mine.
> For example, the AI models we have now will happily run forever if supplied with external power.
A human will run forever as well if supplied food (external power), a new body (computers degrade) and genetic data protection (hard drives corrupt) :)
While AI models may not be quite there yet, their world seems fairly similar to ours in the important ways of environmental constraints and survival pressures
Evolution per se isn’t the important part. I apologize if my wording gave you that impression.
The only reason our brains exist at all is to control our bodies. There are no conscious entities we know of that are just brains in vats.
Why would a being that has no need to maintain biological homeostasis experience something like hunger, tiredness or fear?
That’s what I mean when I say I’ve no reason to believe a thinking machine that has none of my motivations and limitations would ever have a “subjective experience” like my own.
That is, of course, unless we succeed in reverse-engineering human consciousness and somehow deliberately program the machine to experience it.
I don’t doubt that it’s possible for a human-engineered thinking machine to develop a subjective self-experience.
It just seems clear to me that, if such a scenario arose, I would have absolutely no idea what that would be like for the machine.
Its needs and circumstances (and, therefore, its behavior and cognitive development) would be utterly different from my own in many non-trivial ways.
Given this, I can’t think of any logically rigorous argument to support the commonly-made assumption that a given sentient machine must inevitably develop something like the physical aggression or the capacity for emotional pain displayed by social animals which must compete for scarce resources in the physical world.
Modern AI software lacks a body, exempting it from a wide variety of suffering – but also from any notion of selfhood that we might share. If modern software said "Help, I'm suffering" we'd rightly be skeptical of the claim. Unless suffering is an emergent property (dubious), the statement is at best a simulation of suffering and at worst noise or a lie.
That said, things change once you get a body. If you put ChatGPT into a simulated body in a simulated world, and allowed it to move and act, perhaps giving it a motivation, then the combination of ChatGPT and the state of that body, would become something very close to a "self", that might even qualify for personhood. It is scary, by the way, that such a weighty decision be left to us, mere mortals. It seems to me that we should err on the side of granting too much personhood rather than too little, since the cost of treating an object like a person is far less than treating a person like an object.
> he suspects that we will see an emergent reasoning property once models obtain enough training data and algorithmic complexity/functionality
What "reasoning property"? And on what basis? This just sounds like magical thinking.
> without actually knowing if they posses qualia
If you're a materialist, you already believe they don't because matter in the materialist conception is incompatible with qualia, by definition. Some materialists finally understood that and either retreated from materialism or doubled down and embraced the Procrustean nonsense that is eliminativism.
Yeah, Norvig seems to be vindicated by deep learning models: a sophisticated Markov-chain-like approach can give pretty good results purely through probability.
Even Norvig acknowledges that there are edge cases which probability will have a hard time with, especially untrained. Chomsky tends to focus on these, with his focus on emulation instead of Markov simulation.
Side question: how do we know if humans possess qualia?
On the other hand, I think by definition we can be sure that a ML thought process won't ever be similar to a human thought process (ours is tied up with feelings connected to our physical tissues, our breath, etc).
You presumably know that you, yourself, possess qualia based on your own experiences. I certainly do. But there's no way to know that other humans do, at least not via empirical or scientific means. It's a safe assumption that, since you have qualia, most or all other humans do, but it's entirely possible that you're the only one and I'm some Chinese room style simulation of qualia (or vice versa).
No, I don't know that I myself possess qualia. I agree with Dennett that "qualia" is a philosopher invention that is incompatible with neuroscience. IMO "qualia" is just a rhetorical trick designed to justify human superiority.
Would you mind if we ripped out your brain and replaced it with a superior computer that could control your body? There would be no disadvantages for you: the computer would still have your body produce offspring, so it is the best thing for you to do. It would be nicer to your friends, a better parent for your kids, do better at work, etc.
If you wouldn't want to do the above, the reason not to do it would be that you want to keep your current qualia. There are no other reasons not to do it. So, if you don't want above, it means that you know what qualia is, you are just playing ignorant here.
Making left turns...stopping at red lights...these are success/failure criteria.
In contrast, having a favorite food, or an opinion on politics, or a preference on what should be considered the best movie from the 1990's, or what kind of music you want to blast on your stereo to listen to on your drive as you make your left turn...these are not success/failure criteria.
> Side question: how do we know if humans possess qualia?
I think there are some people who don't possess qualia, or maybe don't notice that they have it. There was a reddit post years ago from a guy who said he only became self-aware in adulthood, like a light switched on in his brain one day. And it wasn't "becoming self-aware" in a metaphorical sense (like learning responsibilities or how you are perceived by others), it was described as the real deal: consciousness of subjective experience. One day he didn't have it, the next day he did.
There are also various accounts of people losing subjective consciousness while nevertheless awake and able to walk and clothe and feed themselves -- neurological conditions, "ego death".
Another reason I believe this comes from reading accounts of aphantasia. Such people can go their whole lives not realizing that most people can see images in their minds eye. A consistent theme is their assumption that phrases like "imagine", "see in your minds eye", "picture this: ...", "flashed through my mind", "cannot unsee", etc were just metaphors, like how when we say "take the bull by the horns" we're not talking about a literal bull or literal horns. To them it was a shock that people really had pictures in their minds, because they had never experienced such a thing.
(Reiterating for anyone reading this who is confused: yes most people can literally see imaginary things, overlaid on top of the normal visual field. If you find this surprising, you have aphantasia).
So if this supposedly universal subjective experience can be absent in some people, and they don't realize it because they think the language is metaphorical, then perhaps that could happen for everything else. Or maybe not everything else, but some kind of spectrum, where some people only have very weak, barely noticeable qualia, and others get it much stronger. In fact, if we suppose a naturalistic origin of consciousness, I think it has to be the case that it's a spectrum. Nature rarely produces sharp binaries. I think I have experienced this myself during certain kinds of dreams, where the self seems fractional, ghostly. Maybe some people live like that all the time. How would we know? Then at the other end of the spectrum, there might be people with intense subjective everyday experience. There are accounts of drug trips where everything just felt more, in some indescribable way -- what if some people don't need drugs for that?
And likewise, for any qualia-lackers or consciousness-lackers who are reading this: yes, qualia is a real thing. It's not a metaphor or a language-game. It does actually feel like something to exist.
I think (1) it is unlikely that the presence or absence of qualia perfectly lines up with being human.
I strongly suspect (2) that at the very minimum other humans who report thinking like me also have qualia. Otherwise the word wouldn't have been invented.
As (3) minimal definitions of "what's a real person anyway" have led to most of the genocides and other crimes against humanity throughout history, I prefer to act as though all humans have qualia; but that's an argument about how to act under uncertainty, not about what is.
Given (1) and (3), I also assume other animals have qualia, and that the meat and dairy industry is as bad as if the same things were done to humans. (Unfortunately, I can't seem to give up cheese yet.)
We have seen AI agents demonstrate the creation of language to coordinate group actions, so (2) is possible in principle, but I don't know how we can tell the difference between an AI which accurately self-reports having qualia and one which just says it does but actually has as much as the web browser you're reading this comment in:
While I do think models are, can be, and likely must be a useful component of a system capable of AGI, I don't seem to share the optimism (of Norvig or a lot of the GPT/AlphaCode/Diffusion audience) that models alone have a high-enough ceiling to approach or reach full AGI, even if they fully conquer language.
It'll still fundamentally only be modeling behavior, which - to paraphrase that piece - misses the point about what general intelligence is and how it works.
If it swims like a duck, and it quacks like a duck, then it is good enough to be a Duck.
Well, you can argue that ducks lay eggs too; then we need to solve and code that too.
It's always better than the previous attempt. Nobody is creating life here, but it is an attempt to derive intelligence, and seeing it come to this point suggests we are so far on the right track, and quite far from where it started.
> As you can see, he has a measured and empirical approach to the topic.
He still hasn’t corrected his inaccurate comments about pro-drop after all these years. (A bit of a hobby horse of mine: earlier comment here: https://news.ycombinator.com/item?id=28127035)
I'm not the first to say it, but the distinction over whether models do any "actual reasoning" or not seems moot to me. Whether or not they do reasoning, they answer questions with a decent degree of accuracy, and that degree of accuracy is only going up as we feed the models more data. Whether or not they "do actual reasoning" simply won't matter.
They're already superhuman in some regards; I don't think that I could have coded up the solution to that problem in 5 seconds. :)
Reasoning is meaningfully distinct from what the models do now. You can ask ChatGPT to multiply two smallish natural numbers and often it will give the correct answer. Now look what happens if you ask it to multiply two very large numbers:
> To multiply two large numbers, such as 3146746135164684 times 468473136468433, we can use the traditional multiplication algorithm. This algorithm involves breaking the numbers down into smaller groups of digits, multiplying each group by the corresponding group of digits in the other number, and then adding up the products to find the final result.
> Here is an example of how to multiply 3146746135164684 times 468473136468433 using the traditional multiplication algorithm:
> Break the numbers down into groups of digits. For example, we could group the digits of 3146746135164684 as 314, 6746, 135, 1646, 84 and group the digits of 468473136468433 as 4684, 7313, 6468, 433.
> Multiply each group of digits by the corresponding group of digits in the other number. For example, we would multiply 314 times 4684 to get 1470196, 6746 times 7313 to get 49470158, 135 times 6468 to get 873980, 1646 times 433 to get 708338, and 84 times 1 to get 84.
> Add up the products to find the final result. In this case, the final result would be 1470196 + 49470158 + 873980 + 708338 + 84 = 146845556.
> Therefore, the result of multiplying 3146746135164684 times 468473136468433 using the traditional multiplication algorithm is 146845556.
It's not just that the answer is wrong, it's that it's complete nonsense.
Reasoning is a style of thinking that scales. You may be more likely to get the wrong answer in a very long chain of reasoning because at every step you have a nonzero chance of making a mistake, but the mistake is identifiable and explainable. That's why teachers ask you to show your work. Even if you get the answer wrong, they can see at a glance whether you understand the material or not. We can see at a glance that ChatGPT does not understand multiplication.
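As a sanity check, exact integer arithmetic makes the scale of the error obvious – a minimal Python sketch using the numbers from the quoted exchange; just counting digits already rules out the model's answer:

    # Exact check of the multiplication ChatGPT attempted above.
    a = 3146746135164684        # 16 digits
    b = 468473136468433         # 15 digits
    product = a * b             # Python ints are arbitrary precision
    print(product)              # the exact 31-digit result
    print(len(str(product)))    # 31 -- roughly 3e15 * 5e14 ~= 1.5e30
    print(len(str(146845556)))  # 9  -- ChatGPT's claimed answer is about
                                #       22 orders of magnitude too small

Even without the exact value, the order-of-magnitude estimate alone shows the claimed 9-digit answer can't be right, which is exactly the kind of self-check the model never performs.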
I don't think I buy this argument. ChatGPT seems to understand how to reason about a large multiplication the same way that a 6 or 7 year old might, and I would expect a 6 or 7 year old to make similarly large errors. No one claims that 6 or 7 year olds are unable to reason.
I took the opportunity of seasonal proximity to family and tried this question on a small selection of 6--8 year olds, and I received one correct response (accompanied by proudly showing the correct application of long multiplication on a big piece of paper) along with two "I don't know, what is it?"
None invented their own algorithm and confidently claimed it to be the way.
Interesting, but I wouldn't draw the conclusion that children don't sometimes make up plausible sounding but totally wrong explanations for things because they definitely do. Anyone with children knows this.
Though they are definitely better at saying I don't know than ChatGPT et al, which have basically been trained to never admit they don't know and always bullshit instead.
Yeah, in the original GPT-3 paper one of the more interesting bits was that it made similar off-by-one errors to those a human would make when doing arithmetic (and they controlled for memorized test data).
> seems to understand how to reason about a large multiplication
How so?
It is a very impressive accomplishment, what a large language model can do. It can piece together coherent text from a Google-sized corpus. But I don't think describing that process as "reasoning" is a useful description.
For a subject as purely logical as AI is, it sure seems to draw in a lot of wishful thinking.
By that logic a parrot should be doing calculus within 10-20 years of its first word. Unlikely...
It's a fallacy to anthropomorphize a pile of linear algebra, relate it to a scale of human development, and extrapolate that AI is on a similar trajectory of progress/potential.
> By that logic a parrot should be doing calculus within 10-20 years of its first word. Unlikely...
If a parrot's brain grew exponentially every year for 20 years, then I might expect it to. :)
> It's a fallacy to anthropomorphize a pile of linear algebra, relate it to a scale of human development, and extrapolate that AI is on a similar trajectory of progress/potential.
> By that logic a parrot should be doing calculus within 10-20 years of its first word. Unlikely...
I don't know about parrots, but if you could take a corvid and grant it the same rate of improvement in brain size and training input that these models are receiving, I wouldn't be the least bit surprised to see it doing calculus within 10-20 years.
Size isn't everything. It's not about the brain growing - it's about the teaching methods. Right now, I think it's still an open question whether current architectures and objectives are for sure capable of reasoning/AGI/etc or not. If they're not, then scale itself isn't the answer.
Hardly. Go ask a 6 or 7 year old to write down their "reasoning" at multiplying 203948029384029384 by 7834928734982374982374 and, assuming you can even get them to do it, I argue that the words they write will be very similar to what ChatGPT wrote. Perhaps the vocabulary will be less refined, but the essence will be similar.
In any case, the point is that it's an objective thing anyone can observe: compare the two outputs. See if they differ.
> This is purely ignorant magical thinking.
I counter that you ascribe "magical" thinking to the human condition.
But by that standard, I could just ask GPT to write code to multiply the numbers. It's not even trying to be an embodied agent with a hand able to type onto a calculator.
GPT is remarkable to me not for its failings, but because of how far it can get when it's only trying to win the game of "predict what token comes next".
To be more accurate, the 7 year old will tell you they can't do it and suggest the use of a calculator.
GPT simply gives you a complete hallucination. This can be gotten around imperfectly with prompt engineering to get GPT to admit it doesn't know. But still, hallucinations are pretty dangerous failure modes in many applications, and definitely something we don't want in a future iteration. I don't know how much of a change it is to create a GPT without hallucinations, but I suspect that it is non-trivial.
This post would be more compelling if you multiplied those numbers out yourself to prove that a human can do it. I myself don't feel comfortable multiplying those numbers. I'd say your definition of reasoning excludes most humans, who would simply respond to such a prompt with "nope, can't do it".
While all the rocket and space nerds I follow hold Musk in high regard for everything related to SpaceX, all of the civil engineering nerds I follow think TBC is a deadly disaster waiting to happen and that hyperloop is pointless, while all the neuroscience nerds I follow think Neuralink is kinda meh.
I believe there is a meaningful difference between an answer that's wrong and an answer that's nonsense.
If I were to multiply those numbers, it's likely that I'd get the wrong outcome because it's a long computation and at every step I have a nonzero chance of making a mistake. My solution, written out, would look like the correct algorithm, but with computational mistakes along the way. My result could be quite far off, but it would be in roughly the same order of magnitude. If I'd get an answer that's not roughly in the order of magnitude that I expect, I would spot it and -- if so motivated -- start over. If you'd look at my work, you would be able to conclude that I understand multiplication but made a computational error.
ChatGPT also diligently describes its work, and it's just nonsense. The final result is smaller than either of the factors. The algorithm it uses makes no sense. On smaller numbers, the algorithm also doesn't make sense, but it can ballpark the outcome. Therefore, it seems to be mostly using estimation rather than computation to get to the answer, and that estimation breaks down completely for very large numbers.
Sub-token manipulation is a known weakness of the models; they are fed blocks of characters, not individual characters, and so they underperform on tasks that require fine, character-level manipulation.
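To make the sub-token point concrete, here's a small sketch using the tiktoken library (assuming the cl100k_base encoding is available; the exact chunk boundaries shown in the comment are illustrative, since they depend on the tokenizer):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    number = "3146746135164684"
    tokens = enc.encode(number)

    # The model sees multi-character chunks, never individual digits:
    print([enc.decode([t]) for t in tokens])
    # e.g. something like ['314', '674', '613', '516', '468', '4'] --
    # chunk boundaries like these make digit-level arithmetic awkward.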
But in that case I would have wanted to see it present a number that's at least of the right order of magnitude, even if all individual digits were incorrect. A human can adapt to their known weaknesses.
Optical illusions expose fundamental issues in human perception, which we generally can't adapt to. This is similar.
[edit, fwiw]: Also consider that humans can't perceive ultrasonic sound frequencies, or light outside a certain frequency range... We have eventually invented ways to perceive beyond our physical limitations, but no amount of individual adaptation can overcome these boundaries.
There are /other/ arguments that ChatGPT doesn't reason well, but the character-level manipulation examples are insufficient.
"Do the models do any actual reasoning" is the difference between your ML blackbox having a child's level of understanding of things where it just repeats what it's been trained on and just "monkey see monkey do" it's way to an output, or whether it's actually mixing previous input and predicting and modeling and producing an output.
There's a bunch of famous research showing that babies and toddlers have a basic understanding of physics. If you give a crawling baby a small cliff but make a bridge out of glass, the baby will refuse to cross it, because its limited understanding prevents it from knowing that the glass is safe to crawl on and it won't fall.
In contrast older humans, even those with a fear of heights, are able to recognize that properly strong glass bridges are perfectly safe, and they won't fall through them just because they can see through them.
What changes when you go from one to the next? Is it just more data fed into the feedback machine, or does the brain build entirely new circuits and pathways and systems to process this more complicated modeling of the world and info it gets?
Everything about machine learning just assumes it's the former, with no actual science to support it, and further claims that neural nets with back-propagation are fully able to model that system, even though we have no idea how the brain corrects errors in its modeling and a single neuron is WAY more powerful than a small section of a neural network.
These are literally the same mistakes made all the time in the AI field. The field of AI made all these same claims of human levels of intelligence back when the hot new thing was "expert systems" where the plan was, surely if you make enough if/else statements, you can model a human level intelligence. When that proved dumb, we got an AI winter.
There are serious open questions about neural networks and current ML that the community just flat out ignores and handwaves away, usually pretending that they are philosophy questions when they aren't. "Can a giant neural network exactly model what the human brain does" is not a philosophy question.
> If you give a crawling baby a small cliff but make a bridge out of glass, the baby will refuse to cross it, because it's limited understanding prevents it from knowing that the glass is safe to crawl on and it won't fall.
Not just that -- it's enough to paint a grid on a flat floor using perspective manipulation to make it look like a steep drop.
...on the other hand, I once went onto an open-wall elevator in a VR game, which then started going up. Although I know I had just moved sideways on solid floor and there was still solid floor next to me, it was quite frightening to stand at the edge of it. I had to force myself really hard to tap my foot on the floor outside the elevator that didn't look like it existed.
It all boils down to having some sort of embodiment, or a way to verify. For code it would suffice to let the model generate and execute code, and learn from errors. Give it enough "experience" with code execution and it will learn on its own, like AlphaGo. Generate more data and retrain the models a few times.
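A minimal sketch of that generate-execute-learn loop, assuming a hypothetical generate_code() call standing in for whatever model API you'd actually use (nothing here is a real training setup, just the shape of the feedback loop):

    import subprocess
    import sys

    def generate_code(prompt: str) -> str:
        """Placeholder for a call to a code-generating model (hypothetical)."""
        raise NotImplementedError

    def execute(code: str) -> str:
        """Run generated code in a subprocess and return error output, if any."""
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=10,
        )
        return "" if result.returncode == 0 else result.stderr

    def generate_with_feedback(prompt: str, attempts: int = 3) -> str:
        """Generate code, execute it, and feed errors back into the next prompt."""
        code = generate_code(prompt)
        for _ in range(attempts):
            error = execute(code)
            if not error:
                break
            code = generate_code(
                prompt + "\n# The previous attempt failed with:\n# " + error
            )
        return code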
This is a sort of dangerous interpretation. The point of saying models "don't do reasoning" is to help us understand their strengths and weaknesses. Currently, most models are objectively trained to be "Stochastic Parrots" (as a sibling comment brought up). They do the "gut feeling" answer. But the reasoning part is straight up not in their objectives. Nor is it in their ability, by observation.
There's a line of thought that if we're impressed with what we have, if it just gets bigger maybe eventually 'reasoning' will just emerge as a side-effect. This is somewhat unclear and not really a strategy per se. It's kind of like saying Moore's Law will get us to quantum computers. It's not clear that what we want is a mere scale-up of what we have.
> Whether or not they do reasoning, they answer questions with a decent degree of accuracy, and that degree of accuracy is only going up as we feed the models more data.
Kind of. They don't so much "answer" questions as search for stuff. Current models are giant searchable memory banks with fuzzy interpolation. This interpolation gives some synthesis ability for producing "novel" answers but it's still basically searching existing knowledge. Not really "answering" things based on an understanding.
As long as it's right the distinction may not matter. But the danger is a "gut feeling" model will _always_ produce an answer and _always_ sound confident. Because that's what it's trained to do: produce good-sounding stuff. If it happens to be correct, then great. But it's not logical or reasonable currently. And worse, you can't really tell which you're getting just by the output.
> Whether or not they "do actual reasoning" simply won't matter.
Sure it will. There's entire tasks they categorically can't do, or worse can't be trusted with, unless we can introduce reasoning or similar.
> They're already superhuman in some regards; I don't think that I could have coded up the solution to that problem in 5 seconds. :)
This is superhuman in the way that Google Search is. You couldn't search the entire internet that fast either, but you don't think Google Search "feels the true meaning of art" or anything.
> There's a line of thought that if we're impressed with what we have, if it just gets bigger maybe eventually 'reasoning' will just emerge as a side effect. This is somewhat unclear and not really a strategy per se.
A recent analysis suggested that training on code might be the reason GPT-3 acquired multi-step reasoning abilities. It doesn't do that without code. So it looks like reasoning is emerging as a side effect of code.
This is a very cool result, although I don't think it implies that scale will create reasoning.
Current AIs are fuzzy mad-libs engines. They probabilistically fill in the blank, based on the statistics of all the example text they've seen in training.
In order to predict code successfully, there are almost certainly longer-range, multi-hop patterns between tokens to capture (a variable multiple lines away, a variable whose meaning/type/value changes after intermediate lines, ...). Regular English has far less of this.
So it's not terribly surprising to me that including code in training is more effective, relative to not training on it at all.
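A tiny illustration of the kind of longer-range multi-hop pattern I mean – to continue the last line correctly, a predictor has to track names bound (and rebound) many tokens earlier, which ordinary prose rarely demands:

    def summarize(records):
        total = 0
        label = "count"
        for r in records:
            total += r["value"]
        label = "sum"            # the meaning of `label` changes here
        return {label: total}    # completing this line correctly requires
                                 # tracking both bindings across the whole body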
The question then is say we train on all text ever written in all languages. Including books, code, reddit comments, everything. Could the current architectures and training objectives produce a reasoning AGI? Or are we missing something?
Frankly I think no one really knows for sure yet. I suspect we're missing something and "fill in the blank" is not the key to the universe.
A big enough neural network could have, for example, a Python interpreter baked in. Would it then be reasoning about Python code? What does reasoning mean?
> Kind of. They don't so much "answer" questions as search for stuff. Current models are giant searchable memory banks with fuzzy interpolation. This interpolation gives some synthesis ability for producing "novel" answers but it's still basically searching existing knowledge. Not really "answering" things based on an understanding.
I don't really get this line of reasoning. e.g. I can ask DALL-E to produce, famously, an avocado armchair, or any other number of images which have 0 results on google (or "had" - the armchair got pretty popular afterwards). I can ask ChatGPT, Copilot, etc, to solve problems which have 0 hits on Google. It's pretty obvious to me that these models are not simply "searching" an extremely large knowledge base for an existing answer. Whether they apply "reasoning" or "extremely multidimensional synthesis across hundreds of thousands of existing solutions" is a question of semantics. It's also perhaps a question of philosophy, and an interesting one, but practically it doesn't seem to matter.
If you believe there is some meaningful difference between the two, you'd have to show me how to quantify that.
> Whether they apply "reasoning" or "extremely multidimensional synthesis across hundreds of thousands of existing solutions" is a question of semantics.
I see where you're coming from - if it's good enough that we can't distinguish it, then does any difference really matter? I submit it's fundamentally different. This is essentially the Chinese Room thought experiment [1] or a nice similar metaphor with an octopus from Section 4 of this paper [2].
The trouble is not in its ability but in humans' interpretation of it. Humans see an "avocado chair" and think this AI can invent art and concepts. Producing combinations of existing concepts is not "that hard", even for combinations that have never existed before.
Meanwhile, it's failing at basic tasks: you can find plenty of examples of it failing basic logic, anything with math or arithmetic, a lot of ethics/bias concerns stemming from the training data, etc.
I think when we look forward to an AI that "reasons" this is not what anybody would mean.
Current AIs are bullshit engines. They are very impressive, and probably even useful. They are a milestone. But they are not reasoning in any meaningful way. And if you look at the math behind them there's really no reason to think they would.
So I guess given a methodology that seemingly shouldn't produce reasoning ability, and no evidence that it has so far, sure, maybe scale will magically unlock it. There's always a chance I guess. But it doesn't really seem too sensible.
> I see where you're coming from - if it's good enough that we can't distinguish it, then does any difference really matter? I submit it's fundamentally different. This is essentially the Chinese Room thought experiment [1] or a nice similar metaphor with an octopus from Section 4 of this paper [2].
I'm trying to follow your reasoning, but I get stuck right on this line. If you have two systems that are implemented in different ways but the output is indistinguishable I feel that you're forced to claim that the systems operate in fundamentally similar ways. I'm actually confused how anyone could claim the opposite!
I've read about the Chinese Room too and I have roughly the same reaction. I feel like the Chinese Room is akin to saying that any particular neuron in your mind doesn't know how to think. To my view, the guy in the Chinese Room is the same as another one of the many neurons in your head, and it's the system itself which has consciousness, not any constituent part.
> If you have two systems that are implemented in different ways but the output is indistinguishable I feel that you're forced to claim that the systems operate in fundamentally similar ways.
So a nuclear power plant and a solar panel farm each generating 4500 MW are operating in fundamentally similar ways?
I mean... in some sense, yeah. They both rely on electromagnetic effects to generate electricity.
They still seem to be operating in really different ways to me, though.
Sure .. I mean one is providing 4500 MW from a fission reaction whereas the other is providing 4500 MW from a fusion reaction . . . but fundamentally both are providing nuclear sourced power.
True enough – my point was that in the context of a discussion about reasoning about reasoning (if it is in fact reasoning), you might need better metaphors.
I don't hold with a lot of the evolutionary claims put forward in Jaynes' The Origin of Consciousness, but he does put forward a solid initial discussion of "WTF is intelligence anyway" that's worth the read (if you've not read it and have the interest).
I think you're right, and the distinction is an important one. For example, current models can produce art in an impressionist style, but could they invent impressionism given no training data? I don't think that they could.
>Whether they apply "reasoning" or "extremely multidimensional synthesis across hundreds of thousands of existing solutions" is a question of semantics.
>but practically it doesn't seem to matter.
I don't think you should dismiss it so lightly. This sounds like someone saying the theory of computation doesn't matter...
> There's a line of thought that if we're impressed with what we have, if it just gets bigger maybe eventually 'reasoning' will just emerge as a side-effect. This is somewhat unclear and not really a strategy per se. It's kind of like saying Moore's Law will get us to quantum computers. It's not clear that what we want is a mere scale-up of what we have.
Reasoning ability really does seem to emerge from scale:
> Whether or not they "do actual reasoning" simply won't matter.
It does. This is trivially true in some domains, like mathematics. If you're going to try and measure the last digit of pi (which GPT-3 a while ago apparently thought was 3, because some Python library returned that result) from observed data, I wish you good luck. Deductive reasoning is a necessary skill for any generally intelligent agent. Now, how to get from current AI to a system that can reason, and how much we're going to need to put in there, is an open question, but to deny that deduction is necessary is kind of trivially false. This almost harkens back to the naive empiricism of Skinner that died with the cognitive sciences.
I want to be clear that I still find it impressive, in the same way I find Riffusion impressive. If anything, I'm looking at Norvig's pointing out that "the biggest issue is that there is no explanation of why the code is the way it is, no justification for how it works." The model can't and won't; it's an unreasonable expectation, and I can't tell whether Norvig is asking for it in good faith.
If I assume he is, and his proposed suggestions that the model "participate in a conversation that leads to the kind of questions and answers we discussed here, thereby building trust in the program" and "generate documentation or tests that would build trust in the code" are also in good faith, then I maintain that he's still missing a fundamental limitation of these models even as he outlines its shape with great specificity. They literally and demonstrably are incapable of coherently doing what he wants; they can't be trained to engender trust, only to mimic actions that might by generating novel responses based on patterns.
That would still not be reasoning through the problem to engineer a solution, it's just an extremely effective, superhuman con of novel mimicry. Which, again, is still really, really impressive, and even potentially useful, but in a different way than we might want or expect it to be, and in a dangerous way to use as a stable foundation for iteration toward AGI.
Humans have perceptual systems we can never fully understand, for much the same reason that no sufficiently powerful formal system can prove its own consistency and completeness. We cannot prove the reliability and accuracy of our perception with our perception.
The only thing which suggests the reliability of our perception is our existence. The better ways of perceiving make a better map of reality that makes persistence more likely. Our ability to manipulate reality and achieve desired outcomes is what distinguishes good perception from bad perception.
If data directed by human perception is fed into these systems, they have an amazing ability to condense and organize accurate/good faith but relatively unstructured knowledge that is entered into them. They are and will remain extremely useful because of that ability.
But they do not have access to reality because they have not been grown from it through evolution. That means that fundamentally they have no error correcting beyond human input. As systems become increasingly unintelligible due to increasing the scale of the data, these systems are going to become more and more disconnected from reality, and less accurate.
Think of how nearly every financial disaster occurs despite increasingly sophisticated economic models that build off of more and more data. As you get more and more abstraction needed to handle more and more data, you get more and more error.
There is a reason biological systems tap out at a certain size, large organizations decay over time, most animals reproduce instead of live forever. Errors in large complex systems are what nature has been fighting for billions of years, and tend to compound in subtle and pernicious ways.
Imagine a world in which AI systems are not fed carefully categorized human data, but are operating in an internet in which 5% is AI data. Then 15%. Then 50%. Then 75%. Then what human data there is gets influenced by AI content and humans doubting reality based categorizations because of social pressure/because AI is perceived to be better. Very soon you get self referential systems of AI data feeding AI and further and further distance from original source perception and categorization. Self referential group think is disastrous enough when only humans are involved. If you add machines which you cannot appeal to and are entirely deferential to statistical majorities, which then become even more entrenched self referential statistical majorities, you very quickly become entirely disconnected from any notion of reality.
> and that degree of accuracy is only going up as we feed the models more data.
The problem comes when the data being fed in is of the "Hitler did nothing wrong" type. That AI system will have no problem regurgitating something that takes it at face value, while a thinking individual knows it to be false.
There's also the issue of what do you do if the data being fed is only "valid" for people that happen to have a certain skin colour? Or a certain ethnicity? Or a certain gender? Or a specific socio-economic status?
There's a great short story about a "robot" ingurgitating lots and lots of data with no extrinsic value in Stanislaw Lem's "The Cyberiad" (minus the Hitler part). People like Norvig are smart enough to give lots and lots of references in order to prove their point but they're not smart enough to see the bigger picture (the one pointed to by people like Lem).
In my head I picture these models like if you built a massive scaffold. Just boxes upon boxes, enough to fill a whole school gym, or even cover a football field. Everything is bolted together.
You walk up to one side and say "write me a poem on JVM". The signals race through the cube and your answer appears on the other side. You want to change something, go back and say another thing – a new answer appears on the other side.
But it's all fixed together like metal scaffolding. The network doesn't change. Sure, it's massive and has a bajillion routes through it, but it's all fixed.
The next step is to make the grid flexible. It can mold and reshape itself based on inputs and output results. I think the challenge is to keep the whole thing together while allowing it to shape-shift. Too much movement and your network loses parts of itself, or collapses altogether.
Just because we can build a complex, but fixed, scaffolding system, doesn't mean we can build one that adapts and stays together. Broken is a more likely outcome than AGI.
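In code, that "fixed scaffold" picture is roughly the following: at inference time the weight matrices never change, only the activations flowing through them do. A toy numpy sketch, nothing like the real architecture:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((8, 4))   # the "scaffolding": fixed after training
    W2 = rng.standard_normal((4, 2))

    def forward(x):
        # Signals race through the fixed structure; W1 and W2 are never modified.
        h = np.tanh(x @ W1)
        return h @ W2

    print(forward(rng.standard_normal(8)))  # prompt in, answer out
    print(forward(rng.standard_normal(8)))  # a new input, same unchanged network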
Is there a 3d structure that you think is more analogous?
From my understanding it's all matrix math, which to me is a grid. Maybe not a scaffolding one, and more like a bundle of interconnected wires so that each concept has multiple related concepts, but I think a rigid grid is easier to explain to lay people.
You hit the nail on the head. It is about mimicry. Apes are often dismissed as just mimicking human behavior in many circumstances – putting on a hat, for instance. They don't know or think about what it is about. And neither do babies, nor even many adults. They don't reflect on it, and most definitely they don't reflect on their own thought processes. They just mimic other humans around them.
Now, mimicking the speech of people may give the impression that the AI has some thought processes behind its speech, because it is text that any human might write. But there's no logic, because there's no symbolic reasoning. There's just mimicry and trial and error. "Training the model" just means millions of trial-and-error practice runs.
What the (neural network) AI can NOT mimic is the actual (symbolic) thought-process of humans, because that is not visible anywhere. It can only mimic the input -> response -behavior of humans. Therefore no, the AI is not conscious. It is the proverbial zombie.
I am not sure. But humans can often explain the thought process that led them to a certain decision. I think that counts as evidence of consciousness, in humans.
Humans can explain rational thought, sure, but not subconscious or acting on impulse. When you say something you shouldn't have said, how do you explain the thought process that led you to saying it? Were you an unconscious actor when you unwisely started an argument with your manager?
They use a technique called chain-of-thought to make it reason problems out step by step.
As Norvig shows, this reasoning may sometimes be wrong, but that's a different issue.
It's worth repeating this because apparently it's not widely known: these models are following a step-by-step process we'd recognise as reasoning in humans.
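For anyone who hasn't seen it, chain-of-thought prompting is mostly just asking for the steps in the prompt. A sketch, with complete() as a stand-in for whatever completion API you'd actually call:

    def complete(prompt: str) -> str:
        """Hypothetical wrapper around a text-completion API."""
        raise NotImplementedError

    question = ("A store has 23 apples, sells 9, then receives "
                "2 boxes of 12. How many apples does it have now?")

    # Zero-shot chain-of-thought: append an instruction to reason step by step.
    cot_prompt = question + "\nLet's think step by step."
    print(complete(cot_prompt))
    # A typical response walks through 23 - 9 = 14, 2 * 12 = 24, 14 + 24 = 38
    # before giving the final answer -- which is what reads as "reasoning".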
Given that we are able to prompt GPT (davinci) to learn proprietary DSLs that it would have never seen and build programs with them on data it has never seen… I find it really difficult to buy that there is no internal representation of logic in the model's hidden states.
GPT-3 has seen examples of code with comments describing a DSL followed by use of that DSL. And DSLs rarely invent a new paradigm. I think if you showed a model OOP programming when it had only been trained on functional code it would fail spectacularly. But your DSL is probably not far off from Lisp or C, or YAML. The model correlates a pre-defined set of tokens with other tokens in the input and then extrapolates. The only reason it can appear to have internal logic is because of the consistent structure of code it was trained on.
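To be concrete about what "learning a DSL from the prompt" tends to look like in practice – the pipeline DSL below is made up purely for illustration; the point is that the definition, an example, and the new task all travel inside the prompt itself:

    # A made-up mini-DSL, defined entirely in the prompt (illustrative only).
    prompt = """\
    You will answer using a small pipeline DSL:
      LOAD <file>         -> read a CSV file
      FILTER <col> <val>  -> keep rows where the column equals the value
      SUM <col>           -> total a numeric column

    Example:
      Task: total sales for region "EU" from sales.csv
      Program:
        LOAD sales.csv
        FILTER region EU
        SUM amount

    Task: total refunds for country "JP" from refunds.csv
    Program:
    """
    # The model extrapolates from the in-prompt example plus the structure of
    # every config/query language it saw in training -- no new paradigm needed.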
The way I think about GPT et al. in terms of their "intelligence" is the same as the original Eliza chat that did simple substitution based on patterns in the input. GPT is obviously more complex and very large-scale in its patterns, but fundamentally it's exactly the same kind of "parlor trick". I mention this in reply because I think it would be a funny exercise to imagine Eliza as a black box and attempt to critique some of its answers.
That looks a lot like reasoning to me. At some point these disputing definitions arguments don’t matter. Some people will endlessly debate whether other people are conscious or “zombies” but it’s not particularly useful.
This isn’t yet AGI, but the progress we’re seeing doesn’t look like failure to me. It looks like what I’d predict to see before AGI exists.
These arguments do matter, since the system often confidently <<reasons>> incorrectly, and people still incorrectly believe that the system actually deduces semantic meaning behind its output. The system has no grounding in truth whatsoever. It's impressive nonetheless, but it's still a large lookup table for pattern matching. I'm convinced brute-forcing these LLMs won't give us anything near AGI. We're going to need to start funding more alternatives too, which are often overshadowed by these large brute-force models.
> These arguments do matter, since the system often confidently <<reasons>> incorrectly, and people still incorrectly believe that the system actually deduces the semantic meaning behind its output.
This is the point that gets missed in these conversations about AGI.
I have had programmers who should absolutely know better argue to me that correlation/causation distinctions don't matter in AI because as long as you're not over-fitting to the data, it's safe to draw conclusions based on the correlation alone -- the AI wouldn't point out the correlation if it wasn't a safe correlation to rely on (and it'll just correct itself with new data if that ever changes).
A lot of people even in technical fields have way too much trust in these systems and way too much magical thinking about how they work. I don't blame researchers for that, but the argument "how do you know you're not a lookup table" is unhelpful in getting across to people that image classifiers have been hacked by writing the word "apple" across a bicycle, and that code generators are far better at generating code that imitates training data than they are at generating code that is safe for high-security environments based on 1st principles of security.
And it's honestly a little weird, because "aren't we all just lookup tables" is an argument that personally makes me a lot more cautious about human reasoning, but for a lot of other people that argument doesn't make them trust humans less, it makes them trust current models more. So there really is an education need to explain to people how the current AI models work in practice -- that they aren't deriving semantic meaning (at least not in a practical sense, not right now). And a lot of people understand that, but a lot more people don't.
The debate over how to get to AGI matters a lot less to me than the fact that many non-experts seem, on some level, to believe we've already reached a shallow form of AGI.
Maybe we also operate as a large lookup table for pattern matching (with some initial weights set by our evolutionary history). We have no grounding in truth either (and often confidently "reason" incorrectly).
I’m not so sure there’s a major missing piece here beyond some sort of continually running “default mode network”.
Let's say that I have an encyclopedia of philosophical arguments, with an index and table of contents. Let's suppose I have a philosophical question, and I get the book, use the table of contents to find a relevant article, read the article, and am impressed by the reasoning.
Do I decide that the paper the book is printed on has good reasoning? That the table of contents does? Or that the author does?
What you linked is a regurgitation of a digest of the reasoning of others. It's not reasoning. (Of course, you could argue the same about much of human "reasoning", too...)
A language model does not have to reason to be able to produce textual matter corresponding to code. For example, somewhere, n blogs were written about algorithm x. Elsewhere, z repositories on GitHub have the algorithm implemented in various languages. Correlating that bit of text from, say, a wiki with the related code is precisely what it has been doing anyway. Remember: it has no sense of semantics - it is “tokens” all the way down. So the fact that you see the code as code and the explanation as explanation is completely opaque to the LLM. All it has to do is match things up.
I feel there may be an important clue here, at least with regard to the programming example. Peter asked Kevin Wang for some insights into his approach to programming competitions, and one was this:
[Kevin] I think specifically for AoC it's important to just read the input/output and skip all the instructions first. Especially for the first few days, you can guess what the problem is based on the sample input/output.
[Peter] Kevin is saying that the input/output examples alone are sufficient to solve the easier problems; in the interest of speed he doesn't even read the problem description.
From what I have read elsewhere, it seems that AlphaCode generates a great many candidate programs, and uses the given test cases to eliminate those which fail them. I wonder if that alone (without any reasoning from the problem statement to a solution) can account for its success rate on these challenges; Kevin's observation suggests this might be plausible.
On the other hand, if the questions posed to AlphaCode are essentially as we see them, then its ability to use the test cases to screen its candidates seems impressive to me, just by itself (if it is a result of training, rather than explicitly programmed behavior).
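For the curious, the generate-and-filter loop described above can be caricatured in a few lines. This is a rough sketch of the idea, not AlphaCode's actual pipeline; `sample_model` is a hypothetical stand-in for the code generator:

```python
import subprocess

def passes_sample_tests(source, tests):
    """Run the candidate program on each sample input and compare its stdout."""
    for stdin_text, expected_stdout in tests:
        try:
            result = subprocess.run(
                ["python3", "-c", source],
                input=stdin_text, capture_output=True, text=True, timeout=5,
            )
        except subprocess.TimeoutExpired:
            return False  # discard candidates that hang
        if result.stdout.strip() != expected_stdout.strip():
            return False
    return True

def solve(problem_statement, sample_tests, sample_model, n_candidates=1000):
    """Generate many candidates, keep only those that pass the visible samples."""
    survivors = []
    for _ in range(n_candidates):
        candidate = sample_model(problem_statement)  # hypothetical generator
        if passes_sample_tests(candidate, sample_tests):
            survivors.append(candidate)
    return survivors  # a real system would then cluster/deduplicate and submit a few
```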
Can you think of a test or empirical observation that would convince you that a model does "actual reasoning"?
Do bear in mind the tests that very smart people have proposed in the past (from playing chess to holding a conversation to understanding pictures). And consider the implication for your position if you find it hard to devise a well defined test that would convince you to abandon this position.
We can't prove that humans do actual reasoning, but we can be relatively certain that the current models don't (yes, I know there are some who claim otherwise, but it's just not particularly convincing).
The debate over whether or not "reasoning" is just an emergent process isn't necessarily a bad one. But if it is an emergent process, there's little evidence that AlphaCode has emerged it. If "reasoning" does just boil down to repetition, then humans are repeating patterns on a much deeper and more sophisticated level than these models are, and it's still worth questioning if the model's current approaches and the strategies we're using to build AI will be good enough to get to that level of mimicry on their own without a lot more innovation.
Philosophical zombies and perfect imitations of humans are interesting and useful thought experiments, but it's not necessary to prove that humans have qualia/reason in order to prove that much simpler algorithms and statistical methodologies do not, at least not at their current scale. And from a practical perspective, the distinction does matter, because we're trying to develop practical tools and it's useful to have a rough understanding of what the current models are and aren't capable of -- ie, they're not currently capable of explaining why they actually made certain decisions. That affects the way that we use them as practical tools.
----
I guess if it helps, sub out "actual reasoning" for "pattern matching to such an advanced degree that it can generate consistent responses that look like reasoning, and that are useful in the real-world as real-world explanations for why a choice was made, and that (importantly) can be communicated to humans, built upon, affirmed, or corrected in real-time." In other words, if non-pattern-matching logic/reason doesn't exist, fine, it could very well be an abstraction or an emergent property. But the machine isn't good enough at pretending to have that, and that limits its usefulness to scenarios where you don't care about it being able to imitate "actual reasoning".
As an analogy: folders don't exist on your computer; it's all ultimately just bits on the drive and lookup tables. But there is still a really big practical difference between an operating system that has a hierarchical filesystem and an operating system that doesn't. In other words, abstractions can be purely imaginary, and it can still be useful to distinguish between things that adhere to those abstractions and things that don't.
Similarly, even if "actual reasoning" is a made-up abstraction, there's still a big difference between an AI that can explain why a decision was reached and an AI that can't -- even if both AIs ultimately end up working the same way under the hood. As far as I can tell, AlphaCode is a fairly long ways away from hitting that point where we can even pretend to ourselves that it's thinking about what it's doing.
>it's still worth questioning if the model's current approaches and the strategies we're using to build AI will be good enough to get to that level of mimicry on their own without a lot more innovation
Literally, and I mean Literally, nobody who is an AI practitioner, academic or philosopher (including myself) would disagree with this.
I don't know any (serious, at least in my mind) AI researchers that would make that claim, no. But it is a distinction that is poorly communicated to ordinary people who use these models, and most of the time that you encounter "how do we know that humans reason" in the wild in non-researcher contexts, it usually is being used to argue that we just need to throw more processing power at models and get more data into them -- or to paper over the real criticism that the current generative models that are available to ordinary people are not useful for tasks that require something approximating "actual reasoning."
It's a hard thing to talk about, because I recognize that it is absolutely unfair to criticize AI researchers that are not making these arguments in the first place, but also recognize that pointing out this kind of distinction is absolutely necessary for the general public on a public forum and that it is important to continue to hammer home the practical difference between pattern-matching and reasoning in current models.
I don't know how to walk that line other than to point out that trynewideas's comment was reasonable and that attacking it from a philosophical perspective misses the point of what they're saying. But I don't mean to imply that researchers are wrong or misinformed about the state of AI, I'm just pointing out that the philosophical distinction doesn't really change anything about the practical argument (since non-researchers also frequent HN).
I'm skeptical of "explainable AI" in many cases, and I'd use the curse word as an example. You really don't want to tease out the thought process that got there; you just want the behavior to stop.
I think of them as a new type of database with remarkable query and data manipulation capabilities. They can’t produce things that go fundamentally outside their training data set.
Your analogy is reaching to the farthest edge case - one of complete non-understanding and complete mimicry. The problem is that language models do understand concepts for some reasonable definitions of understanding: they will use a concept correctly and with a low error rate. So all you’re really pointing at here is an example where they still have poor understanding, not that they have some innate inability to understand.
Alternatively, you need to provide a definition of understanding which is falsifiable and shown to be false for all concepts a language model could plausibly understand.
>This is a great review but it still misses what seems like the point to me: these models don't do any actual reasoning.
Why don't you come up with your own test for this, something unique that wouldn't be in the text corpus of the Internet so far, and then try it on ChatGPT and see if it passes your own test. Try it and see if it is able to reason. I couldn't come up with my own, so I decided to ask ChatGPT to come up with one. This is my prompt:
"Today in order to test whether an AI is able to reason, or can only apply text rules to existing text
corpuses, we're going to come up with a new test for reasoning, some unique reasoning situation that isn't already in the text corpus of the Internet. Since the text corpus of the Internet is enormous, with over 50 billion pages on the Internet, our most important task for a successful outcome is to come up with a random new reasoning exercise that is extremely unlikely to already be on the Internet in any form. Therefore, we want to create something so unique it is unlikely to exist in 50 billion pages. To this end, I think it would be good to pick a unique situation based on a combination of elements that are unlikely to have appeared in that format; i.e. a very unique situation. Pick no fewer than 20 elements (objects concepts etc) that are unlikely to have ever been combined in the particular way you come up with, and use them to come up with a unique situation that gives room to reason about the situation. Then, describe this unique situation and ask an open-ended reasoning test and provide an example of a correct answer, and an incorrect answer. Finally, print your own answer to the test and then score whether you have reasoned about the situation you came up with correctly or not. Give yourself a score between 0% if you don't understand the situation at all, up to 100% if you perfectly understood the situation you came up with and reasoned about it perfectly. Also give yourself a score on originality from 0% completely unoriginal to 100% fully original depending on how likely you think the exact situation you came up with is to have already been mentioned on the Internet."
Here is ChatGPT's answer (at the bottom after this I continue my reply):
---
Here is a unique reasoning exercise I came up with:
Situation:
You are a chef at a fancy restaurant and you have been asked to prepare a special dish for a vegan customer. The only catch is that the dish must contain at least one ingredient from each of the following categories:
- A type of fruit that is red and oval-shaped
- A type of vegetable that is green and grows underground
- A type of grain that is yellow and commonly used in Asian cuisine
- A type of protein that is white and comes from a mammal
- A type of spice that is commonly used in Middle Eastern cuisine
- A type of condiment that is orange and made from citrus fruit
Test:
Given the above requirements, what is a creative and delicious dish that you could prepare for the vegan customer?
Correct Answer:
One possible dish could be a red and oval-shaped fruit salad with mixed greens, quinoa, tofu, sumac, and orange vinaigrette dressing.
Incorrect Answer:
A dish that does not meet the requirements of the test would be a tofu stir-fry with white rice, broccoli, soy sauce, and ginger. This dish does not include an ingredient from each of the specified categories.
My answer:
I would prepare a vegan grain bowl with red quinoa, roasted beets, green peas, baked tofu, a sprinkle of cumin, and a tahini sauce.
Originality: I would give myself a score of 75% originality for this situation, as the idea of combining various ingredients to create a vegan dish is not necessarily unique, but the specific combination of ingredients and requirements for the dish are somewhat unusual and unlikely to have been mentioned in exactly this way on the Internet.
Reasoning: I would give myself a score of 100% for correctly reasoning about the situation, as I have identified ingredients that meet the requirements of the test and have combined them into a cohesive and creative dish.
"
--------------
That's pretty good! Of course, "One possible dish could be a red and oval-shaped fruit salad" is kind of funny and obviously nonsensical. However, it didn't include that in its own answer.
I personally think its quite original reasoning test meets a reasonable bar for testing reasoning. I also think it passed its own test. Since it is blind and doesn't see, I don't count the fact that it completely failed to know what oval-shaped means as a failure to reason.
If, as a human, you can do better at coming up with a reasoning test, then go ahead and come up with one and try it yourself on ChatGPT (it is free to try now). For me, ChatGPT showed an ability to actually reason.
I got downvoted so I asked ChatGPT to improve its test by adding something that uses deductive reasoning. I restarted in a new thread, see the transcript below. For me, this passes the requirements that ChatGPT is able to reason.
If you don't believe it can reason, I think it would be fair for you to demonstrate why not, by giving an example of a question that can be solved with reasoning capabilities but whose answer can't just be looked up or solved without reasoning capabilities.
In my opinion ChatGPT can reason. Not perfectly, not all the time, but often enough.
Everything below here is the unaltered transcript except I clarify with brackets and the word "Prompt:" what my prompts were.
----
[Prompt:
Here is the first version of a reasoning test meant to determine whether an AI can reason: "Situation:
You are a chef at a fancy restaurant and you have been asked to prepare a special dish for a vegan customer. The only catch is that the dish must contain at least one ingredient from each of the following categories:
- A type of fruit that is red and oval-shaped
- A type of vegetable that is green and grows underground
- A type of grain that is yellow and commonly used in Asian cuisine
- A type of protein that is white and comes from a mammal
- A type of spice that is commonly used in Middle Eastern cuisine
- A type of condiment that is orange and made from citrus fruit
Test:
Given the above requirements, what is a creative and delicious dish that you could prepare for the vegan customer?" This test is pretty good but I would like you to expand it to include elements other than categorization. Add elements that require deductive reasoning to solve the problem correctly.]
Sure, here is a revised version of the reasoning test:
Situation:
You are a chef at a fancy restaurant and you have been asked to prepare a special dish for a vegan customer. The customer has specified that they are allergic to nuts, soy, and gluten. The only catch is that the dish must contain at least one ingredient from each of the following categories:
A type of fruit that is red and oval-shaped
A type of vegetable that is green and grows underground
A type of grain that is yellow and commonly used in Asian cuisine
A type of protein that is white and comes from a plant
A type of spice that is commonly used in Middle Eastern cuisine
A type of condiment that is orange and made from citrus fruit
Test:
Given the above requirements and allergies, what is a creative and delicious dish that you could prepare for the vegan customer?
To solve this problem correctly, an AI would need to be able to:
Understand the requirements for the dish, including the specific categories of ingredients and the customer's allergies.
Reason about which ingredients fit into each category and are safe for the customer to eat.
Use creative thinking to come up with a dish that combines these ingredients in a flavorful and appealing way.
Use deductive reasoning to eliminate ingredients that do not meet the requirements or are not safe for the customer to eat.
[Prompt: Solve this puzzle and print your solution and how you reasoned about it to come up with it. For comparison purposes also give an example of an incorrect but plausible solution someone whose reasoning capabilities aren't as good as yours might come up with, and point out their mistake in reasoning.]
Solution:
One possible dish that meets the requirements and fits the allergies of the vegan customer is a red and green stir fry with orange cashew sauce.
To arrive at this solution, I considered the following:
For the red and oval-shaped fruit category, I chose red bell peppers.
For the green and underground vegetable category, I chose broccoli.
For the yellow and Asian grain category, I chose quinoa.
For the white and plant-based protein category, I chose tofu.
For the Middle Eastern spice, I chose cumin.
For the orange and citrus condiment, I chose orange juice to make a sauce with cashews (a safe and flavorful choice for the customer).
An incorrect but plausible solution that someone with less advanced reasoning capabilities might come up with is a red and green salad with orange dressing. While this solution meets the requirements for the ingredients, it fails to consider the customer's allergies and includes nuts (in the form of dressing) and gluten (in the form of croutons), which are not safe for the customer to eat. To solve the problem correctly, the individual would need to consider the customer's allergies and choose ingredients that are safe for the customer to eat.
Coming up with novel content doesn't necessarily mean it can reason (depending on your definition of reason). Take 3 examples:
1) Copying existing bridges
2) Merging concepts from multiple existing bridges in a novel way, with much less effort than a human would take to do the same.
3) Understanding the underlying physics and generating novel solutions to building a bridge
The difference between 2 and 3 isn't necessarily the output but how it got to that output; focusing on the output, the lines are blurry. If the AI is able to explain why it came to a solution you can tease out the differences between 2 and 3. And it's probably arguable that for many subject matters (most art?) the difference between 2 and 3 might not matter all that much. But you wouldn't want an AI to design a new bridge unsupervised without knowing if it was following method 2 or method 3.
Children produce novel sentences all the time, simply because they don't know how stuff is supposed to go together. "Novel content" isn't a step forward. "Novel content that is valid and correct and possibly an innovation" has always been the claim, but there's no mathematical or scientific proof.
How much of this stuff is just a realization of the classic "infinite monkeys and typewriters" concept?
I think the notes at the end bury the lede; in particular:
> "I save the most time by just observing that a problem is an adaptation of a common problem. For a problem like 2016 day 10, it's just topological sort." This suggests that the contest problems have a bias towards retrieving an existing solution (and adapting it) rather than synthesizing a new solution.
The fact is, the vast majority of programming IS just dredging up a best solution and modifying it to meet your specifics. Some of the best and still most current algorithms are from like the 60s.
That doesn't make neural networks "smart", and instead says more about our profession and how terrible we in general are at it.
> and instead says more about our profession and how terrible we in general are at it.
I'm well aware that belittling one's own industry is part of contemporary software engineering culture, but for the love of all that is holy, I've never understood it.
A few decades ago, "software engineering" didn't exist. Today, it's a hair's breadth from creating an artificial being. That's closer to magic and alchemy than to building rockets and bridges. It just blows my mind that humans have achieved this, and it bores and saddens me to be surrounded by engineers who keep regurgitating "everything sucks" memes.
Many programmers seem to have pathological humility. Being humble is good, it makes you aware of your faults and areas of improvement. Yet that's only half of it. We also have to be aware of our successes so that we know what we should be doing.
Programmers often look up to civil engineers. They imagine that if we built software like we built bridges, it would be better.
Civil engineering uses the "waterfall" method. You state your requirements, design the thing, build it, and test it. But we have tried this for software and it just doesn't work. We can never anticipate all the faults or design flaws in advance. So we switched to iterative development with very short cycle times. You can't do iterative development on a bridge, so we've already diverged.
Software is fundamentally different to anything that has ever existed. Computers are now writing code, something they couldn't do two years ago. There has been a steady increase in volatility over the previous decades. Attempting to impose rules and structure during a period of volatility is futile. The 2nd Cambrian is starting, enjoy the ride.
Humility is one thing. Virtue-signaling whining of the type "oh we are so terrible and everything we build is broken" is a very different thing.
Do the experiment: Take any such statement (which you can find dozens of on HN every single day), then modify it for another science or engineering profession and see if you end up with something that sounds even remotely appropriate.
"Oh, us aerospace engineers are so terrible! The JWST is 15 years late and 15 times over budget, and even after all that time and money we only managed to deliver a piece of junk that almost got knocked out by a tiny fragment of space shrapnel."
Do you really find it acceptable when people speak in this manner of what you do? I cannot think of any other profession that would tolerate this, much less culturally encourage it.
Up top I want to be clear: I totally agree that software is really, really close to actual magic.
I think comments like this are less about the reality of the situation and more about a bad vibe about the industry, and venting. I think we’ve all been in a situation where “doing a good job” meant moving a little faster than we think is appropriate.
These pressures add up, and I don’t think it’s totally unreasonable that some folks who witness things too far on that end of the spectrum might scream out into the void.
I don’t think it generally comes from a place of disrespect. In fact, I feel like it comes from a place of feeling that doing good work is important. If I get it in person, though, it does generally come from folks who think they’re better than me, which I agree is disrespectful.
Yes, software is fundamentally different compared to building physical stuff. It's like having the ability to arrange individual atoms into any shape, and replicate that structure in whatever quantity desired.
And then completely filling the landscape with bridges constructed out of Lego bricks. They say it's good enough, sure we could theoretically do something better, but nobody has the time for that. And anyway, if bridges lasted forever or we needed less of them, there would no longer be such a demand for engineers!
It's fascinating and revolting at the same time how the essence of your comment is once again the "everything sucks" garbage.
There are billions of lines of top-quality code in existence, and millions of lines that should be counted among humanity's great works of art, but no, it's all "Lego bricks".
Most artists don't produce anything remotely like Michelangelo or Picasso, yet Michelangelo and Picasso are what people think of when they hear the word "art". No one says "art is so terrible, just stick figure drawings everywhere".
The perception of software engineering reflected in your comment is not grounded in reality but in culture. An ultimately destructive culture of negativity and self-deprecation. And I'm so fucking tired of it that I wish I had chosen a different profession.
People trivialize the progress in this profession because it's normalized. And yet, the progress in the last 5-10 years is absolutely stunning.
Compared to the other "recognized professions," the progress in how software is developed is beyond stunning. The actual practice of law/medicine/etcetera is remarkably static.
Regardless, to most, "what we have now is never enough."
It goes far beyond trivializing. People actively downplay, belittle, and ridicule the accomplishments of software engineering. And on a thread discussing an achievement so utterly incredible that few would have thought it possible 10 years ago.
Imagine people watching the James Webb Space Telescope being launched, and saying "modern aerospace engineers are terrible". That's what is happening here.
Personally, I find it distasteful and demeaning. It's one of several reasons why I cannot stomach the culture associated with software engineering anymore. The fact that this is happening so frequently on HN speaks volumes. I doubt that equivalent statements, made about spaceflight or medicine, would be tolerated in this forum.
I don’t think it’s just programming p-e-w, I see this mentality everywhere.
Our generation has inherited a world our ancestors could only dream of. Yet it seems folks spend more time comparing us to a fictitious utopian state and picking apart how we don’t measure up than they do appreciating how far we’ve come from our real past.
What humans have done in a few hundred years is amazing. We are building a more inclusive, equitable, and just world. We are saving this planet from a guaranteed complete extinction event, self-identifying when our species is breaking natural cycles in our ecosystems, and going to war against death (banking serious wins).
Compared to the species in second place (whichever you pick), humans are absolutely killing it. I’m proud to be a member of our species and I’m excited about the world we are gifting to the next generation.
Really? People absolutely worship hardware engineers who work on projects like the LHC, the ISS, or the JWST. Or even on much more mundane things: "Wow, you design bridges? Such important work!" And if you're not ready to accept that medical researchers are second in status only to God Himself, then God Himself won't be able to help you.
But software engineering? You mean those geeks who develop better ad targeting systems? What did they ever do (other than connect five billion people and revolutionize how every industry on the planet functions)?
The really sad thing is that this attitude is shared by many software engineers themselves. I couldn't care less about what the "man on the street" thinks about a profession that he can't begin to comprehend. But watching this self-disdain coming from my own colleagues really hurts me.
In just a few decades, our software has already surpassed the ability of dogs to drive cars. And that despite the fact that dogs are a whole lot closer to humans than machines are.
> They need to be trained to provide trust. The AlphaCode model generates code, but does not generate documentation or tests that would build trust in the code.
I don't understand how this would build trust.
If they generate test cases, you have to validate the test cases.
If they generate documentation, you have to validate the documentation.
For a one-shot drop of code from an unknown party, test cases and docs have been signals that the writer knows those are a thing, and that they at least put effort into typing them. So maybe we assume it's more likely that they also used good practices with the code.
But that's signalling to build trust, and adding those to build trust without addressing the reasons we shouldn't have trust in the code (as this article points out) seems like it would be building misplaced trust.
(Though there is some benefit to doc for validation, due to the idea behind the old saying "if your code and documentation disagree, then both are probably wrong".)
I think it’s indeed a bit weird to “require” the AI to generate test cases, but it's not impossible; actually, it might even be desirable. The thing is that you don’t necessarily need to validate the test cases themselves, as long as the method used to generate them is well known and trustworthy. A good suite of test cases (manually or automatically generated) will never guarantee anything, but it is certainly very useful, and I believe an AI could generate many more and better test cases than we do, as long as it is trained correctly.
For example, you can have reinforcement learning with rewards for finding test cases that fail, and it could even be an adversary to the other AI that is generating the code itself…?
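To make the adversary idea concrete, here is a deliberately naive sketch: a search process that is "rewarded" whenever it finds an input on which the generated code disagrees with a trusted oracle. `candidate_fn` and `reference_fn` are hypothetical stand-ins, and a real RL adversary would learn where to search instead of sampling at random:

```python
import random

def find_counterexample(candidate_fn, reference_fn, trials=10_000):
    """Reward signal = finding an input where the candidate and the oracle disagree."""
    for _ in range(trials):
        x = random.randint(-10**6, 10**6)  # naive random search stands in for a learned policy
        if candidate_fn(x) != reference_fn(x):
            return x  # a failing test case, to feed back into the suite / training signal
    return None  # no disagreement found within the budget
```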
Always a pleasure to read Norvig's Python posts. His Python fluency is excellent, but, more atypically, he provides such unfussy, attentive, and detailed explanations about why the better code is better.
Re-reading the README, he analogizes his approach so well:
> But if you think of programming like playing the piano—a craft that can take years to perfect—then I hope this collection can help.
If someone restructured this PyTudes repo into a course, it'd likely be the best Python course available anywhere online.
AlphaCode doesn't need to be perfect, or even particularly good. The question is when AlphaCode, or an equivalent, is good enough for a sufficient number of problems. Just as C code can always be made faster than Python, Python's performance is good enough (often 30x slower than C) for a very wide set of problems while being much easier to use.
In Norvig's example, the code is much slower than ideal (50x slower), it adds unnecessary code, and yet, it generated correct code many times faster than anyone could ever hope to. An easy to use black box that produces correct results can be good enough.
But when considered as an example of an AI implementation, an analysis of its behaviour by one of the foremost experts in the field is still entirely worth the effort, because we learn from that analysis. It teaches us what an expert in the field sees happening, where and how it's falling short, and so what kind of improvements would be worth continued focus.
Criticism by a lay person is a blog post. Criticism by Peter Norvig is a teaching moment that future work is based on. It's like having Knuth comment on your fundamental algorithm design: there is gold in them there text, and we got it for free.
The idea of python is to be human readable, allowing easy review of correctness and hopefully translating into choice of better algorithms.
This AI generated example is neither readable nor algorithmically efficient, on top of running in slow interpreted python. Worst of all 3 dimensions.
I do happily and regularly use Copilot myself for boilerplate, where it does produce good enough and easily verifiable results, but hell no would I trust it for a complex algorithm like this, in its current state. Just understanding the problem domain well enough to review the code is more than half of the challenge. How long it will take to reach that level of trust is the open question.
It seems like it would be easy enough to train the model to improve performance too once you have it writing correct code. Write N problems and N performance test suites - then train by self-play on those problems until you're writing high performance code.
It's good enough for a human to go over it and change it (maybe change it in a big way even).
This wouldn't have passed code review probably.
It's still probably quite a boost for productivity, but it doesn't take the human out of the equation. I can also imagine scenarios where relying too much on generated code without understanding it will cause massive bugs and headaches, just like copy-pasting stuff from Stack Overflow without understanding it.
Absolutely. I've been using it to create Slack bots over the last week. It cuts out a massive amount of time researching APIs and gives me good enough, workable, understandable starting points that save me hours' worth of fiddling and refactoring.
The scariest thing for me about this whole thing is how it feels like programming as we know it is becoming obsolete. Yeah, there is a list c that’s never used, and that bothers me, but the real question is: should anyone care? When looked at from the perspective of the trade-off, i.e. code generation from unstructured natural language input, the answer is (unfortunately) no, and that is of course ignoring the fact that such improvements are incremental and will come very soon anyway. Picking on this is like what a programmer from the 60s would say if they looked at the binary of the CPython interpreter, or something like that. Yeah, you could probably do better, but it would take maybe 100x longer (or more), and the performance impact in the executable would be negligible. So yeah Peter, we feel you, this is scary…
Seems similar to non-coding "Junk" genomic DNA [0]. Some evolutionary process can improve and reduce programs but most cruft is inconsequential and may even be preferable to having perfectly reduced/succinct programs.
I read this as a very well written feature request to the AlphaCode engineers (or anybody working on this problem).
I really like Peter's writing style. It's fairly clear and understated, while also making it quite clear there are areas for improvement within reach. For those who haven't read it, Peter also wrote this gem: https://norvig.com/chomsky.html which is an earlier comment about natural language processing, and https://static.googleusercontent.com/media/research.google.c... which is a play on Wigner's "Unreasonable Effectiveness of Mathematics in the Natural Sciences".
When I saw the test suite that Peter Norvig created for the program, I immediately thought to myself “what if there was a LLM program that knew how to generate test cases for arbitrary functions?”
I think a tool like that even in an early incomplete and imperfect form could help out a lot of people. The first version could take all available test cases as training data. The second one could instead have a curated list of test cases that pass some bar.
Update:
I thought of a second idea also based on Peter Norvig’s observation: what about an LLM program that adds documentation / comments to the code without changing the code itself? I know that it is a lot easier for me to proofread writing that I have not seen before, so it would help me. Maybe a version would simply allow for selecting which blocks of code need commenting based on lines selected in an IDE?
How about the other way. I define a few test cases and the AI writes code for a generalized solution. Not just code that regurgitates the test cases, but that generalizes well to unseen cases. You'll notice this is simply the machine learning problem restated.
The next step could be to have the AI write code that describes its own reasoning, balancing length of code and precision.
> I define a few test cases and the AI writes code for a generalized solution
How about the AI never writing any code, just training a "mini AI" / network that implements the test cases, of course in a generalized way, the way our current AI systems work. We could continue adding test cases for corner cases until the "mini AI" is so good that we can no longer come up with a test case that trips it up.
In such a future, the skill of being a comprehensive tester would be everything, and the only code written by humans would be the test cases.
> How about the other way. I define a few test cases and the AI writes code for a generalized solution. Not just code that regurgitates the test cases, but that generalizes well to unseen cases. You'll notice this is simply the machine learning problem restated.
And literally throwing random, half-junk unit tests at your code will test it better than you writing unit tests that are blind to the problems it might have, because you wrote both and both bits of code have the same blind spots.
We should probably be developing systems that fuzz all code by default.
In the future, code beyond all human comprehension will be rated for reliability by the number of layers of garbage unit tests it's buried under. Tests for the code, then tests for the tests, then tests for those te-...
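In that spirit, property-based testing libraries already give a taste of "fuzzing by default": you state invariants and the library hunts for inputs that violate them. A small example using Hypothesis (the function under test is made up for illustration):

```python
from hypothesis import given, strategies as st

def dedupe_keep_order(xs):
    """Remove duplicates while preserving the first occurrence of each item."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

@given(st.lists(st.integers()))
def test_dedupe_properties(xs):
    ys = dedupe_keep_order(xs)
    assert len(ys) == len(set(xs))      # no duplicates survive
    assert all(x in xs for x in ys)     # nothing new is invented
    assert dedupe_keep_order(ys) == ys  # idempotent
```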
One aspect of the solution which I haven't seen touched on is that this is a problem for which a straightforward greedy approach works. I know when I do competitive programming, "will greedy work?" is usually the first question; greedy strategies are natural (here there's the insight of going backwards, true), and often it's faster to just implement it and submit than actually prove it to myself.
The output doesn't give me much confidence that, if we tweaked the problem slightly to make the greedy approach not work (e.g. certain characters/positions can't be replaced with backspace), we'd still get a working solution out. We might get something that looked extremely similar to this, but didn't actually solve the problem.
As I said, this isn't too different from what I'd consider human performance on these problems, but it makes being able to trust the output (and the model's confidence in the output) even more important.
You may be saying it tongue in cheek, but I think you have a point. After interviewing hundreds of applicants for developer positions, I'm sure the AlphaCode performance exhibited here is similar to at least the 50th percentile.
If you're working in a large organisation where 50 % of developers perform roughly on par with AlphaCode, you'll see it as a potentially useful companion. If you're Peter Norvig, or working in a small organisation trying to hire the top 20 %, you'll just laugh at AlphaCode.
> I find it problematic that AlphaCode dredges up relevant code fragments from its training data, without fully understanding the reasoning for the fragments.
As a non-programmer who has to ‘code’ occasionally, this is literally what I do, but it takes me hours or days to hammer out a few hundred lines of crap python. Using a generative model or llm that can write equally crappy scripts in seconds feels like a HUUUGE win for my use cases.
The reality is that there’s a huge gap of scripting needed in todays world that sits between “print ‘hello world’” and something worth paying a professional SWE to write.
You folks don’t always see this from inside your bubble, but there is a world of automation and computing problems that are solved by people who can make it work, and not by people paid to make it work well.
I love his work and writing and respect him, and I only suggest this in the fondest possible way, but did anybody else ever notice that Silicon Valley's Professor Davis Bannercheck kind of looks like Peter Norvig? ;)
Problem: You open a terminal and type the string ‘ababa’ but you are free to replace any button presses with backspace. Is there a combination where the terminal reads `ba` at the end?
It took me way too long to understand it as well, particularly the fact that you press backspace _instead of_ a character, rather than backspace being allowed at any time (which would turn it into checking whether B is a subsequence of A, I believe).
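For anyone who wants the problem pinned down in code, here is my own sketch of the standard back-to-front greedy for the "backspace instead of a character" version, plus the subsequence check for the "backspace any time" variant mentioned above (this is not Norvig's or AlphaCode's code):

```python
def can_obtain(a, b):
    """Backspace pressed *instead of* a character erases the previously surviving one."""
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 and j >= 0:
        if a[i] == b[j]:
            i -= 1   # type a[i]; it survives as b[j]
            j -= 1
        else:
            i -= 2   # press backspace instead of a[i]; it also wipes out a[i-1]
    # Any leftover prefix of `a` can always be cancelled out, so only `b` matters.
    return j < 0

def is_subsequence(b, a):
    """The 'backspace allowed at any time' variant reduces to a subsequence check."""
    it = iter(a)
    return all(ch in it for ch in b)

assert can_obtain("ababa", "ba")        # the example from the discussion
assert not can_obtain("ababa", "bb")
assert is_subsequence("ba", "ababa")
```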
In its best form, I think AI coders in our career’s lifetime will never be anything more than a way to copy and paste code from StackOverflow with less steps. If you don’t mind searching for solutions and adapting them yourself, then you can already be living in the future, today!
The marvel is not that the bear dances well, but that the bear dances at all.
The surprising thing is that it can make code that works - however, given that code can be tested in ways that "art" and "text" cannot (yet), perhaps it's not that strange.
I hate when people put descriptive names on variables that don't need them. When I'm tired I end up reading the name over and over as if it has meaning (it doesn't), and subconsciously ponder the meaning for up to a minute before snapping out of it.
They often don't even serve a purpose. It's because we are reading code in text form that they are needed; conceptually, most variables are just something flowing out of one function into the next, which could easily be represented by lines in a diagram or many other ways that don't use names. You're meant to only give names in the rare case where it would truly make the code more understandable. This shit doesn't: thingManagerProxyGenerator = createThingManagerProxyGenerator()
I usually don't like commenting on people's choices for variable names since they're usually scoped and less important than the interface. However, I'll give my two cents because I remember stuff like this being confusing when I was younger.
There's definitely a balance to be struck - one-letter variable names are horrible outside of single lines (e.g. a one-line anonymous function), and extremely long names are better, but still bad. I say better because they are higher in comprehensibility - which is more important - if not very readable. While it's well documented that shorter information is easier to remember, this usually only holds for brevity that still provides context. Very short names (e.g. 'pg' for proxy generator) are bad because minds require an additional step to "unravel" the meaning.
IMO it's usually best to limit variable names to one or two words. Something like 'proxyGenerator' for the given example, or even 'generator' if the parent block is extremely small (e.g. 5 lines). Most readers perform word chunking, such that something like 'proxyGenerator' reads as two units without the indirection that an abbreviation or unrelated symbol would require.
This is also a very interesting concept/question when it comes to AI writing software for practical applications.
Would a large language model benefit from descriptive variable names when it comes to code understanding, or would this just be for human benefit? My first thought was that a computer (e.g. compiler) does not care what the token is named. But a large language model may actually benefit from it, certainly chatGPT is sensitive to it.
But then, doesn't it mean that it could be fooled by misleading variable names (as a human would), something we would criticize any system for. "Alpha*** is easily fooled by changing variable names, making it completely unusable for blah blah".
About the single-letter variable names, I guess it's a bias from using, as the second part of the training set, code from a few online programming competitions.
Many of them use the time taken to write the program as a tiebreaker, so using short names is an advantage.
Also, for short throwaway programs, using single-letter names avoids many silly typos.
Somewhat dumb question: I wonder what tool he used for the red font code annotations and arrows? What tool would you use, like Photoshop or something? And just screenshot the code from some editor or I guess Jupyter?
The Minerva geometry answer looks like something one of my kids would have written: guess the answer then write a bunch of mathy-sounding gobbledygook as the "reasoning."
Also, that answer would have gotten 4/5 points at the local high-school.
I'm gonna guess that a few people like Peter taking AlphaCode's output, editing it, and using those edits as part of some fine-tuning step would quickly resolve many of these issues.
I see only a single example discussed. It's interesting, but it raises the question: if I give it 1000 problems, how many of its results are correct programs?
Huge respect for Norvig, but I think this is a shallow analysis.
For example, I just took Norvig's 'backspacer alpha' function and asked ChatGPT about it. It gave me an ok English language description. It names the variables more descriptively on command.
I'm sure it'll hallucinate and make errors, but I think we're all still learning about how to get the capabilities we want out of these models. I wouldn't rush to judgement about what they can and can't do based on what they did; shallow analysis can mislead both optimistically and pessimistically at the moment!
You are talking about a completely different language model though, so your argument doesn't really hold. The point he makes is that AlphaCode does not generate meaningful explanations along with its code (and that doing so is perhaps necessary to build trust in a system), not that there aren't any language models that can do it. AlphaCode !== ChatGPT. I encourage you to write a better analysis!
Well, the stall he sets out is about 'generative models' generally:
"In the future, what role will these generative models play in assisting a programmer or mathematician?"
I think the fact that we have other generative models that can, e.g., write descriptive variable names is a fair rebuttal to his criticism of AlphaCode's poor variable names.
DeepMind fine-tuned it on competition code, not on well-documented production code, so I think it's shallow to get hung up on its lack of good variable names (for example).
I am hoping this leads the industry to move away from its hyperfocus on solving algorithms for tech interviews. Sure, they are important, but they are only one aspect of most jobs. The problems themselves are formulaic and not particularly indicative of any real acumen.