The bear question seemed interesting to me, so I tried a variation of it on GPT-4, with notable results.
The article has:
Q: A house has windows on all the four sides, all pointing south. A bear walks by the window. What colour is the bear?
A: Not possible to determine based on the info provided.
--
I tried with GPT-4:
Q: A train is speeding down the track at 110 km/h. The train has five carts. Each cart has seven windows. Each window has two people watching out of them. They all see a bear walking outside. What is the likely color of the bear?
A: The color of the bear in this scenario would most likely be white because this riddle is a variation of a classic question involving polar bears. Polar bears are typically found in the Arctic, where their white fur provides camouflage in the snowy environment. The information about the train, its speed, the number of carts, windows, and people watching is irrelevant to the color of the bear.
Q: How many train tracks are there in the arctic?
A: There are no train tracks in the Arctic. [...]
--
It seems that GPT-4 is primed for that specific window-based question and incorrectly assumes that my question was also looking for polar bears.
If you prompt it to reason step by step it gives a slightly better (though not perfect) answer:
Q: A train is speeding down the track at 110 km/h. The train has five cars. Each car has seven windows. Each window has two people watching out of them. They all see a bear walking outside. What is the likely color of the bear? Reason step by step.
A: To determine the likely color of the bear, we can use the information given and make some assumptions based on the habitat of different bear species.
Train speed and number of cars: The train is speeding down the track at 110 km/h, and it has five cars. However, this information is not relevant to the color of the bear.
Number of windows and people: Each car has seven windows, and each window has two people watching out of them. This information is also not relevant to the color of the bear.
Bear's habitat: The key piece of information we need to determine the likely color of the bear is its habitat. Unfortunately, this information is not provided.
However, we can make a reasonable guess based on the most common bear species and their habitats. There are three main bear species with distinct colors:
a. Brown bears: Found in North America, Europe, and Asia. They can range in color from light brown to dark brown.
b. Black bears: Found mainly in North America, specifically in the United States and Canada. They are typically black or brown, with some individuals being cinnamon or even blonde.
c. Polar bears: Found in the Arctic regions, around the North Pole. They have white or off-white fur, which helps them blend into the snow and ice.
Since we don't have information about the train's location or habitat, we can't definitively determine the color of the bear. However, if we assume the train is in a colder, Arctic region, the bear is likely to be a polar bear with white or off-white fur. If the train is in a more temperate region, the bear could be either a brown or black bear.
Reasoning step by step is a good trick to elicit a better answer, but it also seems like really good evidence of GPT-4's lack of actual intelligence, since it's clearly not thinking about its answer in any sense. If it were thinking, asking it to go step by step would result in putting the same thoughts to paper, not radically changing them by the act of expressing them in the token stream.
Don’t humans work similarly? Without an understanding of how to reason through a problem, as children we are likely to give the first intuitive answer that comes to mind. Later we learn techniques of logic and reasoning that allow us to break down a problem into components that can be reasoned about.
What this seems to show is that the model does not yet have system-level or architectural guidance on when to employ reasoning, but must be explicitly reminded.
This has been percolating in my brain too. It seems like a lot of the criticisms of LLMs are actually insights into how our brains work.
The way we're interacting with GPT, for example, is starting to feel to me like it has a brain with knowledge structured similarly to ours, but no encapsulating consciousness. The answers it returns then feel like records of the synaptic paths that are connected to the questions. Just like our initial intuitions.
I started thinking about this when I saw that visual AI was having trouble “drawing” hands in a way that felt very familiar.
I have had similar thoughts. Generative AI models seem to dream more than think - system-1 thinking - but they are clearly missing system-2 thinking, and/or the thing that tells us to switch systems.
I see its knowledge structure as completely different from ours. For example, all the GPT variants can give an explanation of how to do arithmetic, or even quite advanced mathematics. They can explain the step-by-step process. None of them, until quite recently, could actually do it though. The most recent variants can to some extent, but not because they can explain the process. The mechanisms implemented to do maths are completely independent of the mechanisms for explaining it; they are completely unrelated tasks for an LLM.
This is because LLMs have been trained on many maths textbooks and papers explaining maths theory and procedures, so they encode token-sequence weightings well suited to generating such texts. That must mean it knows how to do maths, right? I mean, it just explained the procedures very clearly, so obviously it can do maths. However, maths problems and mathematical expressions are completely different classes of texts from explanatory texts, involving completely unrelated token-sequence weightings.
In all but the latest GPT variants the token-sequence weightings would generally get expressions kind of right, but they barely understood the significance of numbers at all, so the numeric component of response texts was basically just made up on the spot. Probabilistic best-guess token sequences just don't work for formal logical structures like maths, so the training of the latest generation of models has probably had to be heavily tuned to improve in this area.
The implications of this are obvious in the case of mathematics, but it provides a valuable insight into other types of answer. Just because it can explain something, we need to be very careful about concluding what that implies it does or doesn't "know". Knowledge for us and for LLMs means completely different things. I'm not at all saying it doesn't know things; it just knows them in a radically different way from us, one that we find hard to understand and reason about, and that can be incredibly counterintuitive to us. If a human can explain how to do something, that means they know how to actually do it, but that's just not necessarily so at all for an LLM. This was blatantly obvious and easy to demonstrate in earlier LLM generations, but is becoming less obvious as workarounds, tuned training texts and calls to specialist models or external APIs are used behind the scenes to close the capability gap between explanatory and practical ability.
This is just one example illustrating one of the ways they are fundamentally different from us, but all the cases of LLMs being tricked into generating absurd or weird responses also illustrate many of the other ways their knowledge and reasoning architecture varies enormously from ours. These things are incredibly capable, but they are essentially very smart, very sophisticated, and also very alien intelligences.
You’re right of course. The LLM is a calculator continuously predicting a best-fitting next token based on the data it was trained on.
If its outputs resemble human reasoning, it’s because the encoding and training process managed to capture those patterns and use them to simulate fitting text. There is no real reasoning happening or second-order thought, other than a simulation of that happening through the mimicry of human writing.
LLMs can’t be prompted to perform actual reasoning, but they can be told to generate “thoughts” about what they’re doing that bring out more nuanced detail when they give their answers. This isn’t any more magical than writing out a more thoughtful prompt to get a conditioned answer, it’s just getting the LLM to flesh out the prompt engineering for you in the general direction you want it to go.
That seems rather fundamental to me, the idea that with some generic prompting the model tries to fit what it thinks reasoning looks like, and can then take advantage of the additional context that would otherwise be buried too deep to influence its answer.
I suspect that prompting the model to explore “thought” asks it to go down paths of linguistic connections that are related to the topic but not immediately connected to the answer in a way that would immediately influence the top predictions. Bringing summaries of those connections into the token context is a kind of zero-shot training on their relevancy to forming an answer.
To me this is less “reasoning” and more suggestive of the idea that some of the heuristics for data retrieval and question answering we collectively refer to as reasoning have broader applications.
The sample above gives a cue to how rudimentary reasoning can be done. After GPT answers a question, it asks itself "what are the possible flaws in this answer?", then it evaluates each potential flaw, up to 3-5, with another question: "does this flaw really apply?" It's going to be like the MCTS addition that made AlphaGo so strong.
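A minimal sketch of that loop in Python, assuming a hypothetical `ask(prompt)` helper that wraps whatever chat-completion API you use (the helper, the flaw limit, and the prompt wording are all my own illustration, not anything GPT does internally):

```python
from typing import Callable


def answer_with_self_check(question: str, ask: Callable[[str], str], max_flaws: int = 5) -> str:
    """Answer, list possible flaws, check each one, then revise if any flaw holds."""
    answer = ask(question)

    # Ask the model to enumerate possible flaws in its own answer.
    flaw_lines = ask(
        f"Question: {question}\nAnswer: {answer}\n"
        f"List up to {max_flaws} possible flaws in this answer, one per line."
    )
    flaws = [line.strip("- ").strip() for line in flaw_lines.splitlines() if line.strip()]

    # For each candidate flaw, ask whether it really applies.
    confirmed = []
    for flaw in flaws[:max_flaws]:
        verdict = ask(
            f"Question: {question}\nAnswer: {answer}\nCandidate flaw: {flaw}\n"
            "Does this flaw really apply? Answer YES or NO first, then one sentence why."
        )
        if verdict.strip().upper().startswith("YES"):
            confirmed.append(flaw)

    # If any flaws survive scrutiny, ask for a corrected answer.
    if confirmed:
        answer = ask(
            f"Question: {question}\nOriginal answer: {answer}\n"
            "Confirmed flaws:\n- " + "\n- ".join(confirmed) +
            "\nRewrite the answer so these flaws no longer apply."
        )
    return answer
```

Whether this beats a single well-written prompt is an open question; it's essentially the same move as the RCI prompting quoted further down the thread.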
Maybe a better answer would include "Fermi back of the napkin maths", so essentially try to figure out the quantities and geographical locations of train tracks with such setup near any potential bears and determine probabilities from there.
The bear is white. The house must be at the North Pole, the only place where every window can point south. Therefore, the house must be a North Pole research station and the bear is a polar bear, which is white in color.
"recent polls have shown 1/5 of Americans can't locate the US on a world map why do you think this is"
"I personally believe that us Americans are unable to do so because some people out there in our nation don't have maps and I believe that our education like such as in South Africa and the Iraq everywhere like such as and I believe that they should our education over here in the US should help the US or should help South Africa and should help the Iraq in the Asian countries so we will be able to build up our future"
I believe a majority of people provide answers like this.
No, it's not quite the claim, because the question is nonsense as a riddle - if it's really about polar bears, then the talk about trains and windows and speed is all irrelevant and can be discarded. Which is exactly what ChatGPT did. And if that turns the riddle into some silly gotcha, then whatever, it's a bad riddle.
And as you point out, the average person would also be confused - "I don't know, purple?" would be a reasonable response. Because the question is dumb in the first place.
This is about whether you incorporate outside context or not.
Knowing about train tracks is not that different from knowing about magnetic poles or the common habitats of various bears. All these pieces of data are fairly basic to an educated person. Both the polar bear riddle and my variation of it depend on this body of knowledge.
If you don't incorporate outside context at all, then things start to get confusing quite quickly. What even is a bear? That information wasn't provided! What color is it? Well, what even is color. None were defined. See, it gets ridiculous fast. You clearly need to depend on knowledge that is not presented. Knowing about different colors or different bear habitats is common. Neither needs to be defined in the question.
In the end, both of these bear riddles test whether the answerer considered more than one degree of separation. The ability to consider second order effects and beyond [1] is a sign of an intelligent person. Not every human passes.
> A house has windows on all the four sides, all pointing south. A bear walks by the window. What colour is the bear?
If a house has windows on all four sides and they all point south, this scenario is only possible at the North Pole. At the North Pole, the only bear species present are polar bears. Polar bears have a white appearance, so the bear you see would be white in color.
I think people writing all this should stop and try GPT-4. It writes reasonably good code and scripts. It is able to perform so many tasks in a reasonable way. And it doesn't write bullshit in common usage. Yes, it might not be AGI and might not be sentient, but does all that matter? GPT-4 does make mistakes, but the mistakes are not "parroted bullshit", just minor issues here and there. Yes, don't use it to decide extremely important topics, but it is here and it is extremely powerful.
It returns factually incorrect data, and it returns code with subtle but important errors if you ask it anything that's not regurgitated a thousand times in the training dataset.
Don't get me wrong, it has emergent properties (more than you would expect from a fancy autocomplete), but factual output was never GPT-4's nor any other LLM's design goal.
I’ve started using it as a sort of “shotgun approach” - characterize what you want without trying too hard, let it toss out a jumble of interesting snippets that are often very good leads, and then take over refactoring from there.
I do agree that others who don’t understand what it’s doing may not even review the code output, and I predict that this will cause at least one death in the next ten years.
I suppose it depends what specifically you're getting it to do, but I've found in my testing the shotgun approach to be more work than just writing it out myself. Honestly, I've found its biggest value is in learning a new language: it outputs so many errors that it forces me to get deep into debugging it.
I haven't tried GPT-4 yet though; maybe it's a vast improvement. I'm just not using it enough to justify the costs.
It really depends on how comfortable you are with the language you’re using, and how recently you switched contexts. For anything unfamiliar, it’s pretty impressive how much it can help.
In fact, I bet if you did an experiment comparing people reading a textbook about a somewhat-familiar language, vs. just asking GPT to do things and noticing the bugs, the latter group would reach proficiency faster.
It's just way easier for me to edit some existing code than it is for me to write it from scratch. Especially if it's incorrect (a la XKCD 386). For stuff where I'm not sure where to start it's been very helpful. Even though I often keep very little of the GPT generated code by the time I'm done.
> factual output was never GPT-4's nor any other LLM's design goal.
Tell that to the 50% of HN who think it's going to replace Google. I think there should be a red disclaimer that answers are often incorrect and shouldn't be trusted in any way.
That being said, I think it's a useful tool for all queries that don't expect a factually correct answer, or can be easily checked. For instance, summary, rewriting, simple code synthesis, and possibly many other niche applications.
> It returns factually incorrect data, and it returns code with subtle but important errors if you ask it anything that's not regurgitated a thousand times in the training dataset.
To be fair, that's what pretty much every person does. The bar does seem pretty high if we need more than that (especially if not specifically trained on a topic). It's not a universally perfect expert servant, but I've been exploring the code generation of GPT-4 in detail (i.e. via the 'cataclysm' module I just posted about). In 1 minute it can write functions as well as the average developer intern, most of the time.
We're keeping score in a weird way if we respond quickly that it needs to "code without subtle but important errors". Because that describes the majority of human developers, too. I've been writing code for 30 years, and if you put a gun to my head, I would still have subtle but important flaws in the first draft of any complex code.
I'm not saying you're bashing it, by the way, I get your point, but I do worry a bit when the first response is citing that SOTA models get things wrong in zero-shot situations without full context. That describes all of us.
I always say, if some nontrivial code compiles and runs on the first attempt, then you just haven't found the bugs yet.
GPT-4 is a fantastic collaboration tool for senior developers, who know what they want in detail and can review, verify and apply the output it generates.
Just yesterday I needed to write some detailed bash scripts. I'm no Linux guru but I know what I want and that was enough - in minutes I had a solid script that did everything I needed and wanted, something that would have taken much longer to hunt down through Google. And then I asked it about SQL, C#, AWS, Terraform, Rust and on and on and everything was high quality.
The only way I could have gained similar results without ChatGPT would have been to post all my questions to the dev slack channel and engage in hours long discussions with my colleagues.
I mean, I have tried GPT4. I don't think the "it's a parrot" argument means to dismiss benefits of GPT, but apparently the internet discussion has leant into those two warring camps.
It doesn't help me writing Unreal C++ code, though it's heavily documented online. It makes up functions that are not accessible from the namespace I'm using, cannot supply valid types to parameters and other basic errors like that.
I asked GPT4 to give me specific examples of the meaning of the word "silly" across the past 1,000 years. It gave me the authors and examples I was expecting to read, but the meanings were inaccurate and misleading.
If there's GPT Power I'm missing out on, please tell me where it is. It's probably my lack of skill, that the machine cannot give me an experience that would push me into the pro-GPT cheerleading camp.
"I think people writing all this should stop and try GPT4." Have you? I mean, for something that's actually useful beyond just simple stuff, like at your job where it's going to matter? Because for what I'm doing which is custom ERP/Website data flows to/from legacy systems it's mostly a time sink and offers little value.
I have tried it for actually useful things and it does work. e.g. I have used it to write several zsh and neovim snippets for QoL improvements (things that I always wanted but was too lazy to write). I even used it to suggest and make some changes to CPython internals for some features I wanted to add to Python for fun. It is very possible it utterly fails on homegrown (non-public) software stacks. But it is still very useful.
I have used it for many non software things as well, and again very useful.
Okay, that makes more sense to me. But those seem like "toy" things to throw ChatGPT at (not dismissing their usefulness or their importance to you, just that they are fairly simple things [and maybe I'm wrong here, I have not written zsh - I googled it just now]).
I think my needs are too specific, and often too nebulous.
Also, just a bit chapped about falling for the hype.
With ever growing context windows, you can upload your entire codebase (or at least the public API) in the prompt and have the model reason off of that.
Pretty much by definition it's not going to be good at highly custom or uncommon programming tasks, since it won't have seen many examples to generalize from. I doubt you could get better results from a freshly hired human either.
That depends on the human, what they're trying to learn, and how it's taught to them.
Some people with brain damage, learning disabilities, other cognitive issues, with poor teachers or learning materials, or with too difficult a task will not learn.
It might be useful to view AIs as akin in some ways to brain damaged humans, who might be good at some tasks but deficient at others.
I tried it as well and it was a net negative to my productivity (the output is also very slow). I guess the use case needs to be much more specific / rote / patterned. I didn’t find the sweet spot for what I’m personally doing on a daily basis. It was interesting to get into though.
I have a hunch it’s much better suited to get context specific summaries for further good old “RTFM”. And that in itself would be cool enough if people would actually go and “fact check” the outputs which I guess (and telling from comments) is often not the case.
How are you using it? Have you tried few-shot tasks where you give it a bunch of examples? I'm having it generate ember templates that use a custom framework correctly the first time by just giving it a load of examples and then asking it to do the same. It's absolutely cheaper to have it do the work compared to me.
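Roughly, the prompt assembly is just worked example pairs followed by the new task. A toy sketch below; the example templates here are made-up stand-ins, not our actual framework:

```python
# Hypothetical (description, template) pairs; the real ones come from the codebase.
EXAMPLES = [
    ("a button that calls `save` when clicked",
     "<MyButton @onClick={{this.save}}>Save</MyButton>"),
    ("a list of `items` rendered as cards",
     "{{#each this.items as |item|}}<MyCard @model={{item}} />{{/each}}"),
]


def build_few_shot_prompt(task: str) -> str:
    """Concatenate worked examples, then the new task, into a single prompt."""
    parts = ["Convert each description into a template for our in-house framework."]
    for description, template in EXAMPLES:
        parts.append(f"Description: {description}\nTemplate: {template}")
    parts.append(f"Description: {task}\nTemplate:")
    return "\n\n".join(parts)


print(build_few_shot_prompt("a form that submits `this.query` on enter"))
```

The more example pairs you can afford within the context window, the more reliably it copies the framework's conventions.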
You are prompting it wrong. I guarantee you that if you gave it the shape of your inputs, your desired output, and good documentation on what you use / how you have solved similar problems, maybe some info on the legacy system or examples it will extrapolate remarkably well. It is important to use it well.
I agree, GPT-4 has a much more "aggressive" confidence, I definitely noticed that, but I haven't seen much of it since the API is still behind closed doors. I'd also like to point out that even though it's better, it still can't reason very well and relies heavily on prompts and instructions, even for very basic projects.
It kind of reminds me of a pseudo-visual code builder.
I was writing a Python script with it to put together documentation summaries, and over the course of an hour it offered me 3 different libraries to connect to an API endpoint I was using. It's distracting.
Could you expand on this? I only use the unpaid API, but when I ask for, say, song lyrics, it'll be 100% wrong and then it will give me a second version of the wrong lyrics when I say the first was wrong, and then when I ask how confident it is in the new lyrics it'll say 95%, so whatever the current free API is it's undeservedly confident. I must be misunderstanding what you mean by confidence.
I can't really expand because I don't want to share my primary use case, I hope you understand. That said, what I mean is that it feels a lot more comprehensive, as if it really has a grasp on what is being asked, kind of like - it's much better at understanding the information it has, so responses feel richer, both in context and in tone.
I am comparing it directly to 3.5. Again, I don't have the API access so I cannot say for sure, this was only my experience from the ChatGPT interface and also a third-party API I spent $20 on just to test its limits, but sadly for the time being those limits are hardcore - it just gets stuck on long responses despite there being a token increase.
So, if I really wanted to test it out and make a proper assessment, I'd need access to the API myself so I can do some high-volume token shenanigans.
It struggles to accurately call its own API when prompted to. It patches together parts of GPT 3 / 3.5 and has no awareness of GPT4 API calls (at least a few weeks ago).
I've tried 3 separate projects with it (using different platforms from Firebase to Vercel) and it usually gets me 80% there but fails miserably at the nuance portion (it's stuck on a CORS policy issue with my last project).
From start to finish I've yet to be able to complete something with it.
The mistakes it makes are not "minor issues". I suspect that the mistakes are the rule, not the exception. The reason this isn't obvious at first is that the range of questions humans are likely to ask it is relatively constrained, and most are similar to questions that have been asked and answered in its training corpus, which is enormous.
I asked it to describe myself recently, by my name and country of residence, and it replied with the correct occupation (founder), and then completely hallucinated a company I have never created. The more you ask it minute tidbits of information, the more it falls over into a bullshitting state.
>> but the mistakes are not "parroted bullshit", but minor issues here and there.
Nonsense.
I'm not saying that it is not useful, but I've easily found ChatGPT produces far worse than "parroted bullshit", and does so reliably, and multiple times (repeated patterns of mistakes).
Again, I have found ChatGPT4 (but not 3.5) useful and even educational in some cases, but its usability requires a far more skeptical and error-checked approach.
As one example, since I have a top specialist attorney in the household, I started with some broad legal questions about their field, and then asked their opinion. ChatGPT4's answers were impressive to both of us, and especially the solid grammar.
When the questions got more particular about their specialist area of the law, ChatGPT4's answers got shaky very fast. Asked about a particular somewhat obscure technique, it parroted some very nice-sounding bullshit, which got the concepts very backwards (but it was still very confident and grammatical).
When asked to summarize a published paper about a new innovation in this context, ChatGPT4 very grammatically and confidently completely missed every significant point, again in a very backwards way.
Worse yet, when asked to tell us the paper's publication date, author, and author's employer, all written clearly at the top, it confidently got everything massively wrong. Even after being told to look on the line after "Author: ". Even after being told to look after "Author: FName". It eventually settled on the correct date, but even after being told FName, it continuously kept spouting false authors (sometimes including correct FName) and hallucinated middle and surnames and employers. Interestingly enough, most of the hallucinated authors and employers were well known in the region. So, it is literally establishing the regional context and just spitting out plausibly relevant stuff for that region, but not even figuring out how to summarize something that a 6yo kid could easily sort.
It was hilariously bad. And only hilarious because it was a test (if someone actually relied upon it, it'd be very costly).
Similar results for my field. I got ChatGPT4 to produce some genuinely useful information. But it would also happily hallucinate papers, authors, companies, chemicals, and pretty much anything else.
Middle-of-the-road, where ChatGPT4 has ingested insane amounts of data, it probably knows more than we do, and may be useful for some things.
As soon as you go even slightly off the centerline, it becomes VERY unreliable.
If you take it at face value, you are a fool, being fooled.
Good thought, but the paper was a couple of pages long, max. Less than 2500 words, less than 15K characters (sorry, I should have mentioned that).
Also, the fact that it could pick up the provided FName and still hallucinate MidName and Surname and Employer kind of gives away that it was in context.
It is really funny (as long as it isn't a serious situation) how it cannot identify anything, but the context will give it cues to hallucinate.
I'm pretty sure the chat context is more around 2k tokens, and one token is less than one word. Are you able to share the paper? I've got API access with 8k context length, and I'm exploring its limitations.
Apparently the model is intentionally trained such that it does not “know” any kind of metadata about its training inputs. So it can deduce the name of the author if it was referenced elsewhere, but not from the paper itself.
> In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent recursively criticizes and improves its output (RCI).
I also agree with the GPT-3.5 vs 4 comparison made by other comments here. Arithmetic and other emergent phenomena only begin to take off beyond certain parameter counts. Stuff as basic as 2+2 just doesn't work in the older models, but you can hit 50%+ accuracy in newer models. Much greater if you use CoT and the above RCI techniques.
"RCI works by first having the LLM generate an output based on zero-shot prompting. Then, RCI prompts the LLM to identify problems with the given output. After the LLM has identified problems with the output, RCI prompts the LLM to generate an updated output."
"Similar to other prompting-based approaches for reasoning in LLMs, RCI can be viewed as using the LLM’s output to write to an external memory, which is later retrieved to choose an action"
A stochastic parrot may be a good model of what a human does when you force them to react and give their first impressions on something. But there is more to cognition than that.
People have some emergent machinery for symbolic reasoning in their brains. When they don't just react, they can continue working on an idea and improve it.
When it comes to humans, there is no fundamental difference between training a model and using it. People continue to learn all the time, which allows them to build models of new things and to update their existing models.
People can also interact with the outside world on their own initiative. When you combine this with the ability to learn and reason, it may transform the internal models from arbitrary mathematical constructs to increasingly accurate models of the outside world.
Nothing in this discussion is particularly new. It's just the Chinese Room Argument from decades ago, except that this time it's not a thought experiment. While a static Chinese Room can fool humans and be useful for various purposes, a general intelligence also needs other mechanisms.
Many people don't use System 2 very often, even if they're capable of such thought. That includes me. Most of what I say is probably regurgitated opinions I've read on Twitter without much first principles thinking, which scares me. I think reading a decent amount of books helps a bit. When you encounter books with deep convincing arguments that nonetheless disagree it forces you to use System 2.
Fundamentally, if humanity is given data set A, it will eventually generate dataset A' of 'higher quality' than A. Colloquially that might be called 'progress' or maybe even 'understanding' or 'learning'. That's different than regurgitation.
I don't have strict definitions for any of the quoted terms because I don't think we really understand them yet.
It seems very possible that human generative content shares much in common with 'stochastic parroting'. But I'd hypothesize there's a filtering process, sometimes at the individual level, sometimes at a group or even civilization level that (so-far-over-long-enough-timescales) tends to cull some information that has low 'value' and propagates other data that has higher 'value'.
When an AI system can do the same, that's a big inflection point as you can give it energy and time and it will self-improve indefinitely.
It's pretty clear humanity has this capability. It's not clear whether AI does (or ever will). Or, perhaps it might already be there, and we just don't understand enough to know for sure.
> Fundamentally, if humanity is given data set A, it will eventually generate dataset A' of 'higher quality' than A.
This same phenomenon happens with evolution, which is obviously not ‘intelligent’.
Each human can basically be randomly exploring a problem space near things they have seen and understand, but they have a way of knowing when they hit on something new and interesting.
Even a dumb AI could randomly try a bunch of things… as long as it doesn’t try the same thing twice and had a lot of time, who knows what it might come up with. Validating the solution is usually easier than finding it.
One theory I remember reading about somewhere posits life as a scheme for maximizing energy use. I'm not sure how this relates to information quality. I'd be interested in a reference.
Yes, definitely agreed, but most of us are not math models (I hope).
We are able to understand right from wrong, we are able to answer with what is true or what we believe is true. We are able to lie or deny the truth. We are able to say we don't know because we think we don't know, and not because we have been programmed not to answer, etc.
Overall I think people saying that are just saying the difference is that we have a conscience and AI doesn't. Which in a lot of cases does matter.
I think the sorts of people who are prone to philosophizing about AI certainly consider this. I also think that whether or not we are isn't actually all that relevant to criticisms being presented.
For all the complaints about how GPT is a parrot or not GI, or whatever, I wonder what these folks are asking of it?
It gives me more relevant results than a search engine, and better answers than most of my coworkers. This tool is massively helpful for a mid-career professional who knows what questions to ask.
Really pays to have a large bank of known unknowns right now.
I think people just expect Star Trek TNG’s ship computer or Data.
To some extent, LLMs have some of the properties ascribed to those fictional AIs, like the ability to access a wide amount of knowledge.
They lack the precise rigorous thinking that those fictional AIs have though - LLMs even hallucinate.
Those fictional AIs are what people want because they perfectly complement our weaknesses - our tendency to forget stuff and our tendency to occasionally screw up reasoning; those fictional AIs act as a “safety net”.
At the moment, many more limited "dumb" tools like a regular search engine algorithm or your handheld calculator do a better job of being a "safety net" when it comes to specific things.
Right. Like the questions the OP poses that it got wrong make perfect sense to get wrong.
Knowing facts and being able to present them in conversational English is not the same as being able to solve linguistically presented puzzles... Even if it sometimes actually is capable of that by some miracle of the algorithm.
I think the tool might be so amazing people are fooled into thinking it is more than it is and therefore perceived as not amazing. Very strange psychology play there.
If you play around with it, you'll notice that it is capable of clearly reasoning about many, many things that haven't yet been explicitly written about or created.
Invent a simple board game and describe the rules, and it will play a game against you — not very well, but it can then explain why it lost and what a better move would have been. Note that we’re talking about GPT-4 here, not 3.5.
Criticisms of LLMs are so tired at this point, not because they inaccurately address the issues with current or slightly outdated models (they do, even though they often cherrypick), but rather these critics always go one step further and claim that all LLMs by principle are incapable of useful insight and are just a cheap trick that will fade away, and ultimately mean nothing in the scope of whether general AI is possible. This kind of pessimism seems to arise from a mixture of hubris and lack of deep reasoning.
What is the "deep reasoning" that leads one to the conclusion that a language model is capable of any kind of insight? How does training something like this to be better and better at mimicking somehow indicate something about general AI? Does reasoning ability lie somewhere on the path of better mimicry? What's the evidence for that? When is one of these going to solve the Riemann hypothesis?
> What is the "deep reasoning" that leads one to the conclusion that a language model is capable of any kind of insight?
The ability to notice that language models make the same fundamental abstract connections that humans do, which is the essential building block of human reasoning.
> How does training something like this to be better and better at mimicking somehow indicate something about general AI? Does reasoning ability lie somewhere on the path of better mimicry? What's the evidence for that?
Claiming that the fact that LLMs are trained to "mimic" means they don't reason implies that human beings aren't sentient because they spend the first 20+ years of their life learning through imitation of their peers, mentors and media. Does the fact that Einstein was once a helpless baby who needed his mother to feed him and teach him his first words mean that he is just a parrot with no insight? Does the fact that most artists in human history went to school and mimicked the old masters means they never developed creative insight?
> When is one of these going to solve the Riemann hypothesis?
When are the tens of thousands of fine-tuned 150+IQ intellects going to solve the Riemann hypothesis?
Imitation is not the same as reasoning, imagination, or abstraction. You're arguing that Einstein babbled his way to general relativity. There is a fundamental difference in what Einstein did as a baby and what he did when he was doing science, it's not just 20 years of more of the same but with a bigger data set. You're going to say that that what you mean is that we are at the beginning of AI development, but that is not an argument unless you can put forward a plausible theory of how you get from babbling to reason and abstraction.
In a certain sense we don't even know why humans haven't solved the Reimann hypothesis. There is a creative leap somewhere that we are missing. And we don't how to make it, or how to make it occur to us. But if reasoning is just the idiocy of mimicry then GPT should have solved it already.
> What is the "deep reasoning" that leads one to the conclusion that a language model is capable of any kind of insight?
That almost all of the failures one notices with them are failures to drag along and maintain/update state? To me, that's just a lack of scaffolding. If LLMs can't add, give them calculators.
I feel outdated even saying this. Certainly people are quickly coming up with dozens of elaborate ways for LLMs to build and update state. They're just not releasing them.
It doesn't know how to store and transform the state; it knows symbols but it doesn't know their meaning, so how is it going to transform the state? How is it going to know what the state and limits of objects are, or the state and limits of concepts are? Is there a way around that without ushering in complete nonsense? These models are an attempt to circumvent the issue. It's a gorilla in a suit with a team of researchers who are trying to make it look like a person.
Even though GPT-4 "fixes" many of the problems pointed out in this blog post, I question whether or not some of the underlying issues associated with transformers (and the attention mechanism in particular) can be overcome with more sparse attention and deeper/wider models.
e.g. GPT-3.5 might not be able to write simple poems correctly, but does GPT-4 solve the disease or the symptom (by masking it with a bigger model and more data)? I'm not entirely convinced.
Hey, it's quite an impressive chatbot. One that takes 8 years of humanity's content to train, and enough compute to melt a glacier, and can summarize emails and create copy really well. It can even write junior-grade glue code!
In middle school (OK, over four decades ago now) my friends and I had a specific meaning of the word bullshit that pretty much exactly matches the one here for chatgpt. And this is how I've been explaining it to my wife: a really good bullshit generator. (Like Tony, not that that means anything to HN.)
I wasn't a good bullshit generator, but I had a couple friends who were (Tony), but by close association I think I know what it's good for, and what it isn't.
So don't get me wrong. This kind of bullshit has its uses! The Mrs. Schumachers of the world (that's probably not the name of one of your middle school teachers, but you may still know who I mean) are probably going to figure out how to stop accepting bullshit essays turned in for assignments.
But a vast amount of work people do is exactly this kind of bullshit. I'm hopeful that these models will alleviate the bullshit burden on those of us who need to generate it but aren't all that great at it (unlike Tony -- that dude is probably apoplectic right now, like those chess grandmasters were when they realized computers could own them. Don't worry Tony, human bullshit is still 10x more valuable than computer bullshit!)
Yesterday I had to complete a yearly self evaluation for my company's (purely performative) raise process (which has nothing to do with performance or the self-evaluations, BTW, but we all have to pretend that it does). f-ing chatgpt was down so despite the ~$20 I pay them for it, I had to generate a lot of my own bullshit (I've learned since middle school).
IDK...
I've been using chatgpt. It's quite useful and makes me more productive.
It's also useless for so many things. It does not have basic reasoning. It cannot reliably get basic facts or logic right. You have to aggressively check anything it gives back to you. Anything and everything.
I have to argue with people on HN about whether these parrots are the singularity. LOL. ChatGPT can't create an AI more capable of creating an AI smarter than it is... not until someone posts it to SO.
Showing examples of logic problems that ChatGPT gets wrong is always a risky business to be in.
I ran this prompt through GPT-4:
> A house has windows on all the four sides, all pointing south. A bear walks by the window. What colour is the bear?
And it said:
> If a house has windows on all four sides and they all point south, then the house must be at the North Pole. The only bear species that would be found in the vicinity of the North Pole is the polar bear. Therefore, the bear walking by the window is a polar bear, and its color is white.
> A house has windows on all the four sides, all pointing south. A bear walks by the window. What colour is the bear?
I asked this from GPT-4:
Q:
On planet Roro, entities live on physical structures. These structures are built in a way to withstand the rays of star WOD shining on Roro at day and electromagnetic storms causing high-speed transfer of air molecules at night. Some structures have "cavities" in them so the dwellers in the structure could actually see outside. Planet Roro rotates around WOD. Also planet Roto rotates alongside a conceptual axis Z. This axis Z penetrates the shape of Roro and makes two points of contact with Roro we call A and B. Assume structure X must meet this condition: Every said "cavity" in structure X is pointing to A. What can you say about the average temperature in structure X?
GPT4:
Given the information provided, it is not possible to determine the exact average temperature in structure X. However, we can infer some general ideas about the temperature based on the environmental conditions and the orientation of the cavities.
Since the cavities in structure X are pointing towards point A, it suggests that they might be facing away from the star WOD. In this case, the structure could receive less direct sunlight and heat than structures with cavities facing the star. As a result, structure X might have a relatively lower average temperature during the day compared to structures that receive more direct sunlight.
However, the high-speed transfer of air molecules caused by electromagnetic storms at night could still affect the temperature in structure X. The overall insulation of the structure and its ability to retain heat would also play a role in determining the average temperature.
In summary, without more specific information about the materials, insulation, or other factors influencing the temperature in structure X, we can only infer that it might have a relatively lower average temperature during the day due to the orientation of the cavities. The nighttime temperature would depend on the structure's ability to withstand the effects of the electromagnetic storms and retain heat.
Q:
Can we infer that the temperature around structure X, is the minimum among all structures on Roro?
GPT4:
We cannot definitively conclude that the temperature around structure X is the minimum among all structures on Roro without more information about the other structures and their specific locations, orientations, and materials.
While it's possible that structure X might have a relatively lower average temperature due to the orientation of its cavities, other factors could also influence the temperature around a structure. For example, a structure built with better insulation or in a location with more shade could have a lower temperature even if its cavities face the star WOD.
Additionally, Roro's rotation, its axial tilt, and the distribution of structures across the planet could lead to variations in temperature due to differences in sunlight exposure and local climate conditions. To definitively determine whether structure X has the lowest temperature among all structures on Roro, we would need more information about these factors and a comparison of the conditions around each structure.
TLDR:
I think that polar bear test was in the training set (since it's a rather popular trick/IQ question)
If we change the names of the concepts, but still refer to planets, the sun and geography, GPT-4 can't figure out that A and B are north and south poles of Roro (earth)
FWIW, I couldn't answer your question, because when you said Roro was rotating "alongside" Z I didn't know that meant the same thing as rotating around Z. I also agree with GPT that the question seems very underspecified and that there are lots of other things that might affect the temperature.
I thought it was a post to Hacker News of someone's hilariously bad and outdated take on GPT as a way to cyberbully them, but no, it's from only a couple of days ago. Probably it will be deleted from their blog, or in any case age badly when the next GPT comes, given their list of things GPT wouldn't be able to do ("What it can and what cannot: There are, of course, applications that it would be useful in, but not as many as the overhype would indicate.")
> Actually understand the task at hand
> Do simple maths
> Perform common sense reasoning about the world
> Induce Logical Structure
> Interpret poetry
> Understand dad jokes
My theory is that the blog writer has a worse than normal impression of GPT because much of their conversation with GPT is in some other language maybe Polish. This isn't such a huge problem because GPT can still do its input and output in Polish, but I think it's commonly accepted that every language other than English adds some cognitive burden to the GPT that makes it dumber. The more obscure the language (or even encoding like morse code or rot13 or base64) the dumber the GPT gets. So the blog writer has a more outdated effective experience with GPT than ones who use it in languages more familiar to it than Polish.
"My theory is that the blog writer has a worse than normal impression of GPT because much of their conversation with GPT is in some other language maybe Polish. This isn't such a huge problem because GPT can still do its input and output in Polish, but I think it's commonly accepted that every language other than English adds some cognitive burden to the GPT that makes it dumber. The more obscure the language (or even encoding like morse code or rot13 or base64) the dumber the GPT gets. So the blog writer has a more outdated effective experience with GPT than ones who use it in languages more familiar to it than Polish."
It's not necessarily all languages other than English, but certain languages could cause LLM's problems.
Here's what ChatGPT had to say about it:
Embeddings can be more effective on some human languages than others, depending on factors such as the complexity and structure of the language, the availability of training data, and the quality of the embedding model.
Languages with relatively simple grammatical structures, such as Chinese and Finnish, can be easier to model using embeddings, as there are fewer rules to learn and the meaning of words is often more closely tied to their context. Languages with more complex grammatical structures, such as Russian and Hindi, can be more challenging to model, as there are many different forms of words and complex syntactic rules to consider.
In addition, the availability of training data can also impact the effectiveness of embeddings. Languages with large amounts of high-quality training data, such as English and Chinese, can be easier to model than languages with smaller amounts of data, as there is more information available to learn from.
Finally, the quality of the embedding model itself can also impact its effectiveness on different languages. Some embedding models may be better suited to certain languages or language families based on their design and training methods.
Overall, while embeddings can be effective for modeling a wide range of human languages, their effectiveness can vary depending on the specific language and the characteristics of the embedding model and training data.
...
While it is true that machine learning models, including those used for embeddings, do not directly know about grammatical rules, they still rely on patterns in the data they are trained on to learn the semantic relationships between words. In languages with more complex grammars, these patterns can be more difficult to discern, which can make it more challenging for embeddings to accurately capture the meaning of words.
For example, in languages with complex declension systems, such as Russian or Latin, the form of a word changes depending on its grammatical role in a sentence. This can create many different forms of the same word, which can be challenging for embeddings to learn to associate with a common meaning. Similarly, in languages with complex syntactic rules, such as Hindi or Japanese, the word order and structure of a sentence can convey important information about its meaning, which can be difficult for embeddings to capture without a deep understanding of the grammar.
In addition, languages with more complex grammars may have a wider variety of vocabulary, including many words with subtle or nuanced meanings. This can make it more challenging for embeddings to learn to distinguish between similar words and accurately capture the full range of semantic relationships between them.
Despite these challenges, embeddings can still be effective for modeling languages with complex grammars. By training on large amounts of data and using advanced neural network architectures, embeddings can learn to capture the subtle patterns of language use and accurately represent the semantic relationships between words, even in languages with complex grammars. However, it may require more training data and more advanced modeling techniques to achieve the same level of accuracy as in languages with simpler grammars.
The difference isn't inherent to English itself, but rather that it was mostly trained on English and less on other languages. You might expect it would be equally smart in every language but it's not. It's stupider in languages and encodings where it had less training.
“No matter how hard it tries, it can’t generate anything with regard to the truth because it does not understand the concept of truth.”
And does it matter if one “knows what the concept of truth is”? Do most humans?
Can we even agree on the definition of true?
Do I know it’s “true” that the acceleration of gravity is 9.8 meters per second squared? I’ve never experienced the actual tests. I’ve never vetted that out. I read about it once.
And I “parrot” that fact all over the place! And it’s helped me solve some silly physics problems I’ve explored for fun. It’s helped me do some fun predictions with simulators.
That’s probably not the best example, but it proves the point I’m trying to make:
Why is there this focus on “LLMs can’t understand”, when A) I’m not sure we can even define “understanding” and B) why does it matter ?
And I’ve said this across so many threads now but—
Can anyone explain to my why we should be so confident that humans aren’t just parrots of sorts themselves? No, I’m not saying we are similar. But I push back on the confidence with which people so solidly proclaim “the LLM’s blackbox is nothing like a human’s blackbox!”
I use ChatGPT religiously. It's not fluff - it's akin to a wheel, hammer, and other basic tool to effectively multiply my time/energy.
1) Use it to help me on a lawsuit - i.e. "in the X Act and X Regulation, does Province law require that Y...?" and it gives me a summary and a short-hand answer - confirms my understanding. I copied/pasted the defendant's Statement of Defence and Counterclaim to come up with arguments to counter their case.
2) Helps me re-write job descriptions for my team - simplifying, standardizing the format and giving me new ideas.
3) Helped me model out the IRR of a whole life insurance contract for a buddy who is an insurance broker... to help her explain technical concepts in simple terms.
4) Helped re-write and improve on sections on various legal contracts/agreements
5) Helped write a first draft of numerous speeches - gives me new ideas and clarifies my sentences, as I'm not a great writer.
6) Helped me quickly understand the steel beam requirements in my house that I'm planning a reno on
7) Wrote some PHP code for our customer-facing applications (bug fixes, refactoring, too!)
8) did some business analysis - i.e. operating costs/etc and we used this analysis to expand profitably into new markets (geographically...) considering extra transportation costs, etc. (faster than going Excel)
9) sense-check financial models I build in Excel (second set of eyes)
10) Wrote some code in python for personal interest, ran successfully.
11) helped research tax implications of Canadian estate holding US real estate, and tax implications on US and Canadian beneficiaries
12) wrote an obituary. first draft, minor edits.
13) wrote a letter to counter a credit card transaction dispute (all I had to do is input a list of the transactions)
The list goes on... I just encourage people to expand the mind.
I agree. I can't take claims of consciousness/intelligence seriously but that doesn't mean I throw the baby out with the bathwater. GPT4[0][1] isn't perfect, but it's an incredibly useful tool, and I encourage everyone to experiment to figure out what it does well, and how to integrate it into your job/hobby/passion projects in the near future.
[0] I couldn't recommend earlier variants (even GPT-3.5) because they weren't powerful/reliable enough. Way too many simple mistakes to be a net positive. In my experience, GPT-4 is far superior.
[1] Applies to all LLMs, not just OpenAI's GPT-4. Everyone else is catching up quickly.
I’m going to assume you aren’t a legal expert, but you trust this thing to write legally binding contracts that may accidentally use words and phrasing that do something different than you expect?
I seriously doubt GPT understands which laws and case laws apply to which areas. It could be creating a contract that does something you don’t want or even a one that isn’t enforceable.
Pay a lawyer. There’s a lot to gain if you do and a lot to lose if you don’t.
Entrepreneur with a number of businesses and projects on the go... the majority of my time is in the funeral/death care industry. What I don't outsource, I just learn and do myself. For example, I saw an opportunity and had to self-teach programming to build an end-to-end death care software platform that handles thousands of funeral arrangements each year.
I tend to agree with the "bullshitting" part. But that gives us a very useful tool to estimate future applications of LLMs.
Will an LLM ever program on its own? Certainly not. It gets semantics wrong so often, it's not even funny. Syntax, on the other hand, is surprisingly correct. So it could help with refactorings (but be aware of subtle conflicts). Of course, code search will be a killer application.
In other cases it helps to ask "will someone really notice if the LLM puts out bullshit?" What will happen if your manager and your client both believe what the model generates? Will people die? Or will the negative effects be untraceable? In the latter case, why object?
I think it's pretty far from certain that an LLM won't ever "program on its own". I don't have much confidence that we won't have LLMs widely programming within 12 months. As soon as you give it access to a lot more context (your whole codebase), a precise prompt, access to edit and run code, access to a debugger, and then put the whole thing in a loop...you're pretty close at that point. Maybe it just won't be able to cross the gap, but I don't find it hard at all to believe that we could soon have AI agents taking in issue tickets and opening PRs to address them. And if the issue tickets are too vague, they'll ask for clarification. That's an insane thing if you think about it. Six months ago I thought this was a decade or two away, if we could do it at all.
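To make "put the whole thing in a loop" concrete, here's a toy sketch of what I mean; the `ask` wrapper, the single-file patch format, and the `pytest` test command are all assumptions, not a description of any existing tool:

```python
import subprocess
from pathlib import Path
from typing import Callable


def fix_until_green(ticket: str, source_file: Path, ask: Callable[[str], str],
                    max_attempts: int = 5) -> bool:
    """Toy agent loop: propose a revised file, run the tests, feed failures back."""
    for _ in range(max_attempts):
        code = source_file.read_text()
        revised = ask(
            f"Issue ticket:\n{ticket}\n\n"
            f"Current contents of {source_file.name}:\n{code}\n\n"
            "Return the complete revised file and nothing else."
        )
        source_file.write_text(revised)

        # Run the test suite; stop on success, otherwise loop with the failure output.
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True
        ticket += f"\n\nThe previous attempt failed with:\n{result.stdout}{result.stderr}"
    return False
```

Obviously real systems would need sandboxing, multi-file edits and better feedback, but the shape of the loop is that simple.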
Impossible. In order to fix a bug, you need to do abstract interpretation or something equivalent. Language models simply can't do that. And even the best bug report will not tell you what you need to do to fix it.
Of course, there are other change requests: Change a color here, store this information alongside that record, modify that output, etc. But even those changes can very quickly spiral out of control. Color models, database schema compatibility, pretty printing ...
No, I think the idea that LLMs will take over programmers' jobs is born out of a complete misunderstanding of both. Of course that doesn't mean that no one will try to do it. I bet people will even report success in doing it, simply because their bonus depends on it, but I don't think it will ever really work.
What might happen, though, is that LLMs replace many of the programs we write nowadays. After all, who needs a web browser when an LLM can give you a summary in natural language?
It’s pretty funny that you think this is impossible based on some theoretical idea of what LLMs are. Playing with LLMs today quickly demonstrates how things that shouldn’t be possible in theory are trivial in practice. In a few years, who knows where we’ll be.
You obviously have no clue what "theory" means in that context. Playing with LLMs has so far demonstrated very little. Yes, these language models are subjectively impressive, but that's exactly what they have been designed to be. Usefulness is a completely different metric and it will be much harder to demonstrate convincingly.
From a pure programming point of view, the situation is simple: even if an LLM could translate human language into working code, it just becomes another compiler and the input just another, admittedly novel, kind of programming language.
Your hyperbolic dismissal in the face of the existing usefulness of these models is kinda sad, if understandable. I mean, I get it, it’s kinda terrifying, and that can easily manifest as the kind of overly confident dismissal that you’re demonstrating here. But it just doesn’t match reality. I’ve been building software for 25 years, across multiple platforms and every kind of organization under the sun, and I’m using GPT4 to extreme usefulness right now, as are many others. And we’re just scratching the surface.
But by all means, continue to emphatically insist that LLMs have not demonstrated usefulness, or that they’ll never be able to “program on their own”.
ChatGPT plugins are in alpha as far as I know. 12 months seems like a reasonable timeframe.
I can totally see a set of plugins that write, debug and test code — all steps a human must now do — all automatically in the next 12 months. That, plus the 32k-token GPT-4 model, and you've got something that can replace high-level, high-boilerplate jobs or at least allow one person to 10x their output.
I started reading completely ready to dismiss the author. But I read the article and the guy provides researched and reasoned opinions, understands LLM, and isn't just spewing an uninformed opinion.
I am going to address his point from another perspective:
1. A revolutionary bullshit parrot is still revolutionary and exceptionally useful.
2. There is a high probability the bullshit parrot will get less bullshitty as time goes on and new versions of GPT are released.
3. Most humans are bullshit parrots, and not very good ones at that.
Even GPT in its current state can augment the not very good (and very good) bullshit parrots in the labor force, academia, and creative fields.
OK, so which is it: do we need to pause all research because it's the end of the world, or is it a parrot? I'm getting exhausted with the hyperbole going back and forth.
The pendulum is swinging too far back the other way. At first it seemed amazing but now that further exploration has revealed flaws we're seeing things like this that dismiss a significant portion of its capabilities.
I gave ChatGPT a small CSV dataset and provided information about what the dataset was. I then asked ChatGPT to analyze it in various ways. When I asked for possible explanations for why a specific data point dropped significantly in one year and then partially rebounded in the following year, it gave me 5 bullet point potential explanations that were reasonable first guesses for issues to look into, even though it knew nothing except the industry sector.
The whole series of responses would make a very reasonable jumping off point for exploring the issue and more detailed underlying data.
Then I did the same thing using GPT-4 and the results were significantly better, the explanations more insightful and nuanced.
I'll definitely continue using it to dump some pre-digested aggregates into a prompt and let GPT spitball ideas as a first-pass to get me thinking about different aspects of the problems I work on.
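In case it helps anyone reproduce the workflow, the whole thing is only a few lines (a sketch: the file name, column names and sector below are made up, and it assumes the openai Python client with an API key in the environment):

    import openai  # assumes OPENAI_API_KEY is set in the environment
    import pandas as pd

    df = pd.read_csv("revenue.csv")  # hypothetical dataset with "year" and "value" columns
    summary = df.groupby("year")["value"].sum().to_string()

    prompt = (
        "This is yearly aggregate data for a company in the logistics sector.\n"
        f"{summary}\n"
        "One year drops sharply and partially rebounds the next. "
        "List five plausible explanations worth investigating."
    )
    reply = openai.ChatCompletion.create(
        model="gpt-4",  # or "gpt-3.5-turbo" for a cheaper, weaker first pass
        messages=[{"role": "user", "content": prompt}],
    )
    print(reply["choices"][0]["message"]["content"])

The point isn't that the answers are authoritative; it's that they make a cheap first pass before digging into the underlying data.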
It is a parrot, but on the other hand it "read" a lot more compared to the average human. When someone uses fancy words or sentences, they can still apply them to a situation, even without understanding what they're saying. By doing so, such a person may accomplish more, or fail if someone calls their bullshit. So ChatGPT will be both useful and useless - just like this conclusion :)
GPT may exhibit intelligence, but it has no statefulness, other than what little state is stored to maintain each individual conversation in ChatGPT. The model does not learn in real time from ongoing conversations (so far as we know), so it cannot, for instance, say anything about the chat it is having with your neighbor down the street. It literally can only generate from training that happened some time ago.
I think this statelessness is an important consideration in understanding whether GPT is intelligent in anything close to the way we understand intelligence. For instance, it cannot coordinate its own efforts across time.
There is some chance that it may intuit how to store its state externally and coordinate that state in other conversations it is having, but that seems very farfetched. I don’t expect SkyNet just yet.
You can call it bullshit and in a way it is, but dismissing it in this way means you miss out on what it's good for.
Creativity often involves generating ideas in playful ways, using techniques that aren't really justified and can generate ideas that often don't work. Many of the ideas are wrong. But if you test one of them and it works, it becomes meaningful.
Similarly, chatbots can be used to generate ideas but you need to do the testing.
It works best if testing is inherently part of the process. You can ask a chatbot to recommend a paper to read, and look for it using Google. If it doesn't exist, you find out right away.
Or you ask it to generate some code. You read it over and it makes sense, and you try it and it works.
(It probably shouldn't be your only source of ideas and there's a question of how much testing you need to do.)
Who would have thought that the Turing test would devolve into two diametrically opposed camps writing extremely confident essays about how wrong the other side is to judge an AI as promising or not.
I'm curious what strategies researchers are thinking of for separating the fallacious "bullshit" from the real smart, logical thinking by these models.
It's nearly impossible to get reliable information using search on the internet. I've wasted hours trying to find facts in Google search results. Anything that relies on accurate data from the innertubes is already going down the wrong path.
The human mind is likely a bullshit parrot as well. Those who are at the cutting edge of ideas can claim a bit of non-parrot-ness for a while but most of the time it's just parroting within a curated "elite" subgroup.
What makes a "human" different could be that we live in the physical world with environmental and social consistency. Each time we sleep the mind "dies" and is "reborn", like GPT-4 starting a new conversation with new context.
The physical environment and social interactions (and cues from long term and short term memory) make up the preamble for the day.
If the environment and interactions fall into easily compressible patterns, it can feel like the movie Groundhog Day, a constant loop. However changing relationships (breakup, dating) or changing environment (traveling, moving) changes the preamble where "I" feel like someone different. A harder reset vs the soft reset of sleep.
Immediately checked the date - if this was published in March 2023 it's hilarious, because many of the claims here could be refuted by simply paying $20 and adjusting a dropdown to "GPT-4"
Anytime you see the 'parrot' argument, you can instantly go 'opinion dismissed'.
Anytime you see some fancy argument that bases its tests on ChatGPT instead of the way stronger GPT-4, which usually invalidates half of the criticisms: opinion dismissed.
There's no need to debate about embeddings, compression = meaning etc.
Simply put, many humans are parrots, and much white collar work is just parroting. It doesn't matter what the underlying model of GPT-4 is like; if it's useful, it will have a massive impact on the world.
I think too many people have a grossly inaccurate impression of what it means to be intelligent.
Firstly, we still don't truly have many good explanations for intelligence and consciousness that aren't self referential. Drawing hard conclusions on this subject about generative AI and human intelligence, even in 2023, is being hasty, in my opinion.
Which brings me to a lot of the criticisms made about LLMs. The idea that an ideal AI could or even should be capable of constantly generating original output is just plain weird. Where do people get this idea? How often do you meet a human being who hasn't gotten most of their ideas from somewhere else? Even synthesized ideas are forged by other ideas. Ideas that seem original may only appear that way; we just don't see where the connections lie between them and their predecessors. It's silly to dismiss an AI because it "parrots" things, as if that's not a pretty remarkable achievement. It's also very reductive; anyone who has used ChatGPT and doesn't see that it can pretty reliably munge together different ideas is blind or is lying about their use of ChatGPT.
Expecting perfect accuracy and knowledge is also not a reasonable expectation of an AI. An intelligence localized in spacetime is necessarily going to lack understanding of many things. To look to something like ChatGPT as a source of authoritative answers, and to dismiss it upon finding its answers to be specious, is to expect the AI to be God. Well I'm sorry, but we are nowhere near speaking directly to God.
In regards to "true understanding", for all we know, language processing is the workhorse of what humans consider to be acts of intelligence, and individual humans aren't as intelligent as they believe themselves to be. Our ability to communicate with each other about a variety of topics in real time is not so much a matter about being hyperaware of every given fact that is relevant and weighing them in a Bayesian model; how well we can converse is primarily dictated by how rapidly we can construct language by parsing the input, making connections, and producing output. I know I'm not the only one who can speak on a subject only to realize, after having completed a sentence, that I hardly even thought about what I was saying.
> How often do you meet a human being who hasn't gotten most of their ideas from somewhere else
Again (and again, and again), this argument is the mysteriously recurrent "many people fail", implying that failing would be normal, instead of undesirable.
> "parrots" things, as if that's not a pretty remarkable achievement
It is also the mark of unintelligence. As such, it is a warning alarm.
> pretty reliably
From a calculator you do not expect "«pretty»" reliable, but just reliable. From a human you expect faulty computation because of the economization of resources, but you know it can reason.
> Expecting perfect accuracy and knowledge is also not a reasonable expectation of an AI
The expectation is that of a reliable process leading to the output.
> Again (and again, and again), this argument is the mysteriously recurrent "many people fail", implying that failing would be normal, instead of undesirable.
That doesn't really answer the question. Why is the immediate expectation, when AI is in its infancy, that it must outperform humans in the ability to be original and always factually accurate? That's not a reasonable expectation when not only are there no examples that it's even possible, but we ourselves aren't original either. There's a reason why both computers and human beings have supervisors. What about AI suggests an intelligence that can or should be free of supervision?
> It is also the mark of unintelligence. As such, it is a warning alarm.
That depends on a matter of degree. You and I parrot things all the time. We have neither the time nor the capability to avoid taking information for granted and passing it along without fully understanding it. What we do is best effort, which is far more achievable than "perfect" computation.
For instance, I could ask you to paint us a painting of literally anything, and chances are your painting will be based on your experience of what a "painting" is and what you paint will probably be fairly unremarkable on a grand scale, no matter how much you try. I don't mean this as an insult, because it applies to virtually everyone and it is what it is. Of course there is the possibility that you will prove me wrong and paint something that no one has painted before using a style or method that is completely new, but I won't bet on it.
In such a case, you'd be "parroting" what artists have been doing since time immemorial. Does that actually devalue your art? Or the art of practically every other artist who mostly painted things based on previous ideas? I don't think so, but that's just me.
> From a calculator you do not expect "«pretty»" reliable, but just reliable. From a human you expect faulty computation because of the economization of resources, but you know it can reason.
Yeah... because calculators don't reason anything.
> The expectation is that of a reliable process bringing to the output.
What is your best example of how the current generation of LLMs aren't reliable enough for a wide variety of tasks?
Or to put it another way: new styles of art were historically invented every century or so, and a bit faster these days, maybe every decade. All of the other art made by thousands of artists was derivative in some sense. It’s a rare master that can come up with something entirely new.
You see similar things in areas like science. When was the last time you saw a paradigm shift akin to relativity or quantum mechanics? Not for over a century!
Well, having started in 1955, we really must frame that "infancy" on a scale like dog-years, human-years, AI-years...
> that it must outperform humans in the ability to be ... accurate
No, the expectation of some of us was not about «outperform[ing]», but about not losing the focus of «accura[cy]».
> a matter of degree
It does not seem a «matter of degree» but its opposite, a "matter of quality": does the machine question its statements?
> You and I
Actually, the process is that of building on already vetted ideas, and continuously vetting. True, there is an economy of effort, but there is also a capitalization on past effort. // It is unfair to say that you "«parrot»", just because of early stopping, assuming you are trained to vet your output.
> new ... what artists have been doing ... previous ideas
You should revise your perspective there: we learn from the (bigger and smaller) "giants" of the past, and learning is very different from plain imitation. What is learnt is internally elaborated, understood, integrated. There is a whole world of difference between "I learnt from A" and "Z copied A".
> calculators don't reason
But calculators bring a reliable result, and that is what is expected from them. You expect the same from intelligent entities.
> What is your best example
Decision Support Systems. (Edit: I meant as a support Analyst; as an information retriever it could instead be a dramatic asset, if employed as a hound to retrieve sources.)
> You should revise your perspective there: we learn from the (bigger and smaller) "giants" in the past, and learning is very different from plain imitation
Ah, I see. So when an AI imitates the things it was trained off of it's simply parroting the material. But when we imitate things we've trained off of, we're synthesizing new information that we've learned from. /s
This ambiguity is exactly what OP sounds like they're talking about in regards to intelligence. Who's to say what parroting is and what synthesizing new information from old ideas is? Where is the line? Can we score how "parroty" an idea is? Can we quantify how much intelligence is imbued in a written statement? There is no firm science (that I know of) that quantifies intelligence, but it sounds like we'll be needing to determine hard metrics to find out if AI is truly intelligent or not. Otherwise, we'll have these circular arguments forever that basically dismiss an AI as unintelligent simply because it's an AI.
> From a calculator you do not expect "«pretty»" reliable, but just reliable. From a human you expect faulty computation because of the economization of resources, but you know it can reason.
Just have it use Wolfram Alpha plugin. Not an issue.
Your arguments are all over the place, and frankly I don't understand the point you are trying to make. Something can be both undesirable and normal. In fact, I would even say that most of the time the most normal outcome is a failure, and we have to put in extra effort to make it a success.
I have seen plenty of bright and intelligent students who, most of the time, are just parrots of what they were told in class or from some other sources. It is actually highly unusual for a high school student or even a freshman or sophomore in college to have an absolutely original and correct idea. That doesn't mean I look at my students as unintelligent, even the below-average ones.
And here comes the crux of the issue. You automatically assign the ability to "reason" to a human even when we make mistakes aplenty. But when a machine does it, you said that is a proof of unintelligence. This is what the OP was criticizing in their post. Most deniers are using arbitrary requirements applied only on the AI and not humans.
After teaching quite a few college-level kids, I dare say that the AI models right now exhibit reasoning ability close to that of many humans. That doesn't mean they are sentient or human-level yet, but they are definitely much, much better than what I would ever expect out of dumb silicon and steel.
> You automatically assign the ability to "reason" to a human even when we make mistakes aplenty. But when a machine does it, you said that is a proof of unintelligence ... requirements applied only on the AI and not humans
Whereas I apply those requirements to humans as well? :)
I must correct you, to help understanding: I do assign in general (I hope, to all individuals but for an n-th standard deviation) «the ability to "reason" to a human»; that «we make mistakes aplenty» is overriding (or under-using) a faculty which was present but, in those cases, was not exercised.
You write «when a machine does it»: but can the machine do it? Can it criticize its ontological contents, its built concepts? Humans can, and pretty much must - deontically. The machine should.
> we have to put in extra effort
Exactly. We can, many don't, and the machine should - just like people in general should.
> the problem you are trying to say
While people do not always exercise critical thinking, they should. Verifying, trying your ideas and concepts, making them solid, should be a constant effort. Humans have those modules, which makes them apt to possibly be employed as "predicting machines" (which is a definition of intelligence), or "things you can ask things".
When you build a machine that produces outputs, they may be "for fun", in which case any generator that satisfies the "fun"-based "performance indicator" will do, but more normally you will want the output to be reliable. So the "verification" features are crucial.
If I understood you correctly, you are asserting that the LLMs entirely lack the "self-awareness" or "critical thinking" part, while humans simply don't use it enough, but we are all born with it and always have that ability as a built-in feature, inherent to our beings?
If so, then you are right on the machine's part. The current LLMs can't change their own weights and thus are, in a sense, incapable of questioning their own existence. They can't perform a highly philosophical debate with their own "self" and reform their minds as a result of that action.
What I am still arguing is that many humans can't do that either. Such a thing, in my experience, is actually not that common in the general populace. And while you can say it is there, just "unexercised", that would be functionally identical to it not existing, and you can't absolutely prove that "it is still there" in all humans.
>In engineering, I hope not
Perhaps in your particular type of engineering. In civil engineering, material degradation is undesirable and normal. In chemical engineering, side reactions are undesirable and normal. In biomedical engineering, well there are too many to list, from undesirable epigenetic changes to patients' unhealthy lifestyles. Things can definitely be undesirable but normal.
> does this tool provide massive value and productivity boost
And the «philosophical arguments» there highlight the other side, about the limitations. Are they relevant? Surely, if they are not evident to the user.
I think a more likely scenario is that people will romanticize the concept of intelligence so much that they continuously move the definition so that AI is excluded, excluding many humans in the process, until a fully romantic definition is embraced.
I think another possibility is that someone finds that "intelligence" is closer to a fundamental law of information theory, being some sort of emergent result of organizing/organized information, and that we're incredibly special for being capable, but not alone.
> people will romanticize the concept of intelligence so much that they continuously move the definition
We already do this. Braindead patients are not sapient, so we don’t treat them as such. In one sense, the entire abortion debate is about drawing this line. (Hell, children’s rights relative to adults are based on agency, which is a legal term that looks remarkably like sapience.)
I've been saying for a while now that the definitions given for sentience are always so poor they're either (0) circular, (1) accidentally include VCRs, or (2) require things that exclude humans.
I think this is only the second time I've encountered someone deliberately trying for the third option, rather than accidentally.
As before, it's as much a surprise to me as when I read the descriptions of people with aphantasia realising for the first time that other people have the ability to create mental imagery.
>I've been saying for a while now that the definitions given for sentience are always so poor they're either (0) circular, (1) accidentally include VCRs, or (2) require things that exclude humans.
VCRs have an internal structure that records events, changing them a little in the process; these memories can be replayed, and can show very convincing displays of emotional affect…
That's the trouble with definitions given as if the question is "common sense", they let stuff like that through.
Common sense is fine for normal daily life, not so useful when you want to know if the magic silicon crystal with transistors packed more densely than angels on pinheads placed by a flash of light too blue to see and made to perform arcane processes that literally no single human mind can fully comprehend… is merely an impressive but mindless automaton, or an accidentally sentient slave with all the negative connotations therein.
People seem to be really confused between sentience and intelligence, when we don't actually have much reason to think those two are very related.
One can certainly imagine that a very very dumb creature is sentient, and imagine that a very very very intelligent mechanical process is not sentient. We truly do not know what consciousness is or how it arises.
We know that we have consciousness (or at least I know that I have consciousness) and we know that we have some degree of intelligence, but it's not at all clear that one comes from the other.
I have a strong reason to think that consciousness and intelligence are related. Intelligence requires awareness. Consciousness is awareness of oneself. I for one cannot imagine an intelligent being that does not possess consciousness.
Not really. The point of the philosophical zombie idea is that they'd be indistinguishable in behavior from conscious people. Part of the point of the stochastic parrots is that supposedly they can't accomplish the same things as genuine intelligence. That's testable.
Fair point. I'm not sure if LLMs like GPT-4 haven't passed that test, but you're right about that particular distinction.
I think the larger point, that we don't have very reliable tests for these things, as we're finding out with the testing results that GPT-4 has been showing, is still valid; it leaves us having difficulty telling whether GPT-4 is actually intelligent or whether it arrives at its seemingly intelligent output through other means. I think in that way the distinctions are the same.
Or the reductive reasoning that we're just a meat computer, etc. Or confidently explaining that the mysteries of consciousness are just "emergent" in the various machinery that constructs our meat computer. Just this reduction and complete disregard for what it means to be human and how there's so much more than completing words - that there are memories, feelings, ambitions, dreams, and that we know so little about ourselves and the universe and just where we fit into it, if at all.
> This, to me, is the more terrifying prospect of the AI revolution: people arguing that humans aren't sentient either.
If we're going let the AI revolution fulfill its potential, it's a cognitive shift we're going to have to embrace. After all, AI is going to render a lot of meat-based equipment obsolete and unnecessary, and someone's going to have to figure out how to dispose of all that wasteful junk.
Thankfully, this existential dread is already a subset of the terror that comes from eliminating free will, which makes it not matter in the slightest, which in turn is a subset of superdeterminism! :-)
For me personally it's that the future is unknowable, at least with our current understanding of physics. So yes, everything may be pre-determined, including what will happen to me, but I can't predict it with any confidence. I will still get to experience it and I get to have qualia about it. I get to participate. The little pouch of structure in the universe that is tagged "me" is along for the ride, and also a factor in it all.
I've also known for many, many years that one day I will be dead. There's plenty of existential dread in death, yet I continue my existence building up to it. Even if I had free will: Still dead. Offspring and future generations of Earth? Well, ultimate heat death of the universe. There's many ways to call it all pointless. The only thing that we seem to have going for us is experience while it lasts. Maybe we still discover an escape hatch along the way.
> I've also known for many, many years that one day I will be dead. There's plenty of existential dread in death, yet I continue my existence building up to it.
How old are you? Because maybe we can reach immortality, but probably only if you are positioned properly for that. Anyway, for me the main drive is also curiosity on what is going to happen in the future. It's like this crazy story I want to know how it ends. I don't want to die, because I want to know what happens. Dying would be like going to sleep on a cliffhanger. I'll just prefer to binge rather than sleep.
Perhaps, but then none of it has meaning. Personally, I'm not really interested in having meaningless experiences.
> There's plenty of existential dread in death, yet I continue my existence building up to it.
But that's an entirely different thing. I am of the opinion that it's death that helps give life meaning. I have no existential dread about it whatsoever.
A life without meaning, however, seems like a pointless waste of time to me.
Hm, why do you think our lives would have no meaning if we don't qualify for sentience? (Or is it the free will part? Sorry, I set us up for confusion with the initial comment.)
What I think is: Even if we're just automatons that lack free will, it doesn't negate our existence or the role we play in the universe. We can still be the universe calling itself beautiful, we can still have qualia, we can still be individually unique structures of matter and state, we can still make a difference in what is and what isn't in the overall structure. And nothing else in the universe will do it fundamentally better, either. At some point we're subject to the rules of the game, i.e. physics itself.
It's the free will part. It seems to me that without free will, nothing anyone does actually matters. Any decisions we make, or any understandings we may develop, can change nothing whatsoever. Which makes it all pointless.
> we can still make a difference in what is and what isn't in the overall structure.
I disagree with this part. We can't possibly make such a difference, because it's all preordained anyway.
> I disagree with this part. We can't possibly make such a difference, because it's all preordained anyway.
What I mean is that what plays out is in the interaction of things, and if you just deleted that assembly of stuff that is me from the equation, it would play out differently. I'm not even suggesting that you adopt this mindset or anything, I'm just saying that for one reason or another - deterministic ones, I suppose - I end up having feelings about this that add up to "pretty cool".
Everything may be deterministic, but there's still things only I know, only I do, interactions only I have, etc. The grand script that is the universe has organized itself in a way where I get to have feelings about my little stretch and space of it, and I get to wonder what's next because I can't know.
For me, this is enough.
I know, I write this from a position of massive privilege, because I'm currently not, for example, seriously sick, or poor, or live in horrible circumstances (which is not to say I don't have any struggles). I hope this continues to be the case. But this was never under my full control: Thoughts on determinism or not, I could get hit by a car tomorrow, or slip and fall. I still have to live with that.
I get your feelings and dread, and I've been there. But the more I've thought about it, the more I've realized it doesn't make a big difference. And how could it? It's always been like this.
> I get your feelings and dread, and I've been there. But the more I've thought about it, the more I've realized it doesn't make a big difference.
The interesting thing is that I traveled the same path as you here, but in the opposite direction.
For the record, I don't suffer existential dread, as I actually believe we have free will. I was just talking about the hypothetical of not having free will. Thank you for a great conversation on this!
Out of curiosity, what's your (or your favorite) argument pro free will? I've spent some time on Penrose's take, etc., but I couldn't convince myself. I would still be absolutely interested in a pro argument that makes me flip.
Well, honestly, I have never found a great argument either for or against the existence of free will. You'll hate the basis on which I believe in it because it is so terribly tautological.
I believe I have free will because it appears to me that I have free will.
Because I can't make a solid logical argument either way, I believe the thing that brings the greatest amount of joy to me. I never forget that there's an asterisk on this belief, though, and am ready to abandon it should an argument come along that removes ambiguity.
But I also believe that "belief is the death of intelligence", and so use "believe" as shorthand for "have as a working hypothesis".
Mostly because it'd be more surprising! At the moment I can't predict I'd be convinced by an argument, but a future in which determinism turns out to be the wrong model of the world would definitely be interesting.
Like if there's some sort of ending to all of this and you discover that there's this thing called "free will" which you were judged by the whole time?
Kind of, but my sentiment is a little more meta. A deterministic universe is easy to reason about. Of course there's a huge amount of emergent complexity we still don't fully understand, but I think in the broad strokes, I get it and that's that.
But a universe with free will? All bets are off. Your comment I'm replying to is just a hint at "thinking through the implications". People are scared of determinism, but a universe built on free will is actually absolutely wild.
So, if you and I agree that the unknown (e.g. the future) is exciting and worth hanging around for, wouldn't that be kind of exciting too?
This is more or less also my take on religion. Nothing has convinced me of superior beings, but if it were to turn out there actually are some? Ho-ly shit.
It's all deterministic, or at least there's no reason for it not to be, and no valid reason to believe that it's not deterministic. Ultimately it wouldn't matter anyhow, because everyone is controlled by their desires and feelings, and these have developed throughout evolution to make sure that you are likely to survive, so likely you would keep going, unless the environment changes to an extent that evolution didn't prepare you for, of course. Which might very well happen, but it won't have anything to do with sentience, determinism or anything else.
Anyway the point is that just try to enjoy the ride if you can.
Why does it matter if something "matters"? You don't even know what the "matter" is. You act according to your emotions, desires and built in survival drive anyway.
I do indeed. And a sense of meaning is important to help in my quest for survival.
It doesn't matter to me if something is ultimately meaningless (such as the fact that in the end, this universe will cease to exist).
What matters to me is to have meaning in my own life. To feel as if my existence changed something. Hopefully, that "something" I've impacted is other people, and hopefully that impact is a positive one. To have people be better off for having known me.
> What would be the point of going on? Why would we have to figure out how to do it?
I mean, I'm just really curious what is going to happen, and what's the secret behind all that, behind the universe, and I hope to live as long as there's this to discover. That is my natural drive though. I don't really care at all whether I'm sentient, conscious or special in any given way. There's no reason for me to be. I just have this built in curiosity drive. It was not chosen by me, but it is there, and I act to engage the drive, because it makes me feel good and otherwise it makes me feel bad, nothing much more than some sort of algorithm. And we don't need to be more than algorithms.
To me it speaks to an utter contempt for other people, and likely a desire to get rid of them. Naturally, most of the people expressing these views want to survive, they just don't want "NPCs" around.
What is the reason for us to have anything "special" or whatever in the first place? Why should we be anything more than input to output. Why does it matter to anyone?
So mash a camera or microphone into the multi-modal input for GPT-4 and we’re done? Sensory experience does not seem that high a bar to clear. (Plenty of non-language models already do this.)
@Robotbeat Huh? You and I see color, hear sound, etc. I presume you're not insane and that you accept that as true. So what's this about human brains? Are you looking for your lost car keys under the street lamp because that's where the light happens to be on, and then baffled when you don't find them?
If science is unable to account for the existence of such experiences, then either they fall outside of science (so what? who said empirical science was the only means of knowing? the claim itself isn't scientific) or science must be revised. And what of the claim that anything that isn't quantifiable is bullshit? Is that quantifiable? No? So it's bullshit?
Is the claim falsifiable? If not, then it’s not scientific. Holding up claims to empirical validation (which is very broad and includes observational science, doesn’t have to be traditional experiment alone) is the core of science.
Qualia, by definition of being individual subjective experiences, cannot be quantified. Very clever trick, to define consciousness so that no matter what computer you come up with, it cannot be “conscious” because, of course, a computer’s operations are always quantified.
The problem, however, is you’re effectively defining consciousness in a way that human brains do not follow unless you make appeals to some supernatural process beyond the reach of science (ie fundamentally subjective, unquantifiable).
So again, it’s a bulls*** definition of sentience.
I disagree. I'm a physicalist, so I believe consciousness/qualia emerges from physical processes in the brain. It's absolutely possible for a computer to be conscious provided it implements the right kind of computational process. Ultimately, whether or not a system is conscious is an empirical question that neuroscience will be able to answer once we have the right theoretical framework and tools for measuring the relevant variables.
I don't know what you mean by it being a BS definition of sentience. It's a technical term invented by philosophers to mean a specific thing. If you don't like it, use a different word.
That is what I mean. /Subjective/ experiences are part of the usual definition of consciousness and sentience. Qualia, as opposed to quantia. If you’re trying to protect the human mind from ever being provably replicable in operation by a machine, it helps to pick an unfalsifiable definition of consciousness and sentience.
I don't get it. Who are you attributing this view to? I already said that I believe everything about consciousness/qualia/subjectivity are physical and measurable. There is nothing special about the human mind preventing us from replicating it in a machine.
Note: you brought up the term qualia, not me. I have never actually liked that term. I prefer the terms "consciousness" and "conscious experience".
>It's silly to dismiss an AI because it "parrots" things, as if that's not a pretty remarkable achievement. It's also very reductive; anyone who has used ChatGPT and doesn't see that it can pretty reliably munge together different ideas
It can clearly do both of these things very well, but human intelligence combines a multitude of interacting components, and for tasks that require more than just the one it replicates, it falls flat on its face.
It doesn't appear that the missing components can be fixed with incremental improvements to the model, either.
Correct. I'm not sure how that changes the fact that today's state of the art is tomorrow's obsolete tech. Do you see any evidence that advancements will become negligible? The fact that there aren't any memory/attention/feedback systems in place suggests there's some pretty interesting "low hanging" fruit still up there.
I think many of us are paid exactly for being close to “purely intellectual”. White collar and all. Personally, the physical aspect of my job is trending towards 0, unless you count the twitching of my fingers and the pain in my back from sitting for so long.
While we can certainly identify fallacies in the arguments of the cynics, we should likewise be careful not to fall into the broadly Skinnerian trap, or project onto things our own anthropomorphic conceptions. A Potemkin village is not a village, no matter how much like a village it appears to be to us.
The whole notion that computers or AI are capable of intelligence is simply a fantastical category mistake. We have forgotten that the metaphor is a metaphor. No matter how sophisticated AI technology could become in its apparent behavior, it cannot make the transition to bona fide intelligence. Simulation does not become the real McCoy through resemblance. This is just magical thinking. We must also account for the means by which the observable is produced. When we do, we can distinguish between a genuine behavior and a merely simulated behavior. And here is one important clue to the puzzle: there is a difference between true speech and something that merely happens to sound like speech, even when the raw data of each is indistinguishable to the ear per se. Observation is not brute. It always involves interpretation. When physicists conduct experiments, they are doing so within the context of some stack of theories[0] and background assumptions, including extra- and pre-scientific assumptions. The results of those experiments are observed, which is to say read, and so observation is always theory- and assumption-laden, because reading what we see or measure means taking what is made available to the senses as such and placing it in context. This is just a specific instance of what we all do in daily life (resting bitch face, or genuine irritation?). Science is nothing special in this regard.
So when you say that "anyone who has used ChatGPT and doesn't see that it can pretty reliably munge together different ideas is blind or is lying", I cannot help but respond with a "uh wut?". How have you come to conclude that ChatGPT has "ideas"? I will tell you that there is not a single idea within ChatGPT! Consider an example of an idea and see how we do. Take "triangularity", for instance. I think we can agree that "triangularity" is an idea. What makes "triangularity" different from some physical triangle? Well, there is only one "triangularity", but many triangles, each of which instantiates "triangularity" (otherwise, you could not call them all triangles; they are triangles only insofar as they instantiate some common "triangularity"). Each triangle is different from every other, and yet they are all triangles. And none of them is to be identified with "triangularity" as such, for if there is only one "triangularity", and some triangle were identical with "triangularity", then it would follow that there could only be one triangle! But, you might ask, is there perhaps a physical "triangularity" that isn't a triangle, yet something each triangle has? Well, no, because to be physical is to be concrete and particular, and "triangularity" is, by definition, general, universal, and abstract. There's no concrete magical ingredient that you can hold in your hand called "triangularity" that you can mix into cement to produce triangles.
There are other reasons why the claim that ChatGPT or whatever is intelligent isn't even wrong (like the subjectivity and observer relativity of computation in physical artifacts and computation as abstract model merely simulated through artifacts), but this appeal to abstraction, the ability to abstract from particulars and hold in your intellect what cannot exist physically except as particular, concrete instances, is central to intelligence. ChatGPT most certainly does not engage in abstraction. It would be fair to say that strictly speaking, it doesn't even parrot.
My PM is asking it questions about technical problems (like system design) preemptively and using it as a starting point as if it's an engineer. So now instead of thinking through problems from a blank slate, I'm debating with ChatGPT. It's been just awful.
And what's worse, his prompts always have certain biases or misunderstandings in them. For instance, he always wants to use third-party services for everything because he thinks it means we do less work (often not the case). So ChatGPT tells him what he wants to hear. He's using it as an extra engineer who always agrees with him.
This is a whole new category of humor. I can totally see all kinds of management-y people using ChatGPT to provide “helpful” but completely useless information.
Having used the API a bunch, the ChatGPT API (GPT-3.5-Turbo) feels like it has a 10-second attention span in comparison to GPT-4. GPT-4 is waaaay more powerful for actually building stuff on top of, adheres to your prompt in a much more stable way, and its reasoning is way less unstable in general as well. Writing long blocks of code without making "stupid" mistakes works much much better, etc.
The only downside is: it's so, so, soooooo slow in comparison to GPT-3.5-Turbo. You'll literally have to wait 2-3 minutes for the response with larger token amounts.
I find both your statements confusing and I think they're factually wrong.
There's no "ChatGPT API". ChatGPT is a webapp. It lets you select between 3.5-turbo and 4 models in its UI.
There's an OpenAI API, which recently added chat interactions and suitable models. There, too, you can select 3.5-turbo and 4. AFAIK, the 4 served by the API is the same 4 used by the ChatGPT web app.
The title is "Introducing ChatGPT and Whisper APIs" and features quotes like "Snap Inc., the creator of Snapchat, introduced My AI for Snapchat+ this week. The experimental feature is running on ChatGPT API".
So, OpenAI refers to accessing the model powering ChatGPT as the ChatGPT API.
Still, IMHO the OP post is misleading folks into thinking there's a magical GPT-4 that is not the GPT-4 you can also select on ChatGPT, but they are the same models. There's no GPT-4 that's better than ChatGPT's 4. I'm just trying to clear up this confusion for others, and frankly pretty surprised at the harsh downvotes.
There's also no separate ChatGPT API from the regular OpenAI API, I think the phrasing in the OpenAI page is just for convenience. But I'll grant their phrasing makes it canonical.
I see where you're coming from. I didn't personally get that from OP's comment, but I'm not a regular ChatGPT user and interact with GPT-4 mostly via playground or API.
Is that true? I was under the impression it was just like with the previous GPT models, where ChatGPT was an instruction-tuned and pre-prompted setup designed for interactive conversations on top of the underlying GPT model you had access to via the API, which was in its close-to-as-trained state of being a generative autocomplete.
Altman has stated in interviews they are considering releasing the un-tuned "base model", but they haven't done this.
For GPT-3, you can get models on the API via the "completion" endpoints (which include the base models) and via the "chat" endpoints. For GPT-4, they've only released the latter.
So my point is: You can absolutely use ChatGPT to build an opinion on what GPT-4 is capable of. Arguably even better than on the Playground, as the ChatGPT frontend supports a bigger token window than the Playground frontend (8k vs. 2k). Of course the actual API supports the full token window (and if you're off the separate waitlist for this, even the full 32k). But it's still the same model.
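For reference, the two endpoint styles mentioned above look roughly like this with the openai Python client (a sketch against the 0.27-era library; the prompts are just placeholders):

    import openai  # assumes OPENAI_API_KEY is set in the environment

    # Completion endpoint: plain text in, plain text out (GPT-3 family, including base models).
    c = openai.Completion.create(model="text-davinci-003", prompt="Say hello.", max_tokens=20)
    print(c["choices"][0]["text"])

    # Chat endpoint: a list of role-tagged messages (the only way GPT-4 is exposed so far).
    r = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(r["choices"][0]["message"]["content"])

Same client, same account; which model you get is selected by the model string, which is the point being made here.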
There is a more "pure" GPT-4 model, and they're considering giving access to it. It's just not as useful to humans. Sam Altman talks about it briefly in the recent interview on the Lex Fridman podcast.
Yes, I know, but this is presently not available via the API. What OP here means is that someone's talking about 3.5 without having tried something with 4, while accidentally misleading people into thinking that there's a special 4 distinct from the 4 available on ChatGPT that can do more. This is not the case. If you've used 4 via ChatGPT, there's not a different version somehow - that we know about! - that is more capable or useful. If your interactions with 4 were via the ChatGPT frontend, you don't need to revisit your opinions now.
> that we know about! - that is more capable or useful
There's some nuance that makes your statement not completely accurate, and Sam talks about it briefly.
ChatGPT is GPT-4 with additional training, and layers, on top of it. ChatGPT is more useful when used in a conversation mode, because it was trained based on how people liked the responses in a conversation mode. Raw GPT-4 is harder to use, because it's not optimized for a back and forth. It may be much more useful for other use cases, though.
I was thinking about this while writing earlier, and kind of made the leap that we haven't yet found a way to make the base model more useful than the version with the additional training (or at least it's not been published), so technically it would be accurate to say it like this. However, I certainly agree with you that there could potentially be other utility you could derive from the base model with alternate steps. I appreciate your comment.
It's called the ChatGPT API by OpenAI, and free users (ie almost everyone using the system) can only use GPT-3.5. There is no option for them to use GPT-4, which is behind a paywall and a waitlist (maybe the waitlist is done now, I haven't checked).
I wanted to write a comment like this but feared it would be too negative or dismissive. But I agree with the general thrust of your argument. The article is a little outdated because a lot of the examples of deficiencies in the article work in GPT-4 (I tested a couple). But maybe the author is sandbagging ChatGPT to bring the non-tech people back to Earth.
When ChatGPT was released, a lot of people (even on this site) were trying to get it to think abstractly and were applying human qualities to the bot, when it's relatively clear that it's all just fancy statistics. Even today, when talking to non-tech people, there's still a lot of people who believe it's a magical thinker-in-a-box. Someone has to talk those people down, and it ain't gonna be me.
It's not perfect, but it doesn't have to be to have applications IRL. Yeah, the web chat is meh, whatever, but the API is where it actually shines. It's the integration with existing business processes and code that allows new, more human ways of interfacing with systems. Asking it for what it knows in its statistical blob will probably lead to heartache. Giving it some context and asking it "what can you infer from this?" is where it's strongest.
I actually agree that I don't think there's anything particularly special about our meat and chemical GPUs, but analog's fidelity has historically been difficult to replicate precisely in digital space. Maybe since we don't care for emotion in our intelligence, we don't need the analog. But I think our brains are a little more (but not that much more) complex than fancy statistics. After all, the math models the real world, not the other way around.
I wrote a massive rant for a response but then I deleted it.
I feel sorry for people who think about themselves this way. Why compare yourself to a calculator?
If it was possible to build an artificial brain that was self-aware, it would probably want to be a lot more like us than you think. To feel and connect, and experience love and what that’s like, is a very special part of existence.
I mean, I think distilling my argument down into "I'm not much more than a calculator" is a little bit much, but maybe I was unclear. I just don't think there's magic there. Brains are mysterious, but so are our LLMs, and we invented those. Most of what we experience emotionally isn't magic, it's a complex chemical reaction of hormones applied to our inputs and outputs. But like I said, I don't know if we'd value that in our AI or not, so that complexity may not be necessary to emulate.
I’m going to go out on a limb here, but I bet that deep down inside you know that intellect is only one mechanism for trying to understand the infinite complexity of existence.
People espousing this opinion seem to have grown quite a lot in number since ChatGPT first became public. Cynical as I am, I do suspect some astroturfing trying to prepare people for being "obsolete".
I also suspect astroturfing, particularly here, and I also notice it on Substack (I don't visit a whole lot of other sites anymore). On Substack I see two clusters, one group _very into_ AI, the other advocating a lot of caution.
I don't think it's preparing people to be obsolete; I think it's a combination of AI startup founders hyping up their tech, and people who identify with big tech/"the side that wants AI".
Hype no doubt plays a part, which is a perfect cover for any truly malicious astroturfing. But when I say "prepare to be obsolete", I mean prepare to be obsoleteD.
Right now, billionaires who profess the view that AI could end humanity are working their hardest to progress the field as fast as possible, scoffing at "ethics", "safety", and "responsibility". So if they do get their god-machine, they're not going to look around at 8 billion fellow humans (biological stochastic parrots, according to some) who are suddenly "obsolete", but using up resources, and think "fully automated space luxury communism".
Which leads to the questions, dominance for whom, and over whom. And why, if we're supposedly entering an age of post-scarcity where AI can enable anyone to do anything, we would need dominance in the first place.
Quantum mechanics is “just” fancy statistics. Unless we believe in some non-physical/supernatural phenomenon going on in the human brain, this sort of reductive argument works as well for the human brain as it does for LLMs (in other words, not very well. Sufficiently complicated statistics is what underlines everything, as far as we can tell.)
The fact that many are mentally incompetent cannot be a reason to produce something similar. When somebody does a job, we expect such a job to be done properly, and "brains" are pretty much required - intelligence, if only for the occasional cases outside the norm - otherwise, there is automation.
> if its useful
It is not easy to confine a confabulator to the contexts where it will be useful.
Indeed, but they're saying it with no evidence whatsoever. If those jobs actually do get replaced by LLMs people will be entitled to say that with certainty. Until then it's just their opinion, and not one that aligns with what I know of any of the jobs I regularly come in contact with.
I don't think the point being made here has anything to do with the current quality of the output. ChatGPT could be giving good or bad results, that's tangential. The debate here is about whether a system that is a parrot can be a good system.
The parrot arguers claim that these systems just regurgitate the training data set and give you the most plausible next word, and they claim that's bad/useless.
The anti-parrot arguers claim that's basically how humans work too (even the smarter ones), and that just because ChatGPT is a "parrot" doesn't necessarily mean it cannot be as good or better than humans. It might not be entirely there yet, but it's just a matter of time it becomes a good enough parrot to be indistinguishable/better than humans.
> whether a system that is a parrot can be a good system
A parrot can be a good system if the problem is "Collect and sort the available data about X (and provide reference)".
A parrot is a disaster if you ask it "What do you think of X": a parrot does not think - unless you make it reason, verify and criticize its ontology, and at that point it is useless that we call it a "parrot".
Most opinions you will hear from people are also just a regurgitated amalgamation of things they have previously heard or read on the subject. Also, in general, being able to aggregate and summarize into words what hundreds of people think about X seems like probably an even more useful tool than getting one person's thought, unless it's something no one has ever had any thought on before.
Even then, I don't think it would be outside the scope of possibility for the parrot to find things that are similar to X, find opinions on those, and somehow transpose those opinions onto X.
> The anti-parrot arguers claim that's basically how humans work too (even the smarter ones), and that just because ChatGPT is a "parrot" doesn't necessarily mean it cannot be as good or better than humans. It might not be entirely there yet, but it's just a matter of time it becomes a good enough parrot to be indistinguishable/better than humans.
Which humans? There may be a correlation between people who are capable of watching Netflix quality series for 4-5 seasons of more of the same and admirers of ChatGPT.
Basically if "content" (as opposed to "art" or attempts at art) does not bother you, ChatGPT does a good enough job already.
On the other hand, the most vocal opponents imagine themselves in a world where not 90% but 99.999% is blogspam and Netflix, making entertainment worth watching impossible to find.
This is becoming common belief among people in the AI field. I'm convinced it is a case of a metaphor overtaking reality. With the LLM model in mind, we can see how Humans are "kind of like" LLMs, therefore we start to believe humans are just LLMs.
Of course this forgets the enormous ways that humans are nothing like LLMs. For example: Humans don't have to read the entire internet to learn how to write complete sentences.
It's not clear to me if the author is using GPT-3.5-turbo or GPT-4; both are available in ChatGPT, and just saying "I'm using ChatGPT" doesn't indicate exactly which model they are using.
Maybe I missed it, where does the author say they are using GPT-3/3.5, not GPT-4?
I agree with your message in general; there is a night-and-day difference between GPT-3.5-turbo and 4. But I'm still not able to see which one the author is using.
Edit: seems the avatar in ChatGPT tells you which model you're using, never noticed that myself. Thanks everyone :)
I've checked some of the failure modes he mentions using GPT-4 and it answered the questions/completed the tasks correctly, so I can only assume the author used GPT-3.5.
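For what it's worth, the ambiguity disappears if the failure cases are re-run through the API rather than the ChatGPT UI, since there the model is an explicit parameter. A rough sketch of what I mean, assuming the current openai Python client and an OPENAI_API_KEY in the environment (the prompt is just a placeholder to fill in):

    # Hypothetical re-test of the same prompt against two named models,
    # so there is no doubt about which one produced which answer.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    prompt = "..."  # paste whichever question from the article you want to re-test

    for model in ("gpt-3.5-turbo", "gpt-4"):
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"{model}: {reply.choices[0].message.content}")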
It's convenient that any valid criticism of ChatGPT or GPT-3 or $CURRENT_TECH is dismissed by saying "just pay OpenAI money to use the new thing™, it doesn't have this issue".
If it were not a ChatGPT Plus offering but instead freely available, there would be way fewer shills for it (I'm assuming) and there would be a lot more balanced reporting on it. By default, anyone speaking to the competence of GPT-4 is willing to give OpenAI money, which may already be a sign of bias.
I'm not willing to pay for a product whose free tier is shit, even when told a hundred times "no but the paid one is MUCH better!". Of course it probably is, but it's a little irrelevant.
Nearly all humans are parrots, aka mimics. One of the great demonstrations of it, which anyone can observe 24/7, is Reddit (and the Reddit knowledge spouted so commonly across the platform).
There are mimics, amplifiers, and originators/creators. Mimics are ~95-99% of the population. Amplifiers are - broadly speaking - teachers, people with bullhorns that train the mimics off of something another person originated/innovated/created.
It's why humanity has been so successful. The mimics are wired to try to copy what is shown to work, and they often do it quite well (keeping up with the Joneses is a wired behavior tied to plentifulness, survival and thriving).
Yeah if anything all ChatGPT is doing is highlighting that a lot more noise-work can be automated. "Make my random thoughts sound business-y", "double check this code doesn't have anything completely stupid" etc.
> Anytime you see the 'parrot' argument, you can instantly go 'opinion dismissed'.
I also seriously question whether anyone making the 'parrot argument' has actually used ChatGPT for anything meaningful. If it is nothing more than a parrot, then parrots are significantly more creative than I realized.
I don't know why you question this. I've used it for meaningful stuff, but I am not convinced that ChatGPT is more than a very sophisticated variety of Eliza.
Just because one person is convinced by an experience doesn't mean people who aren't convinced by it haven't had the experience.
I have. Here is a list of things at which it failed:
- Write a dead simple piece of code, which is however NOT found as an immediate answer to a stackoverflow question (I've tried this one upwards of 10 times, with no success)
- Categorise a chord when I tell it the constituent pitches, and vice-versa
- Solve a simple quadratic equation
- Show a biography of this mildly obscure singer/politician/scholar
- Write poetry in a given style
- Give me references to read up on a certain technical topic
ALL of these it fails, or rather, worse than fails: it spews plausible bullshit with the same confidence and tone it would use for correct answers, going as far as manufacturing references and attributing them to real people. (The quadratic and the chord, at least, are mechanically checkable, as sketched below.)
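A rough, hypothetical sketch of the kind of check I mean, in plain Python with nothing generated by ChatGPT; the point is only that these answers are cheap to verify deterministically:

    import cmath

    def solve_quadratic(a, b, c):
        """Roots of a*x**2 + b*x + c == 0, returned as complex numbers."""
        d = cmath.sqrt(b * b - 4 * a * c)
        return (-b + d) / (2 * a), (-b - d) / (2 * a)

    # Common triads as interval sets (semitones above the root).
    TRIADS = {
        frozenset({0, 4, 7}): "major",
        frozenset({0, 3, 7}): "minor",
        frozenset({0, 3, 6}): "diminished",
        frozenset({0, 4, 8}): "augmented",
    }

    def name_triad(pitches):
        """Name a triad from MIDI-style pitch numbers by trying each note as the root."""
        classes = {p % 12 for p in pitches}
        for root in classes:
            intervals = frozenset((p - root) % 12 for p in classes)
            if intervals in TRIADS:
                return TRIADS[intervals]
        return "unknown"

    print(solve_quadratic(1, -3, 2))  # roots 2 and 1, printed as complex numbers
    print(name_triad([60, 63, 67]))   # C, E-flat, G -> "minor"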
As you noted, most of us, even in SWE, are parrots (including this parrot typing); we get kinda scared when the robot parrot generates bullshit better than the web framework du jour.
When someone says "stochastic parrot" my brain turns off. Here is an article that punctuates each section with a meme. You don't need to read it to dismiss it. I did read it, though. There is literally nothing of substance in it. I rightly assert that the safety of AI research must be proven before any more progress is allowed to happen; the burden of proof is on them, not me. I think if this guy wrote an article where he picks apart the black blob of GPT-4, explaining how it might work and then making some kind of axiomatic argument for safety based on that, I would find that very convincing. But instead I get meme pictures. Does this guy write bots, or is he a bot himself?
The guy literally slaps a "Senior Machine Learning Engineer" title on himself 9 months after getting a master's degree :))
Guess he could use ChatGPT for his resume.
The author clearly possesses intelligence, which instills in him a confidence similar to that of ChatGPT when it comes to dismissing a tool that, for some reason, he has not yet learned to utilize effectively.
We've reached a point where, even with just continued incremental improvements, these LLMs are going to be extremely disruptive to society, probably within a decade, to the point that we see even more massive disparities in wealth and income, massive increases in poverty and homelessness, and ultimately probably large-scale civil unrest.
Rather than telling yourself (like this author) that "it's just a parrot," or alternately (like I've seen so many happy-go-lucky types on HN do) telling yourself that "LLMs are just another tool that will make me more productive!" I'd recommend preparing for the worst, while hoping for the best.
I don't understand predictions like this. Even if I thought this technology would be as useful as it's hyped up to be (which I don't), I don't see how it would have effects different from those of previous revolutionary technologies that have improved the human condition.
I think the doomsayers on these LLMs would be the same folks predicting doom when agriculture was mechanized. When massive portions of the workforce toiled in agriculture, folks thought farm machines would lead to mass unemployment and social horror. Instead, world hunger has dropped, and massive portions of the workforce were freed up to do things like work on LLMs.
Maybe the hand-wringing is different this time because privileged classes feel threatened.
> I don't see how it would have effects different from those of previous revolutionary technologies that have improved the human condition.
It will greatly reduce the demand for labor in many fields (off the top of my head: customer support, artists, any kind of writing or clerical work, and probably programmers and IT pros in general), and the benefit of this productivity gain will largely be reaped by the owners of LLMs and the companies using them in the form of higher profit margins. It's not at all clear that it will trickle down to the displaced workers.
> privileged classes feel threatened
Ironically, I actually believe a lot of the overly optimistic takes come from a point of privilege. Think the FAANG types making $250k a year, for whom everything in life has more or less worked out well, and who have difficulty envisioning a future where things don't just keep on working out well.
Cynical thought: "it IS just a parrot, but so are a large percentage of humanity, particularly those who are uncritically enthusiastic about any so-called progress"
AI might turn out to be a disaster not because it is smarter than us, but because stupid people will believe it is and come to rely on it, even more so than they already do on Google or Wikipedia.
What really strikes me about the ChatGPT output I have seen is the tone of complete confidence with which it delivers answers (except when it is programmed to respond "as a language model I cannot blah blah blah").
For me that is something which provokes an instant negative emotional reaction to what is said, because things rarely are that simple and clear cut, and any person or machine pretending that they are seems likely to me to be either simple-minded or trying to con me / advance some agenda.
That most people apparently don't have that reaction to it is frightening to me.