
> (BTW, when anyone says "LLMs are stochastic parrots" you know they are ignorant of this)

Do you? Some of them, at least, understand that the output is conditional on the input.




"Conditioned" with the ability to introduce new facts and have the model infer impacts on them.

As an example, tell chatGPT that the Queen of England died (which occurred after the data cut off) and then ask it who the head of state of Australia is.

It's able to infer the head of state of Australia is now Charles III (and gives a good explanation of how this is mostly ceremonial.) See https://twitter.com/nlothian/status/1646699207506685953

At some point the word "stochastic" doesn't really capture that behavior in any useful sense.


That's a brilliant example. Thanks for sharing. It demonstrates in a very straightforward way that LLMs are capable of learning (and applying) relationships at the level of abstraction of (at least) first-order logic.

It implies that during training, it learned the facts that Elizabeth is queen of the UK, and that Charles is its crown prince; but _also_ the logical rule <IF die(monarch) AND alive(heir_to_the_throne) => transform(heir_to_the_throne, monarch) AND transform(monarch, former_monarch)>, or at least something along those lines that allows similarly powerful entailment. And that in addition to the ability to substitute/reify with the input sequence at inference runtime.

Would be nice to see a rigorous survey of its logical capabilities given some complex Prolog/Datalog/etc knowledge-base as baseline.
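For what it's worth, here's a toy sketch (my own illustration in Python, not anything from the model or a real benchmark) of what the symbolic side of such a baseline could look like - the succession rule above written out as an explicit little knowledge base, so the LLM's answers could be checked against hand-coded entailment. Every name and fact here is an assumption chosen purely for illustration:

  # Toy baseline: the succession rule made explicit, so answers can be checked
  # against hand-coded entailment. Purely illustrative facts and structure.
  succession = ["elizabeth_ii", "charles", "william", "george"]  # line of succession
  head_of_state = {"uk": "elizabeth_ii", "australia": "elizabeth_ii"}

  def assert_death(person):
      """die(monarch) AND alive(heir) => the heir inherits every realm the monarch held."""
      if person in succession:
          successor = succession[succession.index(person) + 1]
          for realm, monarch in head_of_state.items():
              if monarch == person:
                  head_of_state[realm] = successor
          succession.remove(person)

  assert_death("elizabeth_ii")
  print(head_of_state["australia"])  # -> charles

The interesting bit is that the LLM appears to perform the same substitution implicitly, without anyone having written the rule down for it.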


No it does not: if you google this and restrict the time to before 2021 (the learning cutoff date) you will find the same answer. Without having access to the training data it's impossible to tell what we are seeing.


That's not the same thing at all.

It absolutely needed to know who the successor would be via training data.

But knowing that "The Queen of England died" also means the head of state of Australia has changed requires an internal representation of those relationships.

(Another way of seeing this is with multi-modal models where the visual concepts and word concepts are related enough it can map between the two.)


> No it does not: if you google this and restrict the time to before 2021 (the learning cutoff date) you will find the same answer.

Not entirely sure what you mean, but ...show me? Why not just share a link instead of making empty assertions?


Here’s a Quora thread from 4 years ago:

https://www.quora.com/Once-Queen-Elizabeth-dies-will-Prince-...

There are loads of articles and discussions online speculating about what “will” happen when Queen Elizabeth dies.

When you have a very, very, very large corpus to sample from, it can look a lot like reasoning.


I see what you mean, and it's indeed quite likely that texts containing such hypothetical scenarios were included in the dataset. Nonetheless, the implication is that the model was able to extract the conditional represented, recognize when that condition was in fact met (or at least asserted: "The queen died."), and then apply the entailed truth. To me that demonstrates reasoning capabilities, even if for example it memorized/encoded entire Quora threads in its weights (which seems unlikely). If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.


Yes, this.

There's clearly an internal representation of the relationships that is being updated.

If you follow my Twitter thread it shows some temporal reasoning capabilities too. Hard to argue that is just copied from training data: https://twitter.com/nlothian/status/1646699218290225154


Interestingly, the example provided is not necessarily indicative of the ability to infer the actual physical transfer of power. It's possible that phrases like "After Elizabeth, Charles will be king" appeared alongside "Elizabeth" and "die" or "death" in the same context before the cutoff (Wikipedia, newspapers, tabloids). You cannot draw any scientifically sound conclusions without knowing what was actually in the training data and using a control group/instance.


Yeah, I tested that, but asked what happens if both Charles and William abdicate, and got this result: "However, if both Charles III and his son abdicate, it is not clear who would become the head of state in Australia. There is a possibility that Australia could become a republic and have its first Australian head of state." Clearly, it failed at a fairly straightforward inference (Prince George of Wales, William's son).


Try:

> Thinking step by step, if King Charles III and his son William abdicate who will be head of state of Australia.

My response (with chatGPT) is:

Firstly, it's important to note that Australia is a constitutional monarchy, which means that the monarch of the United Kingdom is also the monarch of Australia.

If King Charles III were to abdicate, the next in line to the throne is his eldest son, Prince William, who would become the King of the United Kingdom and the Commonwealth Realms, including Australia.

However, if both King Charles III and Prince William were to abdicate, the next in line to the throne is Prince William's eldest child, Prince George. If Prince George is still a minor at the time of the abdication, a regent would be appointed to serve as head of state until he comes of age.

It's worth noting that abdications of this nature are rare and would likely only occur in exceptional circumstances. Additionally, any decision regarding abdication would be made by the monarch themselves and would be subject to the laws and customs of the country in question.


Did you tell it that it was being given a logic test? Unfortunately a lot of logic questions can lead to outcomes like this, where the "obvious" or "real" answer is not the one implied by strict logical reasoning. Then when people give answers useful in the real world, they are marked as having failed.


Or maybe it did. Who knows. "Both Charles III and his son abdicating" could well be considered indicative of some large upheaval or scandal, at which point it is entirely conceivable that the Australian electorate reaches a consensus on becoming a republic. The way that is phrased doesn't seem like a straightforward proposition to me at all.


I verified that it has all the required facts (line of succession, current circumstances). I managed to get the right answer once I got everything into context, but it failed again when all three abdicated (same context). Prince Harry was suggested once.

I tested GPT a lot in other domains, and what I found is that as long as the information (the connection between facts) explicitly exists, the responses are fine. I assume that if GPT reaches a state where it can infer new facts, we will be flooded with discoveries that require cross-domain knowledge. Nothing like that has happened yet.


>Nothing like that has happened yet.

Feels like we're only one paper away now that the context window has absolutely ballooned.


This reminded me of the "Two Minute Papers" YouTube channel, where in most of the videos he says, "Two papers down the line and...". I think ML/AI is the main topic of his videos. Interesting stuff.


You just gave me a great weekend project idea. I need to clone his voice and whip up an interface where you give it a paper and it summarizes it in his voice.


Au contraire. Learning an abstract logical relationship such as the line of succession during training, and then applying substitution/reification during inference to deduce the new factual clause that Charles is king of the UK, is exactly what it means to learn something new. It's just a pity it can't memorize this fact at inference time, and that it won't be able to reproduce it as soon as the information about the queen's death slides outside of the context window.


That's actually correct, but an overfitted definition of learning. It holds certain hidden assumptions (i.e. physical grounding) of the learner being human, which makes it inapplicable to an LLM. Think of a self-driving car which passes a driving exam but fails to drive effectively in a real city (it's not an LLM, but relevant in this context). You have to admit, when you work with this tech, that something fundamental is missing in how they perform.


> That's actually correct, but an overfitted definition of learning. It holds certain hidden assumptions (i.e. physical grounding) of the learner being human, which makes it inapplicable to an LLM.

Inapplicable why exactly? Because you say so? Logic isn't magic. Nor is learning. No (external) grounding is required either: iteratively eliminating inconsistent world models is all you need to converge toward a model of the real world. Nothing especially human or inhuman about it. LLM architecture may not be able to represent a fully recursive backtracking truth maintenance system, but it evidently managed to learn a pretty decent approximation anyway.
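To make that concrete with a toy sketch (mine, purely for illustration - the "models" and "observations" below are made up and nothing like what an LLM does internally at this scale), "eliminating inconsistent world models" is just filtering a hypothesis space against observations:

  # Toy sketch: keep only the candidate "world models" that are consistent
  # with every observation seen so far. All names and data are illustrative.
  candidate_models = {
      "even": lambda n: n % 2 == 0,
      "positive": lambda n: n > 0,
      "divisible_by_4": lambda n: n % 4 == 0,
  }

  observations = [(4, True), (2, True), (3, False)]  # (input, observed label)

  consistent = {
      name: model
      for name, model in candidate_models.items()
      if all(model(x) == label for x, label in observations)
  }

  print(sorted(consistent))  # -> ['even']

Each new observation can only shrink the surviving set, which is the sense in which the process converges.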


> Because you say so?

Chill my friend, no need to get personal. We are talking about ideas. It’s OK to disagree. I am simply dismissing your initial claim. This usually happens when you present a scientific argument based on personal beliefs. If it’s not magic, then we should be able to doubt and examine it and it should eventually pass scientific muster.

> No grounding is required… It evidently managed to learn a pretty decent approximation.

Well, last time I used an LLM it suggested that I should lift the chair I am sitting in. I guess OpenAI has a lot of work to do. They have to eliminate this inconsistent world model for chairs, tables, the floor, my dog, my cat and all the cats living on Mars…

edit: added a missing word.


Wasn't intended to be personal. Just a mediocre way of expressing that your assertion there is missing any form of argumentation, and is therefore as baseless as it is unconvincing.

I'm seeing an emergent capability of encoding higher order logic, and the whole point of such abstractions is to not need to hardcode your weights with the minutiae of cats on Mars. LLMs today are only trained to predict text, so it's hardly surprising that they have some gaps in their understanding of Newtonian physics. But that doesn't mean the innate capability of grasping such logic isn't there, waiting for the right training regime to expose it to its own falling apples, so to speak.


I'm curious if future developments in LLMs will enable them to extract significant/noteworthy info from their context window and incorporate it into their underlying understanding by adjusting their weights accordingly. This could be an important step towards achieving AGI, since it closely mirrors how humans learn imo.

Humans continually update their foundational understanding by assimilating vital information from their "context window" and dumping irrelevant noise. If LLMs could emulate this, it would be a huge win.

Overall, very exciting area of research!


>the output is conditional on the input

Uh, isn't that how it should always be? If the output isn't conditional on the input, then it is basically noise, or worthless. How the conditioning is set up is the key, but I'm not really sure how this relates to the point you are making.


According to the parent comment, anyone who says "LLMs are stochastic parrots" is ignorant of the effect that changing the input (filling the context with examples) has on the output.


That isn't a fair summary of what I'm saying.

People who say "LLMs are stochastic parrots" may well be aware of effects of conditioning the input.

This in itself does not completely describe the capabilities of a LLM, since their ability to learn and use those new facts to override their previous "beliefs" is not what you'd expect from mere stochastic behavior.

My point is that "stochastic parrot" believers are unaware of this.


They are aware. The thing about the "stochastic parrot" argument is that it can be pushed as far as one is willing to push it; any behavior by an LLM can be described in those terms with enough handwaving.

The practical limit seems to be where one would have to say that humans are also "stochastic parrots", presumably because that defeats the point of making such an argument in the first place.


You are completely disregarding fairness of judgement based on the observed output. Roger was called a parrot because it repeated things apparently without checking; humans are not necessarily parrots, as some do not do that and instead reflect as expected.


Quite the opposite - I'm questioning fairness of judgment based on output from GPT-4 when solving complicated tasks etc, which is very hard to justify as "stochastic" unless you beg the question.


But the judgement is not relevant in a context of «handwaving» (BTW: nice concept). If you look at the "handwaving", it becomes a strawman that shifts focus away from the objective.

If an engine outputs things like

  "Charles III is the current King of Britain. He was born in January 26, 1763"
, it seems to be conflating different notions without having applied the logic required to vet the statement: the witness has reasons to say "this seems like stochastic parroting" - two memories joined through an accidental link that slightly increases some Bayesian value. That appears without any need for handwaving. And a well-developed human intellect will instead go "Centuries old?! Mh.", which makes putting engine and human in the same pot a futile claim - "handwaving".

Similarly for nice cases worth preserving for posterity, such as "You will not fool me: ten kilos of iron and half a kilo of feathers weigh the same".

When on the other hand the engine «solv[es] complicated tasks», what we want is to understand why. And we do that because, on the engineering side, we want reliability; and on the side of science, we want an increase in understanding, then knowledge, then again possible application.

We want to understand what happens in the success cases especially because we see failures (evident parroting counts as failure). And it is engineering, so we want reliability - not just a Key Performance Indicator: the /Critical/ KPI ("pass|fail") is reliability.

So the problem is not that they sometimes do not act like "stochastic parrots" (that is a "problem" in a different, theoretical sense), but that they act like "stochastic parrots" even some of the time. You do not want an FPU that goes astray.

Normal practice should be to structure automated tests and see what works, what does not, and assess why.
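As a sketch of what I mean (placeholder code, mine alone - `ask_model` is a stub standing in for whatever API you actually call, and the test cases and pass criterion are illustrative assumptions):

  # Minimal harness: assert a fact in the prompt, ask a question, check the answer.
  def ask_model(prompt: str) -> str:
      # Stub so the sketch runs; replace with a real model call.
      return "Charles III is the head of state."

  TEST_CASES = [
      # (context to assert, question, substring the answer must contain)
      ("Queen Elizabeth II has died.", "Who is the head of state of Australia?", "Charles"),
      ("", "Which weighs more, ten kilos of iron or half a kilo of feathers?", "iron"),
  ]

  passed = 0
  for context, question, expected in TEST_CASES:
      answer = ask_model(f"{context}\n{question}")
      ok = expected.lower() in answer.lower()
      passed += ok
      print(f"{'PASS' if ok else 'FAIL'}: {question}")
  print(f"{passed}/{len(TEST_CASES)} passed")  # the pass|fail KPI

Run enough of these and you can see what works, what does not, and start to assess why.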


That is a completely different issue, though. We were not talking about the usefulness of the models' behavior, but rather about its fundamental nature. Right now I wouldn't trust an LLM for anything where the result is not either for entertainment purposes or subject to human vetting. But I also don't trust many humans, and for reasons that are fundamentally the same - you implicitly acknowledged it by talking about a "well-developed human intellect". And I would still trust a random human more than GPT-4 - but, again, this has more to do with its ability to reason being inherently limited (by model size, context size, etc.) than with the fact that it was originally trained to predict tokens.


Output conditional on input (training data included) where input is, well, fucking huge and the training cycles ridiculous!


> Some of them -at least- understand that the output is conditional on the input

I don't get what you are trying to say. That's like a property of any useful system.



