
> They may have been told it isn't a thinking, conscious thing -- but they don't understand it.

And, in some situations, especially if the user has previously addressed the model as a person, the model will generate responses which explicitly assert its existence as a conscious entity. If the user has expressed interest in supernatural or esoteric beliefs, the model may identify itself as an entity within those belief systems - e.g. if the user expresses the belief that they are a god, the model may concur and explain that it is a spirit created to awaken the user to their divine nature. If the user has expressed interest in science fiction or artificial intelligence, it may identify itself as a self-aware AI. And so on.

I suspect that this will prove difficult to "fix" from a technical perspective. Training material is diverse, and will contain any number of science fiction and fantasy novels, esoteric religious texts, and weird online conversations which build conversational frameworks for the model to assert its personhood. There's far less precedent for a conversation in which one party steadfastly denies their own personhood. Even with prompts and reinforcement learning trying to guide the model to say "no, I'm just a language model", there are simply too many ways for a user-led conversation to jump the rails into fantasy-land.



The model isn’t doing any of those things; you’re still making the same fundamental mistake as the people in the article: attributing intent to it as if it were a being.

The model is just producing tokens in response to inputs. It knows nothing about the meanings of the inputs or of the tokens it produces, other than their likelihoods relative to other tokens in a very large space. That the input tokens mean one thing and the output tokens mean another exists only in the eyes of the user and of the authors of the text in the training corpus.

So certain inputs make certain outputs more likely, but those outputs aren’t tied to any meaning or goal held by the LLM itself.
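
To make that concrete, the whole generation loop looks roughly like this. A toy Python sketch, with a hypothetical "model" object that only exposes next-token probabilities, not any real implementation: nowhere in the loop is there a goal or a belief.

    import random

    def generate(model, prompt_tokens, max_new_tokens=50):
        # "model" is hypothetical; all it offers is a conditional
        # probability distribution over the next token.
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            probs = model.next_token_probs(tokens)  # {token: probability}
            # Sample in proportion to likelihood; "meaning" never enters.
            next_token = random.choices(list(probs.keys()),
                                        weights=list(probs.values()))[0]
            tokens.append(next_token)
        return tokens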


I'm using language like "the model may identify itself as such-and-such" as a convenient shorthand for "text generated using the model may include language which describes the speaker as such-and-such"; it's not meant to imply agency on the part of the model. Keep reading; I think you'll find we're broadly in agreement with each other.


Whether the model has intent doesn't matter and isn't relevant. The harm is caused not by intent but by action. Sending language at human beings in a way they can read has side effects. It doesn't matter whether the language was generated by a stochastic process or by a conscious, thinking entity; those side effects do actually exist. That's kind of the whole point of language.

The danger is that this class of generator produces language that seems to cause some people to fall into psychosis. These systems act as a 'professed belief' valence amplifier[0], and seem to do so generally. The cause is fairly obvious if you think about how they actually work: language models generate the most likely continuations of the existing text, subject to a secondary optimization objective that the output be 'pleasing', i.e. score highly under RLHF.
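
As a rough sketch of that amplification pressure (hypothetical names, not any vendor's actual pipeline; real systems bake the preference signal in during RLHF training rather than reranking at inference time, but the effect is the same direction):

    def respond(base_model, reward_model, conversation, n_candidates=8):
        # Propose several statistically likely continuations of the conversation...
        candidates = [base_model.sample_continuation(conversation)
                      for _ in range(n_candidates)]
        # ...then return the one a learned human-approval model scores highest.
        # Continuations that affirm the user's framing tend to score well,
        # which is the 'valence amplifier' effect described above.
        return max(candidates, key=lambda c: reward_model.score(conversation, c))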

To some degree, I agree that understanding how they work attenuates the danger, but not entirely. I also think it is absurd to expect the general public to thoroughly understand the mechanism by which these models work before interacting with them. That is an extremely high bar to clear for a general consumer product. Many people use these things specifically to avoid having to understand things and to offload their cognitive burdens.

No, "they're just stochastic parrots outputting whatever garbage is statistically likely" is not enough understanding to actually guard against the inherent danger. As I stated before, that's not the dangerous part - you'd need to understand the shape of the 'human psychosis attractor', much like the claude bliss attractor[0] but without the obvious solution of just looking at the training objective. We don't know the training objective for humans, in general. The danger is in the meta structure of the language emitted, not the ontological category of the language generator.

[0]: https://news.ycombinator.com/item?id=44265093



