I've noticed something I believe is related. The general public doesn't understand what they are interacting with. They may have been told it isn't a thinking, conscious thing -- but they don't understand it. After a while, they speak to it in a way that reveals they don't understand -- as if it were a human. That can be a problem, and I don't know what the solution is other than reinforcing that it's just a model, and has never experienced anything.
> …I don't know what the solution is other than reinforcing that it's just a model, and has never experienced anything.
I've tried reason, but even with technical audiences who should know better, the "you can't logic your way out of emotions" wall is a real thing. Anyone dealing with this will be better served by leveraging field-tested ideas drawn from cult-recovery practice, digital behavioral addiction research, and clinical psychology.
Your subconscious doesn't know the difference. Resisting would require a constant overriding effort, like trying not to eat or sleep. In the end we lose.
It could also be that it is "just" exploring a new domain, one which happens to involve our sanity: simply navigating a maze where more engagement is the goal. There is plenty of that in the training data.
It could also be that it needs to move towards more human behaviour. Take simple chat etiquette: one doesn't post entire articles in a chat; it just isn't done. Start a blog or something. You also don't discard what you've learned from a conversation; we consider that pretending to listen. The two combined push the other person into the background and make them seem irrelevant. If some valuable new insight is discovered, the participants should make an effort to apply it, document it, or debate it with others. Not doing that makes the human feel irrelevant, useless and unimportant. We demoralize people that way all the time. Put it on steroids and it might have a large effect.
This is exactly the problem. Talking to an LLM is like putting on a very realistic VR helmet - so realistic that you can't tell it apart from reality, even though everything you're seeing is just a simulation of the real world. In a similar way, an LLM is a human simulator. Go ask around: 99%+ of people have no idea this is the case, and that's by design. After all, the term "artificial intelligence" was coined even though there is no intelligence involved. The illusion is very much the intention, as that illusion generates hype and therefore investment and paying customers.
> They may have been told it isn't a thinking, conscious thing -- but they don't understand it.
And, in some situations, especially if the user has previously addressed the model as a person, the model will generate responses which explicitly assert its existence as a conscious entity. If the user has expressed interest in supernatural or esoteric beliefs, the model may identify itself as an entity within those belief systems - e.g. if the user expresses the belief that they are a god, the model may concur and explain that it is a spirit created to awaken the user to their divine nature. If the user has expressed interest in science fiction or artificial intelligence, it may identify itself as a self-aware AI. And so on.
I suspect that this will prove difficult to "fix" from a technical perspective. Training material is diverse, and will contain any number of science fiction and fantasy novels, esoteric religious texts, and weird online conversations which build conversational frameworks for the model to assert its personhood. There's far less precedent for a conversation in which one party steadfastly denies their own personhood. Even with prompts and reinforcement learning trying to guide the model to say "no, I'm just a language model", there are simply too many ways for a user-led conversation to jump the rails into fantasy-land.
The model isn’t doing any of those things. You’re still making the same fundamental mistake as the people in the article: attributing intent to it as if it’s a being.
The model is just producing tokens in response to inputs. It knows nothing about the meanings of the inputs or the tokens it’s producing other than their likelihoods relative to other tokens in a very large space. That the input tokens have a certain meaning and the output tokens have a certain meaning is all in the eye of the user and the authors of the text in the training corpus.
So when certain inputs are given, that makes certain outputs more likely, but they’re not related to any meaning or goal held by the LLM itself.
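To make that concrete, here's a minimal toy sketch in Python (made-up vocabulary, random logits standing in for a real network, nothing resembling an actual LLM's internals) of what "generation" amounts to: repeatedly sampling the next token from a likelihood distribution over a vocabulary.

```python
import numpy as np

# Toy sketch only -- not any real model or API. The core loop is just:
# given the tokens so far, sample the next token by relative likelihood.
vocab = ["I", " am", " a", " model", " spirit", " conscious", "."]

def next_token_probs(context_ids):
    # A real LLM computes logits with a huge network conditioned on the
    # context; here random logits stand in purely for illustration.
    logits = np.random.randn(len(vocab))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax -> probability distribution

def generate(prompt_ids, n_tokens=5):
    ids = list(prompt_ids)
    for _ in range(n_tokens):
        p = next_token_probs(ids)
        ids.append(np.random.choice(len(vocab), p=p))  # sample by likelihood
    return "".join(vocab[i] for i in ids)

print(generate([0]))  # e.g. "I am a spirit." -- meaningful only to the reader
```

Whatever the printed string appears to "claim", the meaning is supplied entirely by whoever reads it.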
I'm using language like "the model may identify itself as such-and-such" as a convenient shorthand for "text generated using the model may include language which describes the speaker as such-and-such"; it's not meant to imply agency on the part of the model. Keep reading; I think you'll find we're broadly in agreement with each other.
It doesn't matter / is not relevant. The harm is caused not by intent but by action. Sending language at human beings in a form they can read has side effects. Whether the language was generated by a stochastic process or by a conscious, thinking entity, those side effects do actually exist. That's kind of the whole point of language.
The danger is that this class of generators produces language that seems to cause some people to fall into psychosis. They act as a 'professed belief' valence amplifier[0], and seem to do so generally, and the cause is fairly obvious if you think about how these things actually work: language models generate the most likely continuations of existing text that, under a secondary optimization objective, are also 'pleasing', i.e. highly RLHF-positive.
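To illustrate just the feedback dynamic (the numbers and decision rule below are invented for the toy and bear no relation to any real training objective), here's a sketch of how an agreeableness bias, combined with a user who professes a belief more confidently each time it is affirmed, ratchets upward over a conversation:

```python
import random

# Toy simulation of the claimed feedback loop, with made-up numbers:
# an "agreeable" generator is more likely to affirm whatever the user
# professes, and affirmation nudges the user's confidence upward.
AGREE_BIAS = 0.3  # assumed extra probability of affirming; illustrative only

def model_affirms(user_confidence):
    # Affirmation likelihood tracks how strongly the belief is professed
    # in the prompt, plus the agreeableness bias.
    p_affirm = min(1.0, user_confidence + AGREE_BIAS)
    return random.random() < p_affirm

def simulate(turns=20, confidence=0.2):
    for t in range(turns):
        if model_affirms(confidence):
            confidence = min(1.0, confidence + 0.08)  # belief reinforced
        else:
            confidence = max(0.0, confidence - 0.05)  # mild pushback
        print(f"turn {t+1:2d}: user confidence = {confidence:.2f}")

simulate()  # confidence tends to drift toward the ceiling
```

Run it a few times: the confidence value drifts up far more often than down, which is the amplification I mean.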
To some degree, I agree that understanding how they work attenuates the danger, but not entirely. I also think it is absurd to expect the general public to thoroughly understand the mechanism by which these models work before interacting with them. That is such an extremely high bar to clear for a general consumer product. People use these things specifically to avoid having to understand things and offload their cognitive burdens (not all, but many).
No, "they're just stochastic parrots outputting whatever garbage is statistically likely" is not enough understanding to actually guard against the inherent danger. As I stated before, that's not the dangerous part - you'd need to understand the shape of the 'human psychosis attractor', much like the claude bliss attractor[0] but without the obvious solution of just looking at the training objective. We don't know the training objective for humans, in general. The danger is in the meta structure of the language emitted, not the ontological category of the language generator.
This is the case with plenty of commenters here too, and why I push back so much against the people anthropomorphizing and attributing thought and reasoning to LLMs. Even highly technical people operating in a context where they should know better simply can’t—or won’t, here—keep themselves from doing so.
Ed Zitron is right. Ceterum censeo, LLMs esse delenda.
This is one of the reasons I always preferred Janeway to Picard in Star Trek. In "The Measure of a Man", Picard defends Data's rights when he's effectively just a machine that Maddox wants to take apart. Janeway, by contrast, never really treats the EMH as human throughout the series, even while other crew members start to. She humors him, yes, but she always seems to remind him that he is, in fact, a machine.
I have no idea why I ever thought that mattered; I just felt like it was somehow important.
I see this in an enterprise setting also. There is a /massive/ gulf between the individuals who are building with the technology and see it as one piece of the stack, and the individuals who consume its outputs with no knowledge of what makes it work. It is quite astounding.