
LLMs can absolutely synthesize new knowledge out of existing knowledge. They can't easily do so iteratively because we haven't quite figured out memory yet. Until we figure that out you won't have an LLM discover a new theory of quantum gravity.

And even once we solve that, LLMs - just like human scientists - absolutely need new data from the outside world. Very few breakthroughs were achieved by just thinking long and hard; most were the result of years of experimentation, something LLMs simply can't do.



Recursive self-improvement requires the ability to objectively select the "prime" of your own knowledge. From an LLM's perspective a hallucination and a correct answer are the same thing. It has no beliefs about what is true or false, because it has no concept of what it's even outputting; it's all just guessing the next word. So even if a hallucination completely contradicts countless things it ostensibly "knows" to be true, it is unable to correct itself or realize that what it's outputting is unlikely to be correct.
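
To make the "guessing the next word" point concrete, here's a minimal sketch of greedy next-token decoding (`next_token_logits` is a placeholder for any causal LLM's forward pass, not a real API). The loop is identical whether the emitted text is correct or a hallucination; nothing in it carries a truth value.

    import math

    def softmax(logits):
        # logits: dict mapping token string -> raw score from the model
        m = max(logits.values())
        exps = {tok: math.exp(v - m) for tok, v in logits.items()}
        z = sum(exps.values())
        return {tok: e / z for tok, e in exps.items()}

    def generate(next_token_logits, prompt_tokens, max_new_tokens=50):
        # next_token_logits(tokens) -> dict of scores; purely a stand-in
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            probs = softmax(next_token_logits(tokens))  # distribution over the vocab
            tok = max(probs, key=probs.get)             # pick the likeliest next word
            if tok == "<eos>":
                break
            tokens.append(tok)                          # no factuality check anywhere
        return tokens

Any notion of "is this true?" has to come from outside this loop (training signal, retrieval, tools); the loop itself only ranks tokens.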


Pause tokens [1] are directly dependent on the model recognizing whether the answer it arrived at is one it should "commit" to or whether it should abandon it and output a pause token instead.

Similarly, if you ask ChatGPT about the current president of Numbitistan it will tell you that it doesn't know about a country with that name, rather than just hallucinating an answer. So it can, at least in this circumstance, tell the difference between knowing something and not knowing something.

1: https://arxiv.org/abs/2310.02226
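As a rough sketch only (placeholder names, going by the framing above rather than the paper's exact training setup), pause-token decoding might look something like this: emitting <pause> keeps the token in the model's context but hides it from the user, buying extra forward passes before the model commits to a visible answer token.

    PAUSE = "<pause>"
    EOS = "<eos>"

    def generate_with_pauses(next_token, prompt_tokens,
                             max_new_tokens=50, max_pauses=10):
        # next_token(tokens) -> one token string; stand-in for a model
        # forward pass plus sampling, not a real API.
        tokens, visible, pauses = list(prompt_tokens), [], 0
        for _ in range(max_new_tokens):
            tok = next_token(tokens)
            tokens.append(tok)                 # pauses stay in the context...
            if tok == PAUSE and pauses < max_pauses:
                pauses += 1                    # ...but are hidden from the user
                continue
            if tok == EOS:
                break
            visible.append(tok)                # the model has "committed" to this
        return visible

In the paper itself the pause tokens are injected during training so the model actually learns to use the extra compute; the point here is just that "keep thinking vs. commit" has to be expressible in the decoding loop at all.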


The same is true for a human brain in a vat. It's even true for humans historically: it took us millennia to figure out science.

When robots are powered by transformers or the like, I expect we'll see some pretty impressive results.


I don't think this is true - the main difference is internal consistency. Humans adopt a series of views and values, true or false, and tend to build up from those. The accuracy doesn't really matter so much as the internal consistency, because it tends to turn out that trying to build from an invalid foundation eventually causes you to stop moving forward, and so the more factually supported values tend to win out over time.

But it's the internal consistency that really matters. LLMs have no internal consistency, because they have no way of 'adopting' a view, value, fact, or whatever else. They will randomly hallucinate things that directly contradict the overwhelming majority of their state, and then do so repeatedly in a single dialogue. If there were a human behaving in such a fashion, we would generally say they had schizophrenia or some other disorder that basically just ruins your ability to think like a human.


Humans are infamously bad at being consistent:

https://en.m.wikipedia.org/wiki/Compartmentalization_(psycho...

The biggest mistake I see people make when criticizing LLMs is that they take the best possible modes of human thought from our best thinkers, and compare that to LLM edge cases.

Accuracy vs consistency isn't really a delineator. There's so much low-hanging fruit atm, like world models for LLMs improving drastically if you just train them longer. I'll believe the naysayers if, say, in 5 years GPT-4 is still near state of the art. Until then, there don't actually seem to be any theoretical limitations.


Hallucination is not an LLM "edge case." It is their normal and only state of operation. It just so happens that 'guess the next word' algorithms are capable of a reasonable frequency of success, owing to the fact that a lot of our language is probably mostly redundant, which makes it possible to 'hallucinate' reasonable statements quite regularly.

Take what I wrote above. If you were given the context of what I had already written, you could probably fill in most of what I wrote, to a reasonable degree of accuracy, after "It is their normal...", because the issue is obvious and my argument largely writes itself. To some degree even this second paragraph does.


IDK, I think it's kinda important for LLMs to get the simple stuff correct in order to justify looking into the rest of the hallucinations.

Like, if you can't tell me "what day is it today?" (actual failed prompt I have seen) then there's no world where I'm going to have a more complicated follow-up conversation with you. It's just not worth my time or yours.


I agree with this, but find it a poor mode of debate, because it results in hole-plugging which is then called goal shifting, even though it's not - it's rather a lack of precision in the goal to begin with. For example, imagine it goes viral that 'wow, look, LLMs can't even play a half decent game of chess.' So OpenAI or whoever decides to dedicate an immense amount of post-training, hard-coding, and other fun stuff to enabling LLMs to play a decent game of chess.

But has anything changed? Well no, because it's obviously trivially possible for them to play a decent game of chess (or correctly state the date), but that's just one example of a more general issue: LLMs are incapable of consistently handling simple tasks across arbitrary domains. So you have software that can post a high score on the LSAT or whatever, but can't competently play a game children can play.

The over-specialization for the sake of generating headlines and over-fitting benchmarks is, IMO, not productive. At least not in terms of creating optimal systems. If the goal is to generate money, which I guess it is, then it must be considered productive.


I'm not asking for someone to overfit to being able to properly answer the question of "What day is it today?" I'm giving an example of a simple question that all LLMs need to be able to answer correctly.

But like, people are on here saying that this will make scientific improvements, and until it can get past the basic stuff, it's not in the ballpark of anything more complicated. Right now, we're basically at the stage of 10 million monkeys on 10 million typewriters for 10 million hours. Like, maybe we'll get Shakespeare out of it, but are we willing to sort through all of the crap it will generate along the way, when it can't actually create a useful answer to simple questions?


Why are humans capable of doing that, but it's categorically impossible for LLMs? That's a very high level capability. What's the primitive it is built on that humans have but machines can't have?


Quantum biomechanics. The human brain has built-in indeterministic pattern seeking. Computers are Turing-based, which means they are 100% dependent on their programming and data inputs, and all current LLMs operate on that very deterministic hardware. In order for LLMs to break through the existing glass ceiling, the hardware needs to change.



