
Oh this is interesting! Thanks for sharing. To be clear: it’s not that I think it would necessarily degrade the quality of the training, but that the biases of training on other AI output need to be taken into account.


> it’s not that I think it would necessarily degrade the quality of the training

It absolutely would necessarily degrade the quality of the training in the long term. It's lossy knowledge compression. There is no lossy compression that gets better when you feed its output back into its own input over and over. It's basic information theory.

I admit I don't understand why the linked article got those results -- if they replicate, the technique must somehow be squeezing a bit more usefulness out of the training set, similar to the slight perturbations sometimes applied to images during training. Or the model was previously too "unsure" of itself, and it's just amping up its own confidence. If the training set was poisoned, all it would do is squeeze out more poison, or become more confidently wrong. But those are just hunches as to what's going on.


> There is no lossy compression that gets better when you feed its output back into its own input over and over. It's basic information theory

Sure there is. The goal of AI is not to memorize all information, but to generalize from it, so "lossy" doesn't make sense here.

For example, noisy data can be improved by successive filtering.

In fact, from an information-theory standpoint, noisy-channel coding shows exactly that information can be improved via multiple lossy passes: many modern error-correcting codes have iterative decoders, each stage lossy with respect to its own input and output, yet each stage gets closer to the correct original message.
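
A toy illustration of that point (my own example, not from the thread or any paper): repeatedly applying a lossy smoothing filter to a noisy signal, where every pass discards detail yet the estimate keeps moving closer to the true underlying value.

```python
# Toy sketch: each smoothing pass is lossy, yet the estimate of the true value improves.
import random

true_value = 1.0
signal = [true_value + random.gauss(0, 0.5) for _ in range(1000)]

def smooth(xs, window=5):
    """One lossy pass: replace each sample with a local average (fine detail is discarded)."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

for step in range(5):
    mse = sum((x - true_value) ** 2 for x in signal) / len(signal)
    print(f"pass {step}: mean squared error {mse:.4f}")
    signal = smooth(signal)  # feed the output back in as the next input
```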

So the "lossy" and "information theory" argument doesn't work.


How was the data created in the first place? A human took their prior knowledge, thought, and wrote something.

There is no conservation law for knowledge. We would expect that when AIs become advanced enough, feeding their own output to them decreases loss, just as it does for humanity. In fact, this is a good definition of intelligence.


"Distillation and amplification" is a somewhat popluar AI technique. For example if you have a chess engine with a heuristic to choose which moves to investigate, you can explore the 20 best paths according to your heuristic, see which moves ended in the best result, and use that to train the heuristic for the first move.

Doing the same thing with LLMs isn't out of the question, but for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans consciously chose to publish might already provide that; you just have to somehow filter out the content farms that lack any human review.


Chess is "data mineable" -- you can get more training data just by having computers play chess against themselves. There's clear winners and losers. If you programmed in the rules of chess, a sufficiently powerful AI could learn everything there is to know about the game just by playing itself -- mining its own training data.

There's no analogue with language. The system can't determine whether what was said makes sense or is true on its own. Maybe you could program in "the rules of grammar" and have AIs invent their own languages, but they'd have nothing to say to each other, so don't expect a translation for "a broken clock is right twice a day". Besides, that's not what anyone is doing.

This is why I'm saying any technique like this that works, must work by "squeezing out" more information from the training data (very likely overfitting in the process). You simply cannot data-mine new useful language training data like you can data-mine bitcoin or 1v1 game data.

> for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans consciously chose to publish might already provide that

Of course adding more human-curated data can improve the model. But the whole idea of the arxiv article is whether these AIs can improve themselves. It seems patently clear to me that the answer is "only if they're underfit to the data, and only to a limit, after which they will start to overfit on their own excrement". I really just don't see how there's any other possibility that doesn't rely on ChatGPT magically having actually reached the singularity.

Look, even humans don't get perpetually more intelligent just by talking to themselves. After a certain point, all they get is more entrenched in bad ideas, more cult-like, more superstitious. Humans get more intelligent by interacting with the environment and seeing what happens.


AlphaZero plays games of chess against itself over and over, feeding the output of the neural network back into the input, and now it's vastly more powerful than any chess engine that's ever existed.

What's the Kolmogorov complexity of the standard model? If you start with thousands of terabytes of training data, why wouldn't the accurate representation be dramatically smaller than that?

A schoolchild is expected to memorize every word of a text and faithfully repeat it on command. Is that the same thing as understanding a book?


> AlphaZero plays games of chess against itself over and over, feeding the output of the neural network back into the input, and now it's vastly more powerful than any chess engine that's ever existed.

not a good comparison. alphazero's loss function never changed as it was playing itself. it was always just "win this game given these rules". but an LLM's loss function rewards it for predicting the next token. and now "the next token" might be something that an AI wrote previously.
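
For what it's worth, a minimal sketch of that loss (my own illustration, not any particular model's code): the target is simply whatever token actually appears next in the corpus, whoever or whatever wrote it.

```python
# Minimal sketch of next-token cross-entropy; nothing here distinguishes
# human-written targets from model-written ones.
import numpy as np

def next_token_loss(logits, target_id):
    """logits: the model's scores over the vocabulary at one position;
    target_id: the token that actually came next in the training text."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[target_id])

# e.g. a 5-token vocabulary where the corpus says token 2 comes next
print(next_token_loss(np.array([0.1, 0.2, 1.5, -0.3, 0.0]), target_id=2))
```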


>alphazero's loss function never changed

Yes, it did.

AlphaZero's loss function changes with every update. The loss compares the actual outcome against the predicted outcome, and the predicted outcome is a result of previous learning. So with every step it takes, each of which involves randomness (and the initial weights are also random), it modifies its own loss function for future walks through the space of weights. Rerun the entire training with different initial random weights, look at the sequence of loss functions, and you will get a different sequence.

The paper: https://arxiv.org/pdf/1712.01815.pdf
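
For reference, a small sketch of the loss described in that paper (my paraphrase; the variable names are mine): the policy target pi comes from a search that is itself guided by the current network, which is the sense in which the training signal moves as the network learns.

```python
# Sketch of an AlphaZero-style loss: (z - v)^2 - pi . log(p) + c * ||theta||^2
import numpy as np

def alphazero_style_loss(p, v, pi, z, theta, c=1e-4):
    """p: the network's predicted move probabilities; v: its predicted value;
    pi: visit-count target from MCTS, a search guided by the *current* network;
    z: self-play game outcome in {-1, 0, +1}; theta: flattened weights for L2."""
    value_loss = (z - v) ** 2
    policy_loss = -np.dot(pi, np.log(p + 1e-12))
    return value_loss + policy_loss + c * np.dot(theta, theta)

# toy usage: three legal moves, weights flattened into a vector
p = np.array([0.2, 0.5, 0.3])
pi = np.array([0.1, 0.7, 0.2])
print(alphazero_style_loss(p, v=0.1, pi=pi, z=1.0, theta=np.zeros(10)))
```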


i mean, there's always an objective ground truth because the rules of chess never change. did it win or lose?

but the "ground truth" of the english corpus changes all the time. and is changing right now as LLMs emit words into the noosphere. so I don't see how this counters my point.


>the rules of chess never change

Actually, they do under FIDE rules. For example, the 50-move rule has changed many times, even in my lifetime, to different numbers, and there is pressure to change it yet again. They also added a 75-move rule (50 requires player intervention, 75 is automatic).

They recently abolished the automatic draw for insufficient material rule.

They added a new "dead position" rule that forces a draw.

They recently removed a perpetual check draw rule.

They added an automatic fivefold repetition draw rule, to go along with the requiring claim for the threefold repetition rule.

If you don't like FIDE rules, then each national federation has rules that also change.

So claiming the rules never change is simply not true. The rules have changed many, many times over the past few hundred years, some in pretty big ways (see all the changes in promotion rules since 1800), including within just the last decade. Google and read.

>there's always an objective ground truth

That "objective ground truth" is not computable. If it were, chess would be weakly solved (in the game theoretic sense), and it is not, and is expected to never be so. It's too complex. Since no AI can access the "objective truth" of a position, it's no different than what LLMs do - they are measuring next move under some fuzzy AI generated probability distribution over the next move (or token, if you prefer).

>so I don't see how this counters my point

You had a belief, and made a claim to rationalize it, and the claim was false. Usually that should cause you to rethink the belief, not double down on it.

That a game of chess ends with a ternary outcome is irrelevant since AlphaZero is not training on that uncomputable function - it's training on its own predicted move versus a statistical sampling of the move quality. It never ever knows the "truth" of the outcome of a given position because that cannot be computed - the game tree is far too big.

Your claim:

>but an LLM's loss function rewards it for predicting the next token. and now "the next token" might be something that an AI wrote previously

is no different than:

"but AlphaZero loss function rewards it for predicting the next move quality. and certainly the next move quality might be something AlphaZero estimated previously".

>is changing right now as LLMs emit words into the noosphere

Chess knowledge is also changing right now as AlphaZero emits new games and even new chess ideas into the noosphere (plenty of GMs have written about how they are rethinking certain positions, and this "knowledge" can be fed into newer engines/AIs as desired...). Not a lot of difference, is there?


you are bringing up irrelevant nitpicks and are seemingly intent on misunderstanding/misrepresenting my point. I'm not continuing this discussion.


>intent on misunderstanding/misrepresenting my point

I completely addressed your point - you claim there is somehow a fundamental difference between LLMs and AlphaZero, and you made many claims about why. They were all demonstrably wrong, which is why you're missing that there is no fundamental difference, and certainly not the one you claim. Both learn using a fuzzy metric, both can reuse things from their own previous learning, and this is the opposite of what you claimed.


You're forgetting that not all AI outputs get posted. Therefore, training the next generation of AI on these outputs will reinforce the "share-ability" aspect a bit more.



