Hacker News | syntex's comments

The Post-LLM World: Fighting Digital Garbage https://archive.org/details/paper_20260127/mode/2up

Mini paper: the future isn't AI replacing humans, it's humans drowning in cheap artifacts. New unit of measurement proposed: verification debt. Also introduces: Recursive Garbage → model collapse

(a little joke on PRISM)


> The Post-LLM World: Fighting Digital Garbage https://archive.org/details/paper_20260127/mode/2up

This appears to just be LLM output itself? It credits GPT-5.2 and Gemini 3 exclusively as authors, has a public domain license (appropriate for AI output), and is only several paragraphs in length.


Which proves its own point! Absolutely genius! The cost asymmetry of producing versus checking garbage truly has become a problem in recent years, with the advent of LLMs and generative AI in general.


Totally agree!

I feel like this means that working in any group where individuals compete against each other results in an AI vs AI content generation competition, where the human is stuck verifying/reviewing.


> Totally agree!

Not a dig on your (very sensible) comment, but now I always do a double take when I see anyone effusively approving of someone else's ideas. AI turned me into a cynical bastard :(


Yes, I did it as a joke inspired by the PRISM release. But unexpectedly, it makes a good point. And the funny part for me was that the paper lists only LLMs as authors.

Also, in a world where AI output is abundant, we humans become the scarce resource: the "tools" in the system that provide some connection to reality (grounding) for LLMs.


Plot twist: humans become the new Proof of Work consensus mechanism. Instead of GPUs burning electricity to hash blocks, we burn our sanity verifying whether that Medium article was written by a person or a particularly confident LLM.

"Human Verification as a Service": finally, a lucrative career where the job description is literally "read garbage all day and decide if it's authentic garbage or synthetic garbage." LinkedIn influencers will pivot to calling themselves "Organic Intelligence Validators" and charge $500/hr to squint at emails and go "yeah, a human definitely wrote this passive-aggressive Slack message."

The irony writes itself: we built machines to free us from tedious work, and now our job is being the tedious work for the machines. Full circle. Poetic even. Future historians (assuming they're still human and not just Claude with a monocle) will mark this as the moment we achieved peak civilization: where the most valuable human skill became "can confidently say whether another human was involved."

Bullish on verification miners. Bearish on whatever remains of our collective attention span.


Human CAPTCHA exists to figure out whether your clients are human or not, so you can segment them and apply human pricing. Synthetics, of course, fall into different tiers. The cheaper ones.


Bullish on verifiers who accept money to verify fake things


I see the author is decorating the website for Christmas :)


The Illusion of Reasoning was a terrible paper. With 2^n - 1 moves, how could they ever fit in the context size? I tried o3 and it gave me a Python script, saying that listing all the moves is too much for the context window. Completely different results.


I think their point was that the problem is easily solvable by humans without code, and it shows the ability to chain steps together to achieve a goal.


Is it easily solvable by humans without code? I suspect if you asked a human to write down all the steps in order to solve a Tower of Hanoi with 12 disks they would also give up before completing it. Writing code that produces the correct output is the only realistic way to solve that purely due to the amount of output required.


Not sure why I am being downvoted. I am simply saying that we know there is a defined algorithm for solving Tower of Hanoi, and the source code for it is widely available. So, o3 producing the code as an answer, demonstrates even less intelligence, as it means it is either memorized or copied from the internet. I don't see how this point counters the paper at all.

I believe what they are trying to show in that paper is that as the chain of operations approaches a large number (their proxy for complexity), an LLM will inevitably fail. Humans don't have infinite context either, but they can still solve the Tower of Hanoi without needing to resort to pen and paper, or coding.


I didn't downvote. The problem with the paper is that it asks the model to output all moves for, say, 15 disks: 2^15 - 1 = 32,767 moves in a single prompt.

That's not testing reasoning. That's testing whether the model can emit a huge structured output without error, under a context window limit.

The authors then treat failure to reproduce this entire sequence as evidence that the model can't reason. But that’s like saying a calculator is broken because its printer jammed halfway through printing all prime numbers under 10000.

For me o3 returning Python code isn’t a failure. It’s a smart shortcut. The failure is in the benchmark design. This benchmark just smells.
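For reference, the standard recursive solution is only a few lines. This is a sketch of the kind of script a model might return instead of enumerating all 32,767 moves by hand (the function name and peg labels are my own):

```python
def hanoi(n, src="A", dst="C", aux="B", moves=None):
    """Standard recursive Tower of Hanoi: returns the complete move list."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, aux, dst, moves)  # move n-1 disks out of the way
    moves.append((src, dst))            # move the largest disk
    hanoi(n - 1, aux, dst, src, moves)  # stack the n-1 disks back on top
    return moves

print(len(hanoi(15)))  # 32767, i.e. 2**15 - 1
```

Generating the code is trivial; it's emitting the full 32,767-line move list verbatim that strains the context window.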


> That’s testing whether the model can emit a huge structured output without error, under a context window limit.

Agreed. But to be fair, 1) a relatively simple algorithm can do it, and more importantly 2) a lot of people are trying to build products around doing exactly this (emit large structured output without error).


No worries, I wasn't directing that at you.

I agree 15 disks is very difficult for a human, probably on a sheer stamina level; but I managed to do 8 in about 15 minutes by playing around (i.e. no practice). They do state that there is a massive drop in performance at this point.


Remember that with Towers of Hanoi every extra disk doubles the number of moves required. So 15 discs is 128x more moves. If you did eight in 15m then fifteen would take you 32 hours.
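A quick sanity check of that arithmetic, using the closed form 2**n - 1 moves for n disks:

```python
def moves(n):
    # Closed form: solving n disks takes 2**n - 1 moves
    return 2 ** n - 1

print(moves(8))                         # 255
print(moves(15))                        # 32767
print(moves(15) / moves(8))             # ~128x more moves
print(15 * moves(15) / moves(8) / 60)   # ~32 hours at the same 8-disk pace
```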


The same for me. I only knew how to assign variables, use for loops, if/then, and the POKE command. And from that specific point I started thinking of myself as a programmer, even though the only thing I ever wrote in C64 BASIC was a ball moving across the screen. :)


I bought my C64 very late - around 1991/1992. It was in Poland, where I bought a used one from my friend. Back then, Eastern Europe was a decade behind the western side of Europe. Two years later, I purchased a used disk drive. So, for two years, I could only run cartridges like Boulder Dash (I managed to synchronize the tape drive properly only once and played "Winter Games"). But from that boredom, I started programming in BASIC, always dreaming about creating the perfect text-based game ;p


>(I managed to synchronize the tape drive properly only once and played "Winter Games")

Odd; the Commodore Datasette is about as reliable as a microcomputer tape storage system can be, far more so than the tin cans-on-a-string designs of Sinclair and TRS-80. Did you attempt to use a regular cassette recorder with a third-party adapter?


I think there were alignment programs for the Datasette. They played a constant tone or signal that would show whether the head was properly aligned. I think it was on a cartridge that I didn't have. And actually, as a young kid, I didn't know about this alignment thing. I learned about it years later, after switching to the Amiga 500.


Hmm, same here. I had a Datasette 1530 C2N but never managed to load anything, really. I think it worked once or twice.

My parents even sent it in for repair but it came back as "it's not broken".


Similar to me, but years earlier in the US. The best thing that happened to me at that time was not being able to afford a floppy drive. My friends who had one just played games. I had to learn to program instead.


> The best thing that happened to me at that time was not being able to afford a floppy drive.

Well, you were lucky in more ways than one, since the Commodore 1541 floppy drive is legendary for being both more expensive and slower than other 8-bit floppy hardware. So much so there was quite a market in software and hardware hacks to improve performance (the reasons why it was so bad have been written about extensively (including by its designers) and are a fun read).

> My friends who had one just played games.

Initially I didn't even have a tape cassette recorder and just had to type my programs in again. At least that made only having 4K of memory in my 8-bit micro not a problem :-). I guess it's a good thing you didn't know there were commercial games available on cassette tape or the world might have one less programmer!


Luxury! I had a Vic-20, cassette drive, and a black and white TV. Also learned to program.


Cheaper hardware usually means more adoption of the software, and then even more demand for hardware.


Correct answer: never think about the future in terms of linear extrapolations. It's a non-linear differential equation with lots of variables and complex feedback loops. Systems react to change.



You are assuming that cost is what's stopping people from using these technologies.

These things are not actually useful. They hyper-optimized them for the coding use case, but they still suck at it.


When the cost of training a model goes down, it doesn't simply become cheaper for the end user. In addition, the provider will train even larger and more capable models.


> the provider will train even larger and more capable models.

Cost isn't the limiting factor in this, though. "Even larger" models aren't "more capable". Where did you get that from?


I wonder why Poland and other European countries are still buying the F-35.


Because it's the only mass-produced (and thus relatively cheap) 5th-gen fighter, which gives you a lot of advantages over 4th-gen, and it will likely take at least a decade before mass-produced European alternatives are available.

But yeah, actual experts with access to hardware should validate if there is a kill switch and if replacement parts / weapons could be reverse engineered before buying any more.


Poland was already pissed off with the US arms industry under Biden over slow deliveries of US weapons, and started ordering more and more from South Korea. I guess that will only accelerate.


What can I do with that?


Probably nothing.

Inference providers like Fireworks, or the major clouds, can use this to reduce their costs, if they don't already have their own implementation with similar perf.

vLLM and SGLang may integrate this to be faster at serving DeepSeek-V2/V2.5/V3/R1 on H100/H800s.

I believe that's why they didn't release this back then: it was part of their "moat" (pretty weak, tho) and it only benefits competitors.

Open sourcing it now, after becoming very popular, may indicate that they don't want all those users on their API/chat and want the world to serve it instead? Idk.


It's just 2πR, and the extra h changes the result by only a tiny fraction. How is that counter-intuitive :)
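Assuming this refers to the classic circumference puzzle (C = 2πR, so adding h to the radius adds exactly 2πh, regardless of R), a quick check:

```python
import math

def circumference(r):
    return 2 * math.pi * r

R = 6_371_000  # Earth's radius in metres (approx.)
h = 1          # raise the radius by one metre

delta = circumference(R + h) - circumference(R)
print(delta)                      # always 2*pi*h, independent of R
print(delta / circumference(R))   # a vanishingly small fraction of the total
```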


Why does this have so many upvotes? Is this the current state of research nowadays?

