
Does this analogy work? It's exceedingly hard to make new low-background steel, since radioactive particles are everywhere. But it's not difficult to make AI-free content: just don't use AI to write it.


It is impracticable, if not impossible, to prove any work is AI-free. So no one but you can be sure.


> It's exceedingly hard to make new low-background steel

It's not. It's just cheaper to salvage.


Who is going to generate this AI-free content, for what reason, and with what money?


People do. I do, for instance. My blog is self-hosted, entirely human-written, and written purely for enjoyment. It doesn't cost much to host. A statically generated site on free hosting would cost nothing, but I don't mind paying the 55¢/kWh and the $60/month ISP fee to host it myself.


That just raises the question of how to verify which content is AI-free. Was this comment generated by a human? IIRC, one of the big AI startups (OpenAI?) used HN as a proving ground--a sort of Turing test platform--for years.


I make all my YouTube videos, and for that matter everything I do, AI-free. I hate AI.


Once your video is out in the wild, there's as yet no reliable way to discern whether it was AI-generated. All content posted to public forums has this problem.

Training future models without model collapse will thus require one of:

1) paying for novel content to be generated (they will never do this, as they aren't even licensing the content they currently train on);

2) using something like mTurk to identify AI content in data sets prior to training (probably won't scale; a rough sketch follows below); or

3) going after private sources of data via automated infiltration of private forums such as Discord servers, WhatsApp groups, and eventually private conversations.
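
To make option 2 concrete, here is a minimal sketch in Python of a pre-training filter, assuming a hypothetical detector object whose score() method returns the probability that a document is AI-generated. No reliably accurate detector like this exists today, which is exactly the problem:

    # Hypothetical pre-training filter: keep only documents the
    # detector thinks are probably human-written.
    # detector.score() is an assumed interface, not a real library call.
    def filter_corpus(docs, detector, threshold=0.5):
        clean = []
        for doc in docs:
            p_ai = detector.score(doc)  # assumed: P(doc is AI-generated)
            if p_ai < threshold:
                clean.append(doc)
        return clean

Even a detector with a low error rate leaks some AI text through at web scale, and that leakage compounds over successive training generations, which is why this probably won't scale.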


There is the web of trust. If you really trust a person when they say their stuff isn't AI, that's probably the most reliable way of knowing. For example, I have a few friends, and I know their stuff isn't AI-edited because they hate it too. Of course, there's no 100% certainty, but it's as certain as knowing that they're your friends, at least.
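
The mechanical half of a web of trust is easy to sketch; deciding whom to trust is the part that stays human. A minimal illustration in Python (my sketch, not an existing system), assuming friends sign their posts with Ed25519 keys exchanged out of band, using the cryptography package:

    # Web-of-trust check: a post counts as "human-attested" only if it
    # verifies against a public key you personally trust. Populating
    # TRUSTED_KEYS (out of band) is the actual hard part.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    TRUSTED_KEYS: dict[str, Ed25519PublicKey] = {}  # author -> trusted key

    def is_human_attested(author: str, post: bytes, signature: bytes) -> bool:
        key = TRUSTED_KEYS.get(author)
        if key is None:
            return False  # no trust relationship with this author
        try:
            key.verify(signature, post)  # raises InvalidSignature on mismatch
            return True
        except InvalidSignature:
            return False

Note that a valid signature only proves the trusted person posted it, not that they wrote it without AI; the trust relationship itself has to carry that claim.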


But the question is whether AI can continue to be trained on these datasets. How are scrapers going to quantify trust?

E: Never mind, I didn't read the OP. I'd assumed it was about identifying sources of uncontaminated content for the purposes of training models.


Clickbait title, that's all.



