
That’s a good point. Wouldn’t OpenR1 suffer from the same problem? Or does being open somehow shield them from legal repercussions?


Some people believe they can dodge copyright issues so long as they have enough indirection in their training pipeline.

You take a terabyte of pirated college physics textbooks and train a model that can pose and answer physics 101 problems.

Then a separate, "independent" team uses that model to generate a terabyte of new, synthetic physics 101 problems and solutions, and releases this dataset as "public domain".

Then a third "independent" team uses that synthetic dataset to train a model.

The theory is that this forms a sort of legal sieve. Pass the knowledge through a grid with a million fact-sized holes and, with enough shaking, the knowledge falls through but the copyright doesn't.


Knowledge laundering



