Hacker News

How do other LLMs like Claude deal with this?



You don’t talk about Fight Club…

Everyone uses “pirated” content, but some are better at hiding it and/or not talking about it.

There is no other way to do it.


More recently, models like Phi-4 and o1 / o3 are trained on a mix of synthetic and organic text. Original copyrighted text can be safely replaced with synthetic stand-ins.


I think this works only to a certain degree; they will still use as much data as they can to train the models.

Synthetic data will not replace original data like books. Synthetic data works very well for math.



