
Yeah, it's an unspoken but rampant thing in the LLM community. Basically no one respects licenses for training data.

I'd say the majority of instruct tunes, for instance, use OpenAI output (which is against their TOS).

But it's all just research! So who cares! Or at least, that seems to be the mood.


