If they don't release the model, recreating it doesn't look too hard: roughly $100 of compute to run the fine-tuning, and the training data they used is here: https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpac...
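
For reference, a rough sketch in Python of what consuming that file looks like (assumptions: the raw-file URL below is still live, and each record has the "instruction"/"input"/"output" fields described in the repo's README; the prompt formatting here is a simplified version of the template in the repo, not the exact one):

    import json
    import urllib.request

    # Fetch the ~52K instruction-following examples from the Stanford
    # Alpaca repo (this raw URL is an assumption; adjust if it moves).
    URL = ("https://raw.githubusercontent.com/tatsu-lab/"
           "stanford_alpaca/main/alpaca_data.json")
    with urllib.request.urlopen(URL) as resp:
        data = json.load(resp)

    print(len(data))  # ~52,000 records

    # Each record has "instruction", "input" (often empty), and "output";
    # fine-tuning concatenates them into a single training prompt.
    ex = data[0]
    prompt = (
        "### Instruction:\n" + ex["instruction"] + "\n\n"
        + ("### Input:\n" + ex["input"] + "\n\n" if ex["input"] else "")
        + "### Response:\n" + ex["output"]
    )
    print(prompt)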

That would have the same licensing problem they do, though: that alpaca_data.json file was generated with GPT-3. But creating a "clean" training set of 52,000 examples doesn't feel impossible for the right group.

You're only bound by OpenAI's terms if you actually agreed to them. A third party that obtained the data without signing an agreement with OpenAI (e.g. by just downloading it from that repo) is under no obligation to refrain from using it to compete with OpenAI. It's fair use by the same argument OpenAI itself makes when training its own models on publicly available data.