Hacker News
numpad0 on Oct 25, 2020 | on: Books in .txt format for AI training purposes
I’ve tried some sci-fi nonsense on GPT-2 in the past, and it returned flattened-out phpBB replies, complete with usernames and headers. So whatever it was trained on must include web crawls.
minimaxir on Oct 25, 2020
GPT-2 was trained on web pages linked from Reddit, which would explain your output.
GPT-3 was trained on the Common Crawl + books.