
I've been wondering lately whether audio books might be an amazing training resource for models like these, if you could get the script that the reader was working from!


Your wish, granted. http://www.openslr.org/12/

It's 1000 hours of audio book readings, segmented by sentence, with transcripts. The texts are all from Project Gutenberg, so maybe a little heavy on Victorian bodice rippers and such, but certainly a great trove of training data...



That data is no good for this purpose, as it’s from a lot of different speakers and does not have speaker labels, i.e., you can’t tell which sentences were spoken by which speaker.



