
How much audio do you need to build a model?


The original model (https://play.ht/blog/introducing-truly-realistic-text-to-spe...) was trained on 50k hours of audio. The voices above were just finetuned on that model, with only 4-6 hours of audio each.

We recently finetuned another voice with only 1 hour, though. I think eventually (soon) we will only need 15-20 minutes, with zero-shot and no finetuning at all.



