How much audio do you need to build a model?

mahmoudfelfel · on Oct 11, 2022

The original model (https://play.ht/blog/introducing-truly-realistic-text-to-spe...) was trained on 50k hours of audio. The voices above were only fine-tuned on top of that model, using just 4-6 hours of audio each.

We recently fine-tuned another voice with only 1 hour of audio, though. I think eventually (soon) we will only need 15-20 minutes, and that with zero-shot, not even fine-tuning.