We just fine-tuned another voice recently with only 1 hr, though... I think eventually (soon) we'll only need 15-20 mins with zero-shot, no fine-tuning at all.