Haha, the problem is that we still haven't gotten AI to generate voices! The "state of the art" just uses a regular TTS engine and then adds extra inflection as "special sauce," stretches the audio out, and so on. At least, that is how it works for the AI generators that need to do it "at scale." When you can spend more on classic speech models, you can go well beyond that (see Siri, Google, Alexa, etc.).