Interesting how it seems like there's little correlation between source sample-s...

jeroenhd · on June 12, 2022

The AI seems to work best on high-pitched, female voices. The model seems to have improved in this regard since I last tried this website, but it's still significantly biased towards female voices.

crooked-v · on June 12, 2022

Much of it depends on refinement work on each specific model. Try the Daria voices, for example, which are easy to get results with that sound like they came straight out of the show.

esjeon · on June 13, 2022

I think it's because the underlying(?) TTS can't really portray how the narrator speaks, which is very exaggerated and highly variable in tempo. The key idea of the app should be that we can easily transform the "voice AND emotional tone" of the underlying voice.