
Interesting how there seems to be little correlation between source sample size and quality, e.g. the Portal sentry turret at 1.5 min of input vs. the 100+ minutes of the narrator from The Stanley Parable, which sounded like auto-tune had a stroke.


The AI seems to work best on high-pitched, female voices. The model has improved in this regard since I last tried this website, but it still seems very significantly biased towards female voices.


Much of it depends on refinement work on each specific model. Try the Daria voices, for example, which are easy to get results with that sound like they came straight out of the show.


I think it's because the underlying(?) TTS can't really portray how the narrator speaks, which is very exaggerated and highly varied in tempo. The key idea of the app should be that we can easily transform "voice AND emotional tone" of the underlying voice.



