How is the latency for real-time TTS? I remember kicking the tires several months back, but I went with one of the big 3 cloud providers since they had lower latency.

I also like that the cloud provider supports SSML, so I can explicitly configure the emotion, whereas PlayHT dynamically changed the emotion based on the context of the text.

Latency isn't real-time yet, but we're working on getting it close to real time. For controlling the voice, we've added a few parameters (rate, voice guidance, and temperature), but for now the emotion mostly depends on the text.


The dream for scammers


Low latency would open up a lot of interesting applications. In my testing, even ElevenLabs doesn't have low enough latency to work as a convincing voice assistant or, for example, to run in real time on a phone call. For that we likely need QUIC or some other streaming protocol.
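A rough back-of-envelope sketch of why streaming matters here, under loudly stated assumptions: the 80 ms round trip, 50 ms of synthesis per audio chunk, and 20-chunk utterance are made-up illustrative numbers, not measurements of any provider. It counts roughly 2 handshake round trips for TCP + TLS 1.3, 1 for a fresh QUIC connection, and 0 for a reused connection, plus one round trip for the request itself. It ignores server queueing and client playback buffering. The function name is hypothetical.

```python
def time_to_first_audio_ms(
    rtt_ms: float,               # one client<->server round trip (assumed value)
    handshake_rtts: int,         # round trips spent on connection setup before the request
    synth_ms_per_chunk: float,   # server time to synthesize one audio chunk (assumed value)
    chunks_before_send: int,     # chunks the server must finish before sending the first byte
) -> float:
    # Setup round trips, plus one round trip for the request/first response,
    # plus the synthesis time spent before any audio can leave the server.
    return rtt_ms * (handshake_rtts + 1) + synth_ms_per_chunk * chunks_before_send


if __name__ == "__main__":
    # Batch: whole 20-chunk utterance synthesized first, over TCP + TLS 1.3.
    print(time_to_first_audio_ms(80.0, 2, 50.0, 20))  # 1240.0
    # Streaming over a fresh QUIC connection: first chunk sent immediately.
    print(time_to_first_audio_ms(80.0, 1, 50.0, 1))   # 210.0
    # Streaming over an already-open connection.
    print(time_to_first_audio_ms(80.0, 0, 50.0, 1))   # 130.0
```

Under these toy numbers most of the gain comes from streaming the first chunk instead of waiting for the whole utterance; the transport (QUIC vs. TCP + TLS) mainly trims connection-setup round trips.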
