
It's still a research project and not a production system:

"We manually analyze the error modes of our system on the custom 100-sentence test set from Appendix E of [11]. Within the audio generated from those sentences, 0 contained repeated words, 6 contained mispronunciations, 1 contained skipped words, and 23 were subjectively decided to contain unnatural prosody, such as emphasis on the wrong syllables or words, or unnatural pitch. In one case, the longest sentence, end-point prediction failed."



To add:

> Also, our system cannot yet generate audio in realtime.

For a production GCP API, I think faster-than-real-time generation would be necessary.

For example, WaveNet took a year to go from research to production in Google Assistant: https://deepmind.com/blog/wavenet-launches-google-assistant/



