It's still a research project and not a production system: "We manually analyze ... | Hacker News


applejinn on Dec 19, 2017 | on: Tacotron 2: Generating Human-Like Speech from Text

It's still a research project and not a production system: "We manually analyze the error modes of our system on the custom 100-sentence test set from Appendix E of [11]. Within the audio generated from those sentences, 0 contained repeated words, 6 contained mispronunciations, 1 contained skipped words, and 23 were subjectively decided to contain unnatural prosody, such as emphasis on the wrong syllables or words, or unnatural pitch. In one case, the longest sentence, end-point prediction failed."

thesandlord on Dec 19, 2017

To add:

> Also, our system cannot yet generate audio in realtime.

For a production GCP API, I think faster-than-real-time generation would be necessary.

For example, WaveNet took a year to go from research to production in Google Assistant: https://deepmind.com/blog/wavenet-launches-google-assistant/
