It's still a research project and not a production system:
"We manually analyze the error modes of our system on the custom 100-sentence test set from Appendix E of [11]. Within the audio generated from those sentences, 0 contained repeated words, 6 contained mispronunciations, 1 contained skipped words, and 23 were subjectively decided to contain unnatural prosody, such as emphasis on the wrong syllables or words, or unnatural pitch. In one case, the longest sentence, end-point prediction failed."
"We manually analyze the error modes of our system on the custom 100-sentence test set from Appendix E of [11]. Within the audio generated from those sentences, 0 contained repeated words, 6 contained mispronunciations, 1 contained skipped words, and 23 were subjectively decided to contain unnatural prosody, such as emphasis on the wrong syllables or words, or unnatural pitch. In one case, the longest sentence, end-point prediction failed."