this is the first time I can't tell the difference between the synthesized voice...

_9jgl · on Dec 19, 2017

If you listen carefully, it's possible with some of the samples to hear that the human is stressing a different word (e.g. "that girl" vs "that girl", or "too busy for romance" vs "too busy for romance"), but I couldn't tell which was the real recording based on that alone.

yepthatsreality · on Dec 19, 2017

My take is that the human voices have a more emotional, weightier tone, while the robot is flat and direct.

justonepost · on Dec 19, 2017

I thought it was pretty easy to pick the audio clips with greater variance in pitch and pronunciation. 2-2-2-1 were the human voices.

modeless · on Dec 19, 2017

Haha, try again, the human is 1,2,2,1 according to the filenames (I was fooled too).

I do think the difference would become obvious with a paragraph or more of speech, though. It's difficult to judge what the correct intonation should be on these single sentences without context. Ultimately, correct intonation requires a complete understanding of meaning, which is still out of reach. An audiobook read by Tacotron 2 would still sound strange.

justonepost · on Dec 19, 2017

Depends on the audiobook. I think technical docs would be alright, which is mostly what I want this for. There are lots of technical docs I'd like to listen to while I work out.

nocut12 · on Dec 19, 2017

Looks like the real ones are actually 1-2-2-1. The file names of the samples end in either "gt" or "gen", which kinda gives it away.

I thought the same thing initially -- I guess it fooled a few of us with that first one!

notahacker · on Dec 20, 2017

I thought the first one was the clearest, once you've read that the synthesised voice tries to guess from syntax which words should be stressed. Sentences beginning with the word "that" often should stress "that", because they're distinguishing that choice from some other one, but probably not in this particular instance, where it's an offhand reference to some girl from some video...