
Perhaps something has changed since I used it about two months ago, or we were using it incorrectly, but in our experience it was "real-time" transcription without being truly incremental. (Since you and another person have both said it's possible, I'm leaning towards us not having read the documentation carefully enough. We didn't spend much time trying to figure it out because the time limit was a bigger dealbreaker.)

Probably best explained as an example. Consider someone saying this:

"How are you doing? I'm doing well. Do you plan to go to the park with Alice and Bob tomorrow to see the fireworks show?"

When we were using it, Google would give:

1. "How are you doing?"

2. "I'm doing well."

3. "Do you plan to go to the park with Alice and Bob tomorrow to see the fireworks show?"

as three separate messages.

So it's certainly not waiting until the end of the entire stream to give you a result: it sends you those three messages as you speak. In that sense it is "returning text as it's recognized", because once it recognizes that you've finished a sentence it computes the words and gives them back to you. The issue for us was that we could only get results after full sentences or long pauses.

Amazon, on the other hand, would give for the last sentence something like:

1. "Do you plan"

2. "Do you plan to go"

3. "Do you plan to go to the park"

So for our purposes (showing a chat bubble above a person's head), the latter version was much more useful. The reason is that spoken sentences often ramble, so we wanted to show incremental updates to the user as a long sentence was being spoken, so they wouldn't have to read 30 words at once.
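To make the difference concrete for the chat-bubble case, here is a minimal sketch in Python. It does not use either vendor's API: `ChatBubble`, `apply`, `replay`, and the `is_final` flag are hypothetical names made up for illustration, and the closing "final" message in the incremental stream is an assumption about how such a stream would end. The idea is just that a partial result replaces the previous partial, while a final result gets committed.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class ChatBubble:
    """Text shown above a speaker: committed sentences plus one in-progress guess."""
    committed: List[str] = field(default_factory=list)
    pending: str = ""

    def apply(self, text: str, is_final: bool) -> None:
        if is_final:
            # A finished sentence is kept and the in-progress guess is cleared.
            self.committed.append(text)
            self.pending = ""
        else:
            # A partial result supersedes the previous partial; it is not appended.
            self.pending = text

    def render(self) -> str:
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)


def replay(events: Iterable[Tuple[str, bool]]) -> List[str]:
    """Return what the bubble would display after each message arrives."""
    bubble = ChatBubble()
    frames = []
    for text, is_final in events:
        bubble.apply(text, is_final)
        frames.append(bubble.render())
    return frames


# Sentence-level behavior: every message is already a complete sentence.
sentence_level = [
    ("How are you doing?", True),
    ("I'm doing well.", True),
]

# Incremental behavior: growing partials, then (assumed) a final result.
incremental = [
    ("Do you plan", False),
    ("Do you plan to go", False),
    ("Do you plan to go to the park", False),
    ("Do you plan to go to the park with Alice and Bob tomorrow "
     "to see the fireworks show?", True),
]

if __name__ == "__main__":
    for frame in replay(incremental):
        print(frame)
```

With the sentence-level stream the bubble only changes once per sentence, so a long sentence shows up all at once. With the incremental stream the bubble grows a few words at a time, which is the behavior we wanted.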



From memory (this was over a year ago), Google's service would give real-time results, which likely means incremental word-by-word results. Did you use the gRPC interface or the JSON one? I think the streaming service is only available via gRPC.

Disclaimer: My comment reflects my own views, and not those of my employer, etc.



