It took me a while to understand what you did here. I was waiting for some kind of subtitles showing the recognition ability.
But you are saying you performed speech recognition on the full video then edited it according to where the words you targeted were found. I liked the bomb/terrorist one, the others didn't seem to be "saying" anything.
Yeah, I was a bit lazy... I could have used moviepy (which I currently use, but merely as a wrapper around ffmpeg) to add subtitles showing which word was identified in each clip. I'm hoping to make this into a command-line tool for myself to quickly transcribe things... though making supercuts is just a fun way to demonstrate the concepts.
The important takeaway is that the Watson API parses a stream of spoken audio (other services, such as Microsoft's Oxford, work only on 10-second chunks, i.e. they're optimized for user commands) and tokenizes it... what you get is a timestamp for when each recognized word appears, as well as a confidence level and alternatives if you so specify. Other speech-transcription options don't always provide this... I don't think PocketSphinx does, for example, nor does sending your audio to an mTurk-based transcription service.
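To make that concrete, here's a rough sketch of how you'd pull word-level timestamps out of a Watson-style response to find the clips to cut. The JSON shape follows Watson's documented format (results → alternatives → timestamps, where each timestamp entry is [word, start, end]), but the sample values are made up for illustration:

```python
import json

# Illustrative Watson Speech to Text response; the shape follows the
# service's documented JSON, but the words and times are invented.
sample = json.dumps({
    "results": [{
        "alternatives": [{
            "transcript": "the bomb is in the building ",
            "confidence": 0.92,
            # each entry is [word, start_seconds, end_seconds]
            "timestamps": [
                ["the", 0.0, 0.2],
                ["bomb", 0.2, 0.7],
                ["is", 0.7, 0.9],
                ["in", 0.9, 1.0],
                ["the", 1.0, 1.1],
                ["building", 1.1, 1.7],
            ],
        }]
    }]
})

def word_spans(response_json, target):
    """Return (start, end) times for every occurrence of `target`."""
    data = json.loads(response_json)
    spans = []
    for result in data["results"]:
        best = result["alternatives"][0]  # top-ranked hypothesis only
        for word, start, end in best["timestamps"]:
            if word.lower() == target.lower():
                spans.append((start, end))
    return spans

print(word_spans(sample, "bomb"))  # -> [(0.2, 0.7)]
```

Each returned span is exactly the subclip you'd hand to ffmpeg/moviepy to assemble the supercut.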
Here's a little more detail about The Wire transcription, along with the JSON that Watson returns, and a simplified CSV version of it: