
You mean a comment I made? It was a hypothesis based on the whisper.cpp notes about the 30s max chunk limit, how long one chunk takes to transcribe, and the observation that the claimed latency speedup (x120) corresponds exactly to running 120 concurrent 30s chunks versus serially transcribing 1 hour of audio.
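
To make that concrete, here's a minimal sketch of the fan-out pattern being described. transcribe_chunk is a hypothetical stand-in for whatever Whisper backend is actually in use (whisper.cpp, the Python package, or a hosted model), not their actual code:

    import concurrent.futures

    CHUNK_SECONDS = 30                           # Whisper's fixed 30s input window
    AUDIO_SECONDS = 3600                         # one hour of audio
    N_CHUNKS = AUDIO_SECONDS // CHUNK_SECONDS    # = 120

    def transcribe_chunk(index: int) -> str:
        # Hypothetical placeholder: transcribe the 30s slice starting at
        # index * CHUNK_SECONDS with whatever Whisper backend is in use.
        return f"[text of chunk {index}]"

    # Fan out all 120 chunks at once; wall-clock time is roughly one
    # chunk's latency rather than 120x it, hence the apparent x120 speedup.
    with concurrent.futures.ThreadPoolExecutor(max_workers=N_CHUNKS) as pool:
        pieces = list(pool.map(transcribe_chunk, range(N_CHUNKS)))

    print(" ".join(pieces))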


Yeah, I was referring to the comment you made. I was just curious and wanted to confirm whether they were just making concurrent calls or actually doing some novel optimization under the hood.

I do not think they were sending concurrent chunks to OpenAI, because the API wasn't out when they launched. That said, there is some reduction in accuracy compared to the original Whisper, which I imagine they sacrificed to achieve such performance gains.


Obviously it's just concurrent calls to a model that has a 30s window. A x120 performance "breakthrough" in voice recognition that is exactly 1 hr / 30 s = 3600 s / 30 s = 120.

I did not say anything about OpenAI API calls. Neither did they in their post. They mention the OpenAI Whisper "model".

/end



