How is it not the case? It is unusable without VAD or editing. I don't understand what you're questioning.

I agree their products could be better integrated "end to end". Meanwhile, speech detection (which Whisper is incapable of) is a continuously improving field of work. They offer official "cookbooks" with guidance on the approach they recommend: https://cookbook.openai.com/examples/whisper_processing_guid...

> At times, files with long silences at the beginning can cause Whisper to transcribe the audio incorrectly. We'll use Pydub to detect and trim the silence.

(Official OpenAI quote)
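
For anyone wondering what "detect and trim the silence" amounts to, here is a rough sketch of the idea in plain Python. To be clear, this is not the cookbook's code (they use Pydub); the function names, the -50 dB threshold and the 10 ms chunk size are my own assumptions for illustration. It scans fixed-size chunks from the start and stops at the first one whose RMS level exceeds the threshold.

```python
import math


def leading_silence_len(samples, sample_rate, threshold_db=-50.0, chunk_ms=10):
    """Return how many samples at the start are below threshold_db.

    samples: sequence of floats in [-1.0, 1.0] (mono audio).
    Detection is done per chunk, so the result is a multiple of the chunk size
    (or len(samples) if nothing exceeds the threshold).
    """
    chunk = max(1, sample_rate * chunk_ms // 1000)
    for start in range(0, len(samples), chunk):
        window = samples[start:start + chunk]
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        # Level relative to full scale; pure digital silence is -inf dB.
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db:
            return start
    return len(samples)


def trim_leading_silence(samples, sample_rate, **kwargs):
    """Drop the silent prefix before handing the audio to a transcriber."""
    return samples[leading_silence_len(samples, sample_rate, **kwargs):]
```

This only handles silence at the beginning of a file. It is not VAD: it won't tell speech from loud noise, and it does nothing about long pauses in the middle.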