No audio support: The models are currently trained to process and understand video content solely based on the visual information in the video. They do not possess the capability to analyze or comprehend any audio components that are present in the video.
This is blowing my mind. gemini-1.5-flash accidentally knows how to transcribe amazingly well, but it is -very- hard to figure out how to use it well, and now Amazon comes out with a Gemini Flash-like model and it explicitly ignores audio. It is so clear that multi-modal audio would be easy for these models, yet it feels like they are purposely holding back from releasing or supporting it. This has to be a strategic decision not to attach audio, probably because the margins on ASR are too high to give up by undercutting them with a cheap LLM. I can only hope Meta drops a multi-modal audio model soon to force the issue.
They also announced speech-to-speech and any-to-any models for early next year. I think you are underestimating the effort required to release five competitive models at the same time.
'better' is always a loaded term with ASR. Gemini 1.5 Flash can transcribe for about $0.01 per hour of audio and gives strong results. If you want timing and speaker info, you need to use the previous version and do a -lot- of prompt tweaking, or else it will hallucinate the timing info. Give it a try; it may be a lot better for your use case.
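A minimal sketch of what "give it a try" looks like with the google-generativeai Python SDK: upload the audio via the Files API and ask the model directly for a labeled transcript. The file name, prompt wording, and expectation that GOOGLE_API_KEY is set in the environment are all assumptions here, and as noted above the speaker labels and timestamps may need heavy prompt iteration (or an earlier model version) to stop them from being hallucinated.

```python
# Sketch only: assumes the google-generativeai SDK is installed and
# GOOGLE_API_KEY is set; "meeting.mp3" and the prompt are illustrative.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Upload the audio file via the Files API, then pass it alongside the prompt.
audio = genai.upload_file("meeting.mp3")

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([
    "Transcribe this audio verbatim. "
    "Label each speaker as Speaker 1, Speaker 2, etc., "
    "and prefix each turn with an approximate [mm:ss] timestamp.",
    audio,
])

print(response.text)
```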