
The fork that I've been using, WhisperX, seems to do better. I've used it on clean splits of mic tracks (i.e., total silence when the other person is talking) and gotten far fewer hallucinations.


WhisperX works better because it adds a Voice Activity Detection (VAD) preprocessing step that filters out silent segments before they reach the model, which removes the main trigger for those hallucinations.
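To make the idea concrete, here is a minimal sketch of VAD-style filtering. It is not WhisperX's implementation, which uses a trained VAD model. Instead it swaps in a simple energy threshold, which is roughly enough for the clean-mic-track case above where the "silence" is true digital silence. The function names and the defaults (`frame_ms`, `threshold_db`, `min_silence_ms`) are hypothetical and chosen only for illustration.

```python
import math


def frame_rms_db(samples: list[float], start: int, end: int) -> float:
    """RMS level of samples[start:end] in dBFS (full scale = 1.0)."""
    frame = samples[start:end]
    if not frame:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else -math.inf


def speech_segments(samples: list[float], sample_rate: int = 16000,
                    frame_ms: int = 30, threshold_db: float = -40.0,
                    min_silence_ms: int = 300) -> list[tuple[int, int]]:
    """Hypothetical energy-based VAD: return (start, end) sample ranges of speech.

    Frames louder than threshold_db count as speech. Speech frames separated
    by at most min_silence_ms of quiet are merged into one segment, so short
    pauses between words do not split a phrase.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_gap = sample_rate * min_silence_ms // 1000
    segments: list[tuple[int, int]] = []
    for start in range(0, len(samples), frame_len):
        end = min(start + frame_len, len(samples))
        if frame_rms_db(samples, start, end) < threshold_db:
            continue  # quiet frame: skipped here (short gaps get bridged below)
        if segments and start - segments[-1][1] <= max_gap:
            segments[-1] = (segments[-1][0], end)  # extend the open segment
        else:
            segments.append((start, end))  # start a new speech segment
    return segments


def chunks_for_model(samples: list[float],
                     segments: list[tuple[int, int]]) -> list[list[float]]:
    """Only these slices would be passed on to the speech model."""
    return [samples[s:e] for s, e in segments]


if __name__ == "__main__":
    sr = 16000
    silence = [0.0] * sr  # 1 s of digital silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
    audio = silence + tone + silence
    print(speech_segments(audio, sr))
```

The `min_silence_ms` bridge is the main design choice: without it, every brief pause between words would become its own tiny chunk, and very short chunks can themselves be a source of odd output.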



