Show HN: Transcribe YouTube Videos

Leftium · on Sept 8, 2024

Perhaps skip the whisper step if the video already has decent manual (not auto-generated) captions, like this video: https://youtu.be/i-BkN3rTK0Q

---

Another use case is being able to quickly jump to a specific spot inside a video. Could you add timestamps with links that jump directly to that point in the video?

I accomplished something similar by modifying oTranscribe:

- https://otranscribe.netlify.app/?vsl=definedefine

- https://otranscribe.netlify.app/?vsl=letter
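
For what it's worth, a timestamp link is just the video URL with a `t=` offset in seconds. Here is a minimal Python sketch of turning a transcript line into a clickable link. The function names and the Markdown output format are made up for illustration; they are not part of yt-transcribe or oTranscribe.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset as M:SS, or H:MM:SS once it passes an hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamp_link(video_id: str, seconds: float) -> str:
    """Build a youtu.be URL that starts playback at the given offset."""
    return f"https://youtu.be/{video_id}?t={int(seconds)}"


def linked_line(video_id: str, seconds: float, text: str) -> str:
    """Prefix a transcript line with a Markdown link to its position (hypothetical format)."""
    label = format_timestamp(seconds)
    return f"[{label}]({timestamp_link(video_id, seconds)}) {text}"
```

Calling `linked_line("i-BkN3rTK0Q", 83.4, "some text")` would produce a line whose `[1:23]` label links to `https://youtu.be/i-BkN3rTK0Q?t=83`.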

---

Finally, I'm a Windows user, so a whisper.cpp version would be nice~

llimllib · on Sept 8, 2024

good idea! I've no idea how to distinguish between auto-generated and manual captions, but I definitely should take them if available.

A timestamp flag is also a good idea.

Will noodle on a whisper.cpp version!

A question: on macOS I can `brew install whisper-cpp`. Is there an equivalent way to install it on a Windows machine? I haven't used Windows in a very long time.

Leftium · on Sept 8, 2024

On Windows, I use scoop.sh: https://scoop.sh/#/apps?q=whisper

I was able to do this:

    scoop install main/whisper-cpp
    
    mkdir models
    
    # Download model file ggml-base.en.bin to the models directory created above
    
    yt-dlp.exe -x --audio-format wav --audio-quality 16K -o "out.wav" ZMklf0vUl18
    
    # Resample the wav to 16 kHz (--audio-quality above sets the bitrate, not the sample rate, so it does not do this)
    ffmpeg -i out.wav -ar 16000 out-16kHz.wav
    
    whisper.exe out-16kHz.wav
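
If someone wanted to script the steps above, one way is a small Python wrapper that builds the three command lines and runs them in order. This is only a sketch under a few assumptions: the binary is called `whisper` (scoop installs `whisper.exe`; other packages may use a different name), the output template `out.%(ext)s` is swapped in so yt-dlp picks the extension itself, and the extra ffmpeg flags `-ac 1 -c:a pcm_s16le` follow the conversion example in the whisper.cpp README rather than the commands in this thread.

```python
import subprocess


def transcription_commands(video, model="models/ggml-base.en.bin", whisper_bin="whisper"):
    """Return the download, resample, and transcribe commands as argument lists."""
    raw, resampled = "out.wav", "out-16kHz.wav"
    return [
        # Extract the audio track as wav; the final file is expected to be out.wav
        ["yt-dlp", "-x", "--audio-format", "wav", "-o", "out.%(ext)s", video],
        # Resample to 16 kHz mono 16-bit PCM
        ["ffmpeg", "-y", "-i", raw, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", resampled],
        # Transcribe with an explicitly chosen model file
        [whisper_bin, "--model", model, resampled],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage would be something like `run_all(transcription_commands("ZMklf0vUl18"))`, with the model file already downloaded.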

llimllib · on Sept 8, 2024

Presumably that used whisper's bundled tiny model, which is no better than YouTube CC. A beef I have with whisper-cpp is that it totally refuses to handle model management.

With mlx_whisper, I just have to tell it to use a model and it will download it if it's not already present: https://github.com/llimllib/yt-transcribe/blob/244841f83d833...

So if I add whisper.cpp as a dependency, I also have to add huggingface-cli or something similar. It also seems that huggingface-cli is not available on scoop.

Leftium · on Sept 8, 2024

Not as convenient, but you could also have the user install the model manually, as whisper.cpp itself requires.

Just forward the error message output by whisper, or even make a more user-friendly error message with instructions on how/where to download the models.

whisper.cpp does provide a simple bash script to download models: https://github.com/ggerganov/whisper.cpp/blob/master/models/...

(As a Windows user, I can run bash scripts via Git Bash for Windows[1])

[1]: https://git-scm.com/download/win
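
The friendlier error message suggested above could look roughly like this in Python. It is a sketch, not what yt-transcribe does: `require_model` and `ModelNotFoundError` are hypothetical names, the default path is the one whisper.cpp reportedly looks for, and the download hint points at the root of the Hugging Face repo mentioned elsewhere in this thread.

```python
from pathlib import Path

# Default location whisper.cpp is said to look for (per this thread)
DEFAULT_MODEL = Path("models/ggml-base.en.bin")


class ModelNotFoundError(FileNotFoundError):
    """Raised when the requested whisper.cpp model file is missing."""


def require_model(model_path=DEFAULT_MODEL) -> Path:
    """Return the model path if the file exists, else raise with download instructions."""
    path = Path(model_path)
    if not path.is_file():
        raise ModelNotFoundError(
            f"Whisper model not found at '{path}'.\n"
            f"Download {path.name} from https://huggingface.co/ggerganov/whisper.cpp\n"
            f"and place it in '{path.parent}', or pass a different model path."
        )
    return path
```

Checking before invoking whisper means the user sees instructions instead of a raw failure from the binary.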

llimllib · on Sept 8, 2024

thanks for all the help, I appreciate it.

Leftium · on Sept 9, 2024

Well, thanks to you I found out that whisper generates decent audio transcriptions with a local model (relatively) easily, even on my 6+ year-old laptop.

(I used to upload videos to YouTube just to get the auto captions.)

I did some investigation, and it would not be difficult to convert the whisper LRC subtitle output into the format my fork of oTranscribe expects.

I already made a simple tool to convert YouTube TTML/SBV subtitle output: https://github.com/Leftium/otrgen
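
As a rough idea of the first half of such a conversion, here is a Python sketch that reads LRC text into `(seconds, text)` pairs. It assumes the usual `[mm:ss.xx] text` line shape and skips metadata tags; the target oTranscribe format is not shown in this thread, so the sketch stops at the parsed pairs. `parse_lrc` is a made-up name, not part of otrgen.

```python
import re

# Matches "[mm:ss.xx] text"; metadata tags such as "[by:...]" do not match
LRC_LINE = re.compile(r"^\[(\d+):(\d{2})(?:\.(\d{1,3}))?\]\s*(.*)$")


def parse_lrc(text):
    """Return a list of (start_seconds, line_text) tuples from LRC content."""
    cues = []
    for line in text.splitlines():
        match = LRC_LINE.match(line.strip())
        if not match:
            continue  # blank line or metadata tag
        minutes, seconds, fraction, words = match.groups()
        start = int(minutes) * 60 + int(seconds)
        if fraction:
            start += int(fraction) / 10 ** len(fraction)
        cues.append((float(start), words))
    return cues
```

From there, each pair could be written out in whatever timestamp syntax the destination tool expects.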

llimllib · on Sept 10, 2024

that's great! whisper is awesome software.

I'm working on a Go version that links to whisper.cpp directly, which might make porting easier (or possible at all).

Leftium · on Sept 8, 2024

Yes, the model must be downloaded separately (see my edited comment with bash commands/comments).

The model is specified via whisper.exe's `--model FNAME` parameter. By default, it looks for `models/ggml-base.en.bin`, but even that model must be downloaded separately.

So you could do this:

    # Assumes ggml-large-v3.bin model file[1] was already downloaded to models/ folder
    whisper.exe --model models/ggml-large-v3.bin out-16kHz.wav

[1]: https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-...

Leftium · on Sept 8, 2024

yt-dlp parameters distinguish between auto-generated and manual YouTube captions:

    # Downloads auto-generated captions
    yt-dlp --skip-download --write-auto-sub https://youtu.be/i-BkN3rTK0Q

    # Downloads manual captions
    yt-dlp --skip-download --write-sub https://youtu.be/i-BkN3rTK0Q

    # Fails with error: "There are no subtitles for the requested languages"
    yt-dlp --skip-download --write-sub https://youtu.be/ZMklf0vUl18

Docs: https://github.com/ytdl-org/youtube-dl?tab=readme-ov-file#su...
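
Building on those flags, a "use manual captions if they exist, otherwise fall back to whisper" check might be sketched in Python like this. The function names are hypothetical, and instead of relying on yt-dlp's exit status (which this sketch does not assume anything about) it simply looks for new files in the working directory after the call.

```python
import subprocess
from pathlib import Path


def caption_command(url, auto=False):
    """Build the yt-dlp call for manual (default) or auto-generated captions."""
    flag = "--write-auto-sub" if auto else "--write-sub"
    return ["yt-dlp", "--skip-download", flag, url]


def fetch_manual_captions(url, work_dir, run=subprocess.run):
    """Return newly written caption files, or [] when no manual captions exist.

    `run` is injectable so the logic can be exercised without network access.
    """
    work_dir = Path(work_dir)
    before = set(work_dir.iterdir())
    run(caption_command(url), cwd=work_dir, check=False)
    return sorted(set(work_dir.iterdir()) - before)
```

An empty result would be the signal to fall back to downloading the audio and running whisper.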

llimllib · on Sept 8, 2024

Yeah, I found that when I was in there. I added an issue to track YouTube CC support, with some thoughts about the challenges: https://github.com/llimllib/yt-transcribe/issues/2