Hacker News
Show HN: Transcribe YouTube Videos
1 point by llimllib on Sept 7, 2024 | 11 comments
I often want to read the contents of YouTube videos but I don’t want to watch them, and YouTube’s captioning leaves a lot to be desired.

So I wrote a shell script that takes a URL, downloads the video, transcribes it with whisper, and turns it into a clean HTML page for reading.

https://github.com/llimllib/yt-transcribe
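
The last step, turning a plain-text transcript into a page that's pleasant to read, is the easiest one to show on its own. Here is a minimal bash sketch, not the repo's actual code. `transcript_to_html` and `html_escape` are hypothetical names, and the sketch assumes the transcript arrives on stdin with one segment per line, with each non-blank line becoming a paragraph.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not yt-transcribe's real implementation.

# Escape the characters that would otherwise be parsed as markup.
# "&" is replaced first so the entities added afterwards aren't re-escaped.
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Read a transcript on stdin (one segment per line) and write a minimal
# HTML page to stdout. $1 is the page title, e.g. the video title.
transcript_to_html() {
  local title
  title=$(printf '%s\n' "$1" | html_escape)
  printf '<!doctype html>\n<html>\n<head>\n<meta charset="utf-8">\n'
  printf '<title>%s</title>\n</head>\n<body>\n<h1>%s</h1>\n' "$title" "$title"
  # Every non-blank line becomes its own paragraph; blank lines are dropped.
  html_escape | awk 'NF { print "<p>" $0 "</p>" }'
  printf '</body>\n</html>\n'
}
```

Usage would be something like `transcript_to_html "Video title" < transcript.txt > transcript.html`. Escaping matters here because spoken text often contains things like "a < b" or "R&D".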



Perhaps skip the whisper step if the video already has decent manual (not auto-generated) captions, like this video: https://youtu.be/i-BkN3rTK0Q

---

Another use case is being able to quickly jump to a specific spot inside a video. Could you add timestamps with links that jump directly to that point in the video?

I accomplished something similar by modifying oTranscribe:

- https://otranscribe.netlify.app/?vsl=definedefine

- https://otranscribe.netlify.app/?vsl=letter
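
Timestamp links mostly come down to converting each segment's start time to seconds and appending YouTube's `t` query parameter. A minimal bash sketch, assuming timestamps shaped like `1:02:03`, `02:03`, or `00:00:08.500` (the fractional part is dropped). `yt_timestamp_link` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a youtu.be link that starts playback at a given
# timestamp. Accepts "h:mm:ss", "mm:ss" or plain seconds; any fractional part
# (e.g. the ".500" in "00:00:08.500") is discarded.
yt_timestamp_link() {
  local video_id=$1 stamp=${2%%.*}
  local seconds=0 part
  local IFS=:
  # Each colon-separated field multiplies the running total by 60.
  # 10# forces base 10 so fields like "08" aren't parsed as octal.
  for part in $stamp; do
    seconds=$(( seconds * 60 + 10#$part ))
  done
  printf 'https://youtu.be/%s?t=%d\n' "$video_id" "$seconds"
}
```

For example, `yt_timestamp_link ZMklf0vUl18 1:02:03` prints `https://youtu.be/ZMklf0vUl18?t=3723`. A transcript renderer could wrap each segment's start time in such a link.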

---

Finally, I'm a Windows user, so a whisper.cpp version would be nice~


good idea! I've no idea how to distinguish between auto-generated and manual captions, but I definitely should take them if available.

A timestamp flag is also a good idea.

Will noodle on a whisper.cpp version!

A question: on a Mac I can `brew install whisper-cpp`. Is there an equivalent way to install it on a Windows machine? I haven't used Windows in a very long time.


On Windows, I use scoop.sh: https://scoop.sh/#/apps?q=whisper

I was able to do this:

    scoop install main/whisper-cpp
    
    mkdir models
    
    # Download the model file ggml-base.en.bin into the models directory created above
    
    yt-dlp.exe -x --audio-format wav -o "out.wav" ZMklf0vUl18
    
    # Resample to 16 kHz, 16-bit mono WAV, the input whisper.cpp expects.
    # (yt-dlp's --audio-quality sets the bitrate, not the sample rate, so it can't do this step.)
    ffmpeg -i out.wav -ar 16000 -ac 1 -c:a pcm_s16le out-16kHz.wav
    
    whisper.exe out-16kHz.wav
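
Since the yt-dlp flag didn't produce 16 kHz audio, one cheap guard is to read the sample rate from the WAV header and resample only when needed. A bash sketch (should also run in Git Bash) with a hypothetical `wav_sample_rate` helper. It assumes the canonical header layout where the `fmt ` chunk comes first, so the rate is the little-endian 32-bit integer at byte offset 24; files with other chunk layouts would need a real parser such as `ffprobe`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample rate stored in a WAV file's header.
# Assumes the canonical layout (RIFF, WAVE, then the "fmt " chunk), in which
# the rate is a little-endian uint32 at byte offset 24.
wav_sample_rate() {
  local -a b
  # Read the 4 bytes as unsigned decimals; combining them byte by byte
  # avoids depending on the host's endianness.
  read -r -a b < <(od -An -v -t u1 -j 24 -N 4 -- "$1") || :
  (( ${#b[@]} == 4 )) || return 1
  echo $(( b[0] | b[1] << 8 | b[2] << 16 | b[3] << 24 ))
}

# Example: only resample when the file isn't already 16 kHz.
# [[ $(wav_sample_rate out.wav) == 16000 ]] ||
#   ffmpeg -i out.wav -ar 16000 -ac 1 -c:a pcm_s16le out-16kHz.wav
```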


Presumably that used whisper's bundled tiny model, which is no better than YouTube CC. A beef I have with whisper-cpp is that they totally refuse to handle model management.

With mlx_whisper, I just have to tell it to use a model and it will download it if it's not already present: https://github.com/llimllib/yt-transcribe/blob/244841f83d833...

So if I add whisper.cpp as a dependency, I also have to add huggingface-cli or something similar, and huggingface-cli doesn't seem to be available on scoop.


Not as convenient, but you could also have the user install the model manually, like whisper.cpp itself does.

Just forward the error message output by whisper, or even make a more user-friendly error message with instructions on how/where to download the models.

whisper.cpp does provide a simple bash script to download models: https://github.com/ggerganov/whisper.cpp/blob/master/models/...

(As a Windows user, I can run bash scripts via Git Bash for Windows[1])

[1]: https://git-scm.com/download/win
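
The friendlier-error approach could look something like this bash sketch. `require_whisper_model` is a hypothetical helper; its default path, `models/ggml-base.en.bin`, is the file whisper.exe looks for when no model is given, and the Hugging Face repository `ggerganov/whisper.cpp` is where this thread points for model files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check for a whisper.cpp model file up front and, if it
# is missing, print download instructions instead of whisper's own error.
require_whisper_model() {
  # Default to the path whisper.cpp looks for when --model isn't given.
  local model=${1:-models/ggml-base.en.bin}
  [[ -f $model ]] && return 0
  cat >&2 <<EOF
error: whisper.cpp model not found at $model
whisper.cpp does not download models by itself. Download
$(basename -- "$model") from https://huggingface.co/ggerganov/whisper.cpp
and save it as $model, then run this script again.
EOF
  return 1
}

# Usage (model choice is just an example):
# require_whisper_model models/ggml-large-v3.bin || exit 1
# whisper.exe --model models/ggml-large-v3.bin out-16kHz.wav
```

Checking before transcription also avoids downloading a long video only to fail at the last step.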


thanks for all the help, I appreciate it.


Well, thanks to you I found out that whisper can (relatively) easily generate decent transcriptions with a local model, even on my 6+ year-old laptop.

(I used to upload videos to YouTube just to get the auto captions.)

I did some investigation, and it would not be difficult to convert the whisper LRC subtitle output into the format my fork of oTranscribe expects.

I already made a simple tool to convert YouTube TTML/SBV subtitle output: https://github.com/Leftium/otrgen
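
The parsing half of that conversion is small. This bash sketch assumes whisper.cpp's LRC output (its `--output-lrc` option, if your build has it) consists of `[mm:ss.xx]text` lines plus tag lines such as `[by:whisper.cpp]`. It emits each line's start time in seconds and its text, tab-separated. The oTranscribe fork's input format isn't described here, so that final mapping is left out; `lrc_to_tsv` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert LRC subtitles on stdin into
# "<start seconds>\t<text>" lines. Tag lines such as "[by:whisper.cpp]"
# don't match the numeric timestamp pattern and are skipped.
lrc_to_tsv() {
  awk '
    match($0, /^\[[0-9]+:[0-9]+(\.[0-9]+)?]/) {
      stamp = substr($0, 2, RLENGTH - 2)   # e.g. "01:02.25"
      text = substr($0, RLENGTH + 1)       # everything after the "]"
      sub(/^[ \t]+/, "", text)             # drop leading whitespace
      split(stamp, f, ":")
      printf "%.2f\t%s\n", f[1] * 60 + f[2], text
    }
  '
}
```

For example, `[01:02.25]Second line` becomes `62.25`, a tab, then `Second line`.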


that's great! whisper is awesome software.

I'm working on a golang version that links to whisper.cpp directly to maybe make porting easier/possible


Yes, the model must be downloaded separately (see my edited comment with bash commands/comments).

The model is specified with whisper.exe's `--model FNAME` parameter. By default it looks for `models/ggml-base.en.bin`, but even that model must be downloaded separately.

So you could do this:

    # Assumes ggml-large-v3.bin model file[1] was already downloaded to models/ folder
    whisper.exe --model models/ggml-large-v3.bin out-16kHz.wav
[1]: https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-...


yt-dlp parameters distinguish between auto-generated and manual YouTube captions:

    # Downloads auto-generated captions
    yt-dlp --skip-download --write-auto-sub https://youtu.be/i-BkN3rTK0Q

    # Downloads manual captions
    yt-dlp --skip-download --write-sub https://youtu.be/i-BkN3rTK0Q

    # Fails with error: "There are no subtitles for the requested languages"
    yt-dlp --skip-download --write-sub https://youtu.be/ZMklf0vUl18
Docs: https://github.com/ytdl-org/youtube-dl?tab=readme-ov-file#su...
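
Building on those flags, a transcription script could try manual captions first and fall back to whisper only when none exist. A bash sketch under stated assumptions: `fetch_manual_captions` is a hypothetical helper, English (`--sub-langs en`) and VTT (`--sub-format vtt`) are arbitrary choices, and instead of relying on yt-dlp's exit status it checks whether a subtitle file was actually written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask yt-dlp for manually written English captions only
# (--write-sub, not --write-auto-sub). Prints the path of the .vtt file on
# success; returns 1 when the video has none, so the caller can fall back to
# downloading the audio and running whisper.
fetch_manual_captions() {
  local url=$1 outdir=$2 f
  mkdir -p -- "$outdir"
  # yt-dlp's own output goes to stderr so stdout carries only the file path.
  yt-dlp --skip-download --write-sub --sub-langs en --sub-format vtt \
    -o "$outdir/captions.%(ext)s" "$url" >&2 || return 1
  # Subtitle files get the language inserted, e.g. "captions.en.vtt"; look at
  # what was written rather than trusting the exit status alone.
  for f in "$outdir"/captions*.vtt; do
    [[ -f $f ]] && { printf '%s\n' "$f"; return 0; }
  done
  return 1
}

# Usage:
# if vtt=$(fetch_manual_captions "$url" captions); then
#   echo "using manual captions from $vtt"
# else
#   echo "no manual captions; transcribing with whisper"
# fi
```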


yeah, I found that when I was in there. I added an issue for tracking YouTube CC support: https://github.com/llimllib/yt-transcribe/issues/2 with some thoughts about the challenges



