We support voice cloning, so you can mimic the sound of any real voice (or try to generate random ones). Prosody and emotion are harder to control right now, but we are looking into this.
To see how this works in practice, check the Google Colab link; at the end we clone a voice from one of Churchill's radio speeches.
Both models use around 3GB right now (converted to FP16 for speed). However, I checked that the (slower) FP32 version uses only 2.3GB, so we are probably doing something suboptimal here.
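As a back-of-the-envelope sanity check on those numbers (a sketch, not a measurement of our models): raw parameter storage is just parameter count times bytes per parameter, so an FP16 copy should be roughly half the size of the FP32 one, not larger.

```python
def model_size_gb(n_params: int, bytes_per_param: int) -> float:
    """Raw parameter storage in GB (1 GB = 2**30 bytes)."""
    return n_params * bytes_per_param / 2**30

# Working backwards from the 2.3 GB FP32 figure: roughly 617M parameters.
n_params = int(2.3 * 2**30 / 4)
print(round(model_size_gb(n_params, 4), 2))  # FP32: ~2.3 GB
print(round(model_size_gb(n_params, 2), 2))  # FP16: ~1.15 GB expected, not 3 GB
```

The gap suggests the FP16 export is carrying extra state (e.g. duplicated weights or buffers kept in FP32) rather than the cast itself being the problem.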
We only support CUDA right now, although it should not be too hard to port to whisper.cpp/llama.cpp or Apple's MLX. It's a pretty straightforward transformer architecture.
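To give a feel for why a port is tractable (a minimal NumPy sketch of the core operation, not our actual code): the heavy lifting in a transformer is scaled dot-product attention, which is just matrix multiplies and a softmax, all of which these runtimes already provide.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (seq_len, d) arrays. Returns a (seq_len, d) array."""
    scores = q @ k.T / np.sqrt(k.shape[-1])        # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ v                             # weighted sum of values

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8)
```

Everything else in the block (layer norm, feed-forward, residual adds) is similarly elementwise or matmul-shaped, which is why llama.cpp-style ports tend to be mechanical.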