I just tried the default bark.cpp example from the github readme, and to me it s...

JonathanFly · 2024-09-30T06:01:57

You aren't doing anything wrong. Out of the box, Bark uses a randomly generated voice, and I like to think it's modeling the world of random voices, which includes bad microphones and poor audio quality. (Even bad 'actors': notice how many Bark voices sound like they are reading from a script.)

Presumably it was trained on noisy data. But it can generate and use a clean voice; they are in there. Most of the Suno default voices are not great either, but a great voice can sound perfectly clear. I haven't done much with Bark lately, but on my Twitter there are plenty of clear examples of very realistic voices. For instance, here I ran a prompt based on some copy-and-pasted text 20 times in Bark. I put a couple of the better results up front, but even in the later samples you can find lots of evidence of human-sounding voices: https://sndup.net/bzhz5/

Going off the rails and hallucinating is a hard problem. It can be minimized, but it would probably have to be solved with simple brute force: check the output with speech-to-text (S2T) and retry if needed.
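A minimal sketch of that brute-force loop, assuming you supply two callables yourself: `generate` (text to audio, e.g. a wrapper around Bark) and `transcribe` (audio to text, e.g. any speech-to-text model). Both names, `generate_with_check`, and the 0.2 word-error-rate threshold are hypothetical choices for illustration, not part of Bark or bark.cpp. The check used here is word error rate (word-level edit distance divided by the number of words in the prompt); if no attempt passes, it falls back to the attempt with the lowest error.

```python
import re
from typing import Any, Callable, List, Tuple


def normalize(text: str) -> List[str]:
    # Lowercase and keep only word-like tokens so punctuation does not count as errors.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    # Single-row dynamic programming over the edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def generate_with_check(
    text: str,
    generate: Callable[[str], Any],    # hypothetical: your TTS call (e.g. wrapping Bark)
    transcribe: Callable[[Any], str],  # hypothetical: your speech-to-text call
    max_wer: float = 0.2,              # assumed threshold; tune for your use case
    max_attempts: int = 5,
) -> Tuple[Any, float, int]:
    """Regenerate until the transcript matches the prompt closely enough.

    Returns (best_audio, best_wer, attempts_used). If no attempt passes the
    threshold, the attempt with the lowest WER is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best_audio, best_wer = None, float("inf")
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        audio = generate(text)
        wer = word_error_rate(text, transcribe(audio))
        if wer < best_wer:
            best_audio, best_wer = audio, wer
        if wer <= max_wer:
            break  # good enough; stop spending compute
    return best_audio, best_wer, attempts
```

Scoring against the original prompt catches both skipped words and hallucinated extra speech, since both raise the edit distance. The cost is one extra transcription per attempt, which is usually cheap compared with a Bark generation.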

If you want to maximize raw audio quality, you can replace the final decoding step with something like VOCOS or MBD, though with the best voices you don't need to.