Hacker News | gropo's comments

Kokoro is better for TTS by far.

For voice cloning, Pocket TTS is walled off, so I can't tell.


What are the advantages of PocketTTS over Kokoro?

It seems like Kokoro is the smaller model, also runs on CPU in real time, and is more open and fine-tunable. It has more scripts, extensions, etc., whereas this is new and doesn't have any fine-tuning code yet.

I couldn't tell an audio quality difference.


Kokoro is fine tunable? Speaking as someone who went down the rabbit hole... it's really not. There's no (as of last time I checked) training code available so you need to reverse engineer everything. Beyond that the model is not good at doing voices outside the existing voicepacks: simply put, it isn't a foundation model trained on internet scale data. It is made from a relatively small set of focused, synthetic voice data. So, a very narrow distribution to work with. Going OOD immediately tanks perceptual quality.

There's a bunch of inference stuff though, which is cool I guess. And it really is a quite nice little model in its niche. But let's not pretend there aren't huge tradeoffs in the design: synthetic data, phonemization, lack of train code, sharp boundary effects, etc.


Being able to voice clone with PocketTTS seems major; it doesn't look like there's any support for that with Kokoro.


Zero shot voice clones have never been very good. Fine tuned models hit natural speaker similarity and prosody in a way zero shot models can't emulate.

If it were a big model and was trained on a diverse set of speakers and could remember how to replicate them all, then zero shot is a potentially bigger deal. But this is a tiny model.

I'll try out the zero shot functionality of Pocket TTS and report back.


Would be curious to hear!


Less licensing headache, it seems. Kokoro says it's Apache-licensed. But it has eSpeak-NG as a dependency, which is GPL, which calls into question whether Kokoro is actually GPL. PocketTTS doesn't have eSpeak-NG as a dependency, so you don't need to worry about all that BS.

Btw, I would love to hear from someone (who knows what they're talking about) to clear this up for me. Dealing with potential GPL contamination is a nightmare.


Kokoro only uses eSpeak for text-to-phoneme (AKA G2P) conversion.

If you could find another compatible converter, you could probably replace eSpeak with it. The data could be a bit OOD, so you may need to fiddle with it, but it should work.

Because the GPL is outdated and doesn't really consider modern gen AI, what you could also do is generate a bunch of text-to-phoneme pairs with eSpeak and train your own transformer on them. This would free you from the GPL license completely, and the task is easy enough that even a very small model should be able to do it.
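A rough sketch of the pair-generation step, assuming the `espeak-ng` command-line tool is installed and on PATH. The helper names `espeak_ipa` and `build_g2p_pairs` are made up for illustration and are not part of Kokoro or eSpeak; the training step itself is left out.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple


def espeak_ipa(text: str, voice: str = "en-us") -> str:
    """Ask the espeak-ng CLI for IPA phonemes (assumes espeak-ng is on PATH)."""
    result = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", voice, text],
        capture_output=True,
        text=True,
        check=True,
    )
    # Output can span several lines; collapse all whitespace into single spaces.
    return " ".join(result.stdout.split())


def build_g2p_pairs(
    texts: Iterable[str],
    phonemize: Callable[[str], str] = espeak_ipa,
) -> List[Tuple[str, str]]:
    """Return (text, phonemes) training pairs, skipping blanks and duplicates."""
    seen = set()
    pairs = []
    for raw in texts:
        text = raw.strip()
        if not text or text in seen:
            continue
        seen.add(text)
        pairs.append((text, phonemize(text)))
    return pairs
```

The `phonemize` argument is also where a different, non-eSpeak converter would plug in if you went the replacement route instead.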


If it depends on eSpeak NG code, the complete product is 100% GPL. That said, if you are able to change the code to remove the eSpeak dependency, then the rest would revert to non-GPL (the same goes if it's a build-time option that you can leave off, like FFmpeg's --enable-gpl).


Chatterbox-turbo is really good too. It has a version that uses Apple's GPU.


You should have bought some illegal street diet and exercise or cholesterol meds or whatever.


Why not both?


Sure it's absolutely true (I stopped reading there.)


How about moderate cardio and more fats in your diet?


looking at all that data it appears that smoking cannabis is out of the question to this person.

the person should smoke cannabis regularly until a solution is found.


US govt could also require OFAC sanctions compliance on BTC via international framework as well.

30 warehouses or so, hundred websites, single code repository with a handful of developers.

Developers recently closed an open issue that an American miner raised about OFAC compliance in the software, even though the miner was able to help write the patch.

Beyond freezing, BTC can even be reissued via a module written expressly for such a purpose.


I'll never eat PFAS again


Good for you. Want to share any good websites you have for how to support your goal here?

If I'm eating PFAS, I'd like to know.


PFAS (aka "forever chemicals") are present in everything including drinking water and breast milk so yes, you are eating PFAS.


Twetch uses the BIE1 encryption protocol on a plaintext blockchain to allow append-only data that can be retracted from view.

twetch.com


I have a handful of these, and the 4-core RPi goes faster than the 6-core Orange Pi for many things.


like what? the benchmarks here make that seem vastly vastly unlikely. please, put up. make claims that are at least contestable.


also wondering about this


Is there some sort of bench you want to see the output of?

I'll try to reproduce/document for you.


Okay, for example, the Pi 400 beats out the Orange Pi 4 LTS and Orange Pi 4B in the time it takes to make a sha256sum of a file.
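For anyone who wants to compare boards the same way, here is a minimal sketch of a timed file hash in Python. It is not the `sha256sum` command itself, and the function name `time_sha256` and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import hashlib
import time
from typing import Tuple


def time_sha256(path: str, chunk_size: int = 1 << 20) -> Tuple[str, float]:
    """Hash a file in chunks; return (hex digest, elapsed seconds)."""
    digest = hashlib.sha256()
    start = time.perf_counter()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files do not need to fit in memory.
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    elapsed = time.perf_counter() - start
    return digest.hexdigest(), elapsed
```

The elapsed time includes reading the file, so storage speed and the page cache affect the result. Running it twice on the same file and comparing the second runs keeps the comparison closer to CPU hashing speed.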

