Hacker News | jamilton's comments

Being able to voice clone with PocketTTS seems major, it doesn't look like there's any support for that with Kokoro.

Zero-shot voice clones have never been very good. Fine-tuned models hit natural speaker similarity and prosody in a way zero-shot models can't match.

If it were a big model, trained on a diverse set of speakers and able to remember how to replicate them all, then zero-shot would be a potentially bigger deal. But this is a tiny model.

I'll try out the zero shot functionality of Pocket TTS and report back.


Would be curious to hear!

Because people would just feel bad about it and keep doing it. I don't care about people feeling bad about immoral behavior, I care about them not doing it in the first place.

https://openai.com/index/strengthening-chatgpt-responses-in-...

"...our initial analysis estimates that around 0.15% of users active in a given week have conversations that include explicit indicators of potential suicidal planning or intent and 0.05% of messages contain explicit or implicit indicators of suicidal ideation or intent."

With roughly 700 million weekly active users, that's more like 1 million people discussing suicide with ChatGPT every week.

For reference, 12.8 million Americans are reported as thinking about suicide and 1.5 million are reported as attempting suicide in a year.
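Back-of-the-envelope, using OpenAI's own figures from the post above:

```python
# OpenAI's reported figures (approximate)
weekly_active_users = 700_000_000
share_with_planning_indicators = 0.0015  # 0.15% of weekly active users

users_per_week = weekly_active_users * share_with_planning_indicators
print(f"{users_per_week:,.0f} users per week")  # → 1,050,000 users per week
```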


Why that ratio in particular? I wonder if there’s a more complex ratio that could be better.


This ratio allows for a relatively simple 2x2 repeating pattern. That makes interpolating the values immensely simpler.

Also you don't want the red and blue to be too far apart; reconstructing the colour signal is difficult enough as it is. Moiré effects are only going to get worse if you use even sparser sampling.


Lol - they really should be locking down their email accounts and enforcing that policy, or manually reviewing outbound messages before they can be sent. Just telling the LLMs not to seems likely to have a non-zero failure rate.


The linked GitHub readme says it outputs a powerpoint file of the layers.


...of all the possible formats, it outputs.. a powerpoint presentation..? What.


The GitHub repo includes (among other things) a script (relying on python-pptx) to output decomposed layer images into a pptx file “where you can edit and move these layers flexibly.” (I've never used PowerPoint for this, but maybe it is good enough for this and ubiquitous enough that this is sensible?)


Lol, right?!?! I would've expected sequential PNGs followed by SVGs once the model improved.


That's what the example code at https://old.reddit.com/r/StableDiffusion/comments/1pqnghp/qw... generates. You get 0.png, 1.png, ..., n.png, where n = the requested number of layers − 1.

It'll drop a 600W RTX 6000 to its knees for about a minute, but it does work.


I saw some people at a company called Pruna AI got it down to 8 seconds with Cloudflare/Replicate, but I don't know if it was on consumer hardware or an A100/H100/H200, and I don't know if the inference optimization is open-source yet.


I don't see the word powerpoint anywhere in https://github.com/QwenLM/Qwen-Image-Layered, I only see a code snippet saving a bunch of PNGs:

  # run the pipeline without tracking gradients
  with torch.inference_mode():
      output = pipeline(**inputs)
      output_image = output.images[0]  # list of decomposed layer images
  
  # write each layer out as its own PNG
  for i, image in enumerate(output_image):
      image.save(f"{i}.png")

Unless it's a joke that went over my head or you're talking about some other GitHub readme (there's only one GitHub link in TFA), posting an outright lie like this is not cool.


> I don't see the word powerpoint anywhere in https://github.com/QwenLM/Qwen-Image-Layered,

The word "powerpoint" is not there, however this text is:

“The following scripts will start a Gradio-based web interface where you can decompose an image and export the layers into a pptx file, where you can edit and move these layers flexibly.”


Oh okay I missed it, sorry. But that’s just using a separate python-pptx package to export the generated list of images to a .pptx file, not something inherent to the model.


What do you like about Void? It reads about like how I would expect a base chat model to post.


It's the replies that are the interesting bit. It's not perfect, but it can maintain multiple conversations with different people in the same context, and do things like changing its current rules in response to conversations with users. Its slightly robotic tone is deliberate: it tries to convey information in the most efficient way possible. I'm not sure if that's an emergent property or if it's in one of its fixed memory blocks. I do know that earlier on people managed to convince it to change its personality, and cpfiffer had to intervene to stop people doing that.


These kinds of LLM bots can be fun to play with in a "try to make it say/do something silly" way, but beyond that I don't really get the point. The writing style is grating and I don't think I've ever seen one say anything genuinely useful.


Yeah, I think it's an uncommon word. It's not a concept that would come up for most American English speakers, unless you're in a community that uses a language with another writing system (I think I first encountered it in a synagogue with Hebrew) or you're learning such a language.

I think I've maybe occasionally seen "translit." in text used to mark that the following is transliterated, but I could see that being easily glossed over.


If AI is at the point where it is exactly as capable as your average junior 3D professional in 10 years, it will probably have automated a ton (double-digit percentage?) of current jobs, such that nothing is safe. There's a lot of complexity, it's a fairly long time horizon, it's very visually detailed, it's creative and subjective, and there's not a lot of easily accessible high-quality training data.

It's like 2D art with more complexity and less training data. Non-AI 2D art and animation tools haven't been made irrelevant yet, and don't look like they will be soon.


Not quite. The junior also produces source files that a senior can enhance. AI gives you an end result that can't be tinkered with as easily.


Yeah, LLMs used to not be up to par on new Project Euler problems, but GPT-5 was able to do a few of the recent ones I tried a few weeks ago.

