Zero-shot voice clones have never been very good. Fine-tuned models hit natural speaker similarity and prosody in a way zero-shot models can't match.
If it were a big model, trained on a diverse set of speakers, that could remember how to replicate them all, then zero-shot would potentially be a bigger deal. But this is a tiny model.
I'll try out the zero-shot functionality of Pocket TTS and report back.
Because people would just feel bad about it and keep doing it. I don't care about people feeling bad about immoral behavior, I care about them not doing it in the first place.
"...our initial analysis estimates that around 0.15% of users active in a given week have conversations that include explicit indicators of potential suicidal planning or intent and 0.05% of messages contain explicit or implicit indicators of suicidal ideation or intent."
With roughly 700 million weekly active users, that's more like 1 million people discussing suicide with ChatGPT every week.
For reference, in a given year an estimated 12.8 million Americans report thinking about suicide and 1.5 million report attempting it.
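Spelling out the arithmetic behind the ~1 million/week estimate (the 700 million and 0.15% figures are the ones quoted above):

```python
# Back-of-the-envelope check of the ~1 million/week figure.
weekly_active_users = 700_000_000
share_with_suicidal_planning_indicators = 0.0015  # 0.15% of weekly actives

users_per_week = weekly_active_users * share_with_suicidal_planning_indicators
print(f"{users_per_week:,.0f}")  # → 1,050,000
```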
This ratio allows for a relatively simple 2x2 repeating pattern, which makes interpolating the values much easier.
Also, you don't want the red and blue to be too far apart; reconstructing the colour signal is difficult enough as it is. Moiré effects are only going to get worse if you use an even sparser resolution.
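To make the 2x2 point concrete: in a Bayer-style mosaic, which colour a pixel samples depends only on (row % 2, col % 2), so the same small interpolation kernel repeats everywhere. A toy sketch (the RGGB layout and simple 4-neighbour averaging are my assumptions for illustration, not anything specific from the thread):

```python
# Toy demosaic illustration: a 2x2 RGGB mosaic repeats with period 2,
# so the sampled colour at (row, col) depends only on the parities.
def bayer_colour(row, col):
    # Assumed layout:  R G
    #                  G B
    return [["R", "G"], ["G", "B"]][row % 2][col % 2]

def green_at(mosaic, row, col):
    """Estimate green at a red/blue site by averaging its 4-neighbours,
    all of which are green sites in an RGGB mosaic."""
    if bayer_colour(row, col) == "G":
        return mosaic[row][col]
    neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    vals = [mosaic[r][c] for r, c in neighbours
            if 0 <= r < len(mosaic) and 0 <= c < len(mosaic[0])]
    return sum(vals) / len(vals)
```

With a sparser (e.g. 3x3 or irregular) mosaic, the neighbour sets and weights would differ from site to site, which is exactly the extra complexity the 2x2 pattern avoids.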
Lol - they really should be locking down their email accounts and enforcing that policy, or manually reviewing outbound messages before they can be sent. Just telling the LLMs seems likely to have a non-zero failure rate.
The GitHub repo includes (among other things) a script (relying on python-pptx) to output decomposed layer images into a pptx file “where you can edit and move these layers flexibly.” (I've never used PowerPoint for this, but maybe it is good enough for this and ubiquitous enough that this is sensible?)
I saw some people at a company called Pruna AI got it down to 8 seconds with Cloudflare/Replicate, but I don't know if it was on consumer hardware or an A100/H100/H200, and I don't know if the inference optimization is open-source yet.
with torch.inference_mode():
    output = pipeline(**inputs)
# output.images is the list of generated images; iterate it directly
# rather than over a single image.
for i, image in enumerate(output.images):
    image.save(f"{i}.png")
Unless it's a joke that went over my head or you're talking about some other GitHub readme (there's only one GitHub link in TFA), posting an outright lie like this is not cool.
The word "powerpoint" is not there; however, this text is:
“The following scripts will start a Gradio-based web interface where you can decompose an image and export the layers into a pptx file, where you can edit and move these layers flexibly.”
Oh okay I missed it, sorry. But that’s just using a separate python-pptx package to export the generated list of images to a .pptx file, not something inherent to the model.
It's the replies that are the interesting bit. It's not perfect, but it can maintain multiple conversations with different people in the same context, and do things like changing its current rules in response to conversations with users. Its slightly robotic tone is deliberate: it tries to convey information in the most efficient way possible. I'm not sure if that's an emergent property or if it's in one of its fixed memory blocks. I do know that earlier on people managed to convince it to change its personality and cpfiffer had to intervene to stop people doing that.
These kinds of LLM bots can be fun to play with in a "try to make it say/do something silly" way, but beyond that I don't really get the point. The writing style is grating and I don't think I've ever seen one say anything genuinely useful.
Yeah, I think it's an uncommon word. It's not a concept that would come up for most American English speakers, unless you're in a community that uses a language with another writing system (I think I first encountered it in a synagogue with Hebrew) or you're learning such a language.
I think I've maybe occasionally seen "translit." in text used to mark that the following is transliterated, but I could see that being easily glossed over.
If AI is at the point in 10 years where it is exactly as capable as your average junior 3D professional, it will probably have automated a ton (double-digit percentage?) of current jobs, such that nothing is safe. 3D work has a lot of complexity, fairly long time horizons, heavy visual detail, creative and subjective judgment, and not a lot of easily accessible high-quality training data.
It's like 2D art with more complexity and less training data. Non-AI 2D art and animation tools haven't been made irrelevant yet, and don't look like they will be soon.