If I'm reading the pricing correctly, these models are SIGNIFICANTLY cheaper tha...

furyofantares · 2025-03-20T20:08:58 1742501338

It's way cheaper. Everyone is; ElevenLabs is very expensive. Nobody matches their quality, though, especially if you want something that doesn't sound like a voice assistant/audiobook/podcast/news anchor/TV announcer.

This OpenAI offering is very interesting: it offers a valuable feature ElevenLabs doesn't, namely emotional control. It also hallucinates, though, which would need to be fixed for it to be very useful.

camillomiller · 2025-03-21T09:52:59 1742550779

It's cheap because everything OpenAI does is subsidized by investors' money. As long as that stupid money keeps flowing, all good! After that, either they'll go the way of WeWork, or enshittification will happen so they can make the books work. I don't see any other option, unless SoftBank decides it has some 150 billion to squander on buying them out. There's a lot of head-in-the-sand behavior going on around OpenAI's fundamentals, and I don't understand exactly why it's not more in the open yet.

ImprobableTruth · 2025-03-21T10:12:19 1742551939

If you compare with e.g. DeepSeek and other hosting providers, you'll find that OpenAI is almost certainly charging very high margins (DeepSeek has an 80% profit margin and they're 10x cheaper than OpenAI).

The training/R&D might make OpenAI burn VC cash, but this isn't comparable with companies like WeWork, whose products actively burn cash.

camillomiller · 2025-03-21T10:43:11 1742553791

They said themselves that even inference is losing them money, though, or did I get that wrong?

ImprobableTruth · 2025-03-21T11:27:52 1742556472

That's on their subscriptions, specifically the Pro subscription, because it's a flat rate for access to their most expensive model. The API prices are all much higher. It's unclear whether they're losing money on the normal subscriptions, but if so, probably not by much. Though that case is definitely closer to what you described: subsidizing it to gain 'mindshare' or whatever.

yousif_123123 · 2025-03-21T14:00:25 1742565625

Well, I think there are currently many models that offer better bang for the buck than GPT-4o, both per token and in terms of intelligence. Other than OpenAI having very high rate limits and throughput available without a contract negotiated with sales, I don't see much reason to use it currently instead of Sonnet 3.5 or 3.7, or Google's Flash 2.0.

Perhaps their training cost and their current inference cost are higher, but what you get as a customer is a more expensive product for what it is, IMO.

Szpadel · 2025-03-22T06:13:03 1742623983

They for sure lose money in some months on some customers, but I expect that globally most subscribers (including me; I recently cancelled mine) would be much better off migrating to the API.

Everyone I know who has or had a subscription didn't use it very extensively, and that is how it's still profitable in general.

I suspect it's the same for Copilot, especially the business variant. While they definitely lose money on my account, looking at our whole company subscription I wouldn't be surprised if the actual cost is only 30% of what we pay.

BoorishBears · 2025-03-23T16:11:59 1742746319

That's not true. ElevenLabs' margins are insane and their largest advantage is high-quality audio data.

ashvardanian · 2025-03-21T19:10:06 1742584206

To be fair, ElevenLabs has raised on the order of $300M of VC money as well.

asah · 2025-03-21T18:12:11 1742580731

haha, yeah this combo was pretty hilarious and highly inconsistent from reading to reading: https://www.openai.fm/#b2a4c1ca-b15a-44eb-9cd9-377f0e47e5a6

com2kid · 2025-03-20T22:05:24 1742508324

ElevenLabs is an ecosystem play. They have hundreds of different voices, legally licensed from real people who chose to upload their voice. It is a marketplace of voices.

None of the other major players is trying to do that, not sure why.

SXX · 2025-03-21T06:11:32 1742537492

Going that route would mean AI companies are supposed to pay for things like voices or other training data.

It's far better to just steal it all and ask the government for an exception.

fixprix · 2025-03-20T18:25:34 1742495134

It looks like they are targeting Google's TTS price point, which is $16 per million characters, or roughly $0.015/minute.
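As a rough check of that conversion, here is a minimal sketch. The two prices are the figures quoted in the comment; the function names are illustrative, and the speaking rate is derived from those two numbers rather than measured.

```python
# Figures quoted in the comment (not verified against any price list).
PRICE_PER_MILLION_CHARS = 16.0   # USD per 1,000,000 characters
PRICE_PER_MINUTE = 0.015         # USD per minute of audio


def price_per_char(price_per_million: float) -> float:
    """Convert a per-million-characters price into a per-character price."""
    return price_per_million / 1_000_000


def implied_chars_per_minute(price_per_million: float, price_per_minute: float) -> float:
    """Speaking rate (characters per minute) that makes the two prices equal."""
    return price_per_minute / price_per_char(price_per_million)


def cost_per_minute(price_per_million: float, chars_per_minute: float) -> float:
    """Per-minute cost for a given speaking rate in characters per minute."""
    return price_per_char(price_per_million) * chars_per_minute


if __name__ == "__main__":
    rate = implied_chars_per_minute(PRICE_PER_MILLION_CHARS, PRICE_PER_MINUTE)
    print(f"Implied speaking rate: {rate:.1f} characters/minute")
```

The two prices line up if a minute of speech is a bit under a thousand characters; a faster or slower narration rate shifts the per-minute figure proportionally.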

oidar · 2025-03-20T22:46:59 1742510819

ElevenLabs is the only one offering speech-to-speech generation where the intonation, prosody, and timing are kept intact. This allows one expressive voice actor to slip into many other voices.

goshx · 2025-03-20T23:42:45 1742514165

OpenAI’s Realtime speech to speech is far superior to ElevenLabs’.

noahlt · 2025-03-21T05:26:56 1742534816

What ElevenLabs and OpenAI call “speech to speech” are completely different.

ElevenLabs’ takes speech audio as input and maps it to new speech audio that sounds as if a different speaker said it, but with the exact same intonation.

OpenAI’s is an end-to-end multimodal conversational model that listens to a user speaking and responds in audio.

goshx · 2025-03-21T15:22:59 1742570579

I see now. Thank you for clarifying. I thought this was about ElevenLabs’ Conversational API.

echelon · 2025-03-20T18:34:44 1742495684

ElevenLabs is incredibly overpriced, and that's how they were able to achieve the MRR that led to their incredible fundraising.

No matter what happens, they'll eventually be undercut and matched in terms of quality. It'll be a race to the bottom for them too.

ElevenLabs is going to have a tough time. They've been way too expensive.

MrAssisted · 2025-03-20T18:53:07 1742496787

I hope they find a more unique product offering that takes hold. Everybody thinks of them as text-to-speech but I use ElevenLabs exclusively for speech-to-speech for vtubing as my AI character. They're kind of the only game in town for doing super high quality speech-to-speech (unless someone here has an alternative which I'd LOVE to know about). I've tried https://github.com/w-okada/voice-changer which is great because it's real-time but the quality is enough of a step down that actual words I'm saying become unclear and difficult to understand. Also with that I am tied to using my RTX 3090 desktop vs ElevenLabs which I can do in the cloud from my laptop anywhere.

I'm pretty much dependent on ElevenLabs to do my vtubing at this point but I can't imagine speech-to-speech has wide adoption so I don't know if they'll even keep it around.

eob · 2025-03-20T19:56:33 1742500593

Are you comfortable sharing the video & lip-sync stack you use? I don't know anything about the space but am curious to check out what's possible these days.

MrAssisted · 2025-03-20T20:21:53 1742502113

For my last video I used https://github.com/warmshao/FasterLivePortrait with a PNG of the character on my RTX 3090 desktop and recorded its output in real time. For the next video I'm going to spin up a RunPod instance and run FasterLivePortrait in the cloud after the fact, because then I can get a smooth 60fps, which looks better. I think the only real-time way to do AI vtubing in the cloud is my own GenDJ project (a fork of https://github.com/kylemcdonald/i2i-realtime tweaked for cloud real-time), but that just doesn't look remotely as good as LivePortrait. Somebody needs to rip out and replace insightface in FasterLivePortrait (it's prohibited for commercial use) and fork https://github.com/GenDJ so that the RunPod instance it spins up runs the de-insightfaced LivePortrait instead of i2i-realtime. I'll probably get around to doing that in the next few months if nobody else does and nothing else comes along and makes LivePortrait obsolete (both are big ifs).

AIWarper recently released a simpler way to run FasterLivePortrait for vtubing purposes, https://huggingface.co/AIWarper/WarpTuber, but I haven't tried it yet because I already have my own working setup and, as I mentioned, I'm shifting that workload to the cloud anyway.

maest · 2025-03-21T00:03:19 1742515399

Do you mind sharing your yt account? If you are okay with linking it to your hn account. I'd quite like to see the results.

simonpure · 2025-03-21T16:12:22 1742573542

I was curious as well.

Not OP but via their website linked in their profile -

https://youtu.be/Tl3pGTYEd2I

muyuu · 2025-03-20T22:56:47 1742511407

you can't be too expensive as a first mover provided you sell your service

whatever capital they've accrued, it won't hurt when the market prices are lower

huijzer · 2025-03-20T21:05:16 1742504716

Yes, ElevenLabs is orders of magnitude more expensive than everyone else. Very clever from a business perspective, I think. They are (were?) the best, so they know that people will pay a premium for that.

internet101010 · 2025-03-21T17:48:02 1742579282

Yeah, the way I see it, this is where we find the value of customization. We are already seeing its use by YouTube video essay creators who turn their own voice into models. I want to see corporate executives get on board so that we can finally ditch the god-awful phone quality in earnings calls.

lukebuehler · 2025-03-20T18:27:06 1742495226

Yes, I think you are right. When I did the math on 11labs' price per million chars I got the same numbers (Pro plan).

I'm super happy about this, since I took a bet that exactly this would happen. I've just been building a consumer TTS app that could only work with significantly cheaper TTS prices per million characters (or self-hosted models).

lherron · 2025-03-20T19:43:29 1742499809

Kokoro TTS is pretty good for open source. Worth checking out.

lukebuehler · 2025-03-20T22:20:56 1742509256

Yes, Kokoro is great, and the language flexibility is a huge plus too. And the best price per character is for sure when you self-host.

stavros · 2025-03-20T23:56:43 1742515003

Oh man, they have the "Sky" voice, and it seems to be the same one that OpenAI had but then removed? Not sure how that's possible, but I'm very happy about it.

diggan · 2025-03-21T01:05:38 1742519138

> Not sure how that's possible

Download a bunch of movies Scarlett Johansson has been in, segment them into audio clips where she talks, and train the model :)

stavros · 2025-03-21T07:59:17 1742543957

Is it actually her? I didn't think it was, but maybe.

diggan · 2025-03-21T10:22:58 1742552578

Unless there is some leak from OpenAI, I'm not sure we'll ever have it confirmed either way. But my brain thought it was Johansson from the first few seconds I heard the voice, and I don't seem to be alone in that reaction. The fact that they removed the voice also suggests it was trained on her voice.

Listening to it again today with fresher ears (the original OpenAI Sky, not the clones elsewhere), I still hear Johansson as the underlying voice actor for it, but maybe there is some subconscious bias I'm unable to bypass.

stavros · 2025-03-21T11:00:53 1742554853

Hmm, I never thought it was her, her voice is much more raspy, whereas Sky is a bit lighter. I can hear the similarity, I just don't think they sound exactly alike.

As you say, I'm not sure we'll ever know, although the Sky voice from Kokoro is a spot-on match for the Sky voice from OpenAI, so maybe someone from Kokoro knows how they got it.

zacmps · 2025-03-20T18:54:30 1742496870

What does it do?

lukebuehler · 2025-03-20T20:23:11 1742502191

Convert any file (PDF, EPUB, TXT) to an audiobook, downloadable as MP3 or directly listenable via an RSS feed in, say, the Apple Podcasts app.

Basically make one-off audiobooks for yourself or a few friends.
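One step such a pipeline plausibly needs is splitting a long text into request-sized pieces before sending it to a TTS service. The sketch below is only an illustration of that step, not the app's actual implementation: `chunk_text`, `estimate_cost`, the 4000-character limit, and the $16-per-million price are all assumptions.

```python
import re


def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    max_chars is a hypothetical per-request limit; check your provider's docs.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split a single over-long sentence so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def estimate_cost(text: str, price_per_million_chars: float = 16.0) -> float:
    """Estimated TTS cost in USD, assuming billing per input character."""
    return len(text) * price_per_million_chars / 1_000_000


if __name__ == "__main__":
    book = "First sentence. Second sentence! A third one?"
    for i, chunk in enumerate(chunk_text(book, max_chars=32)):
        print(i, repr(chunk))
    print(f"Estimated cost: ${estimate_cost(book):.6f}")
```

Splitting on sentence boundaries keeps each audio segment from starting or ending mid-sentence, which matters when the segments are later concatenated into one file.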

AyyEye · 2025-03-21T03:29:35 1742527775

For anyone else reading this, Librera Reader + SherpaTTS are both FOSS Android apps and can read anything Librera can open on an ad-hoc basis, with no need to futz with files: just load your ebook bookmark and hit play.

SherpaTTS has a bunch of different models (Piper/Coqui) with a ton of voices/languages. There's a slight but tolerable delay with the Piper high-quality models, but the low-quality ones run in real time.

setsewerd · 2025-03-20T21:24:20 1742505860

Any plans to make a Chrome extension variant? I've been looking for a high-quality and cheap TTS extension for ages (like ElevenLabs Human Reader, except with less absurd pricing).

lukebuehler · 2025-03-20T22:19:42 1742509182

I didn't think of that, interesting idea. What I'm focusing on right now is long-form content for more offline-ish listening. Maybe a plugin could work to load longer texts, but I'm not working on a screen reader atm.

wholinator2 · 2025-03-20T22:26:29 1742509589

Do you know if there are any offerings today that can read math? Like speak an equation the way a human would? It's something I've been thinking about for a long time and would be an essential feature for me (the only things I read are physics).

tough · 2025-03-21T04:17:22 1742530642

I saw a small model trained to output currency-aware text from decimals/integers.

I wonder if you could make a similar narrow LoRA finetune to train a model to output human-readable text from, say, LaTeX formulas, given a good dataset to train on.
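To make the target output concrete, here is a minimal rule-based sketch of LaTeX-to-spoken-text conversion. This is not the LoRA finetune suggested above; it is a hand-written baseline covering only a few constructs (`\frac`, `\sqrt`, powers, simple commands, and `=`, `+`, `-`), and the function name and wording choices are assumptions.

```python
import re


def latex_to_speech(expr: str) -> str:
    """Convert a small subset of LaTeX math into speakable English text."""
    # Rewrite \frac and \sqrt from the innermost braces outward.
    previous = None
    while previous != expr:
        previous = expr
        expr = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}", r" \1 over \2 ", expr)
        expr = re.sub(r"\\sqrt\{([^{}]*)\}", r" the square root of \1 ", expr)
    # Powers: braced exponents first, then the common "squared" case.
    expr = re.sub(r"\^\{([^{}]*)\}", r" to the power of \1 ", expr)
    expr = re.sub(r"\^2(?!\d)", " squared ", expr)
    expr = re.sub(r"\^(\w)", r" to the power of \1 ", expr)
    # Remaining commands such as \alpha are read by name.
    expr = re.sub(r"\\([A-Za-z]+)", r" \1 ", expr)
    for symbol, word in (("=", " equals "), ("+", " plus "), ("-", " minus ")):
        expr = expr.replace(symbol, word)
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(expr.split())


if __name__ == "__main__":
    print(latex_to_speech(r"E = mc^2"))
    print(latex_to_speech(r"\frac{a}{b} + \alpha"))
```

Rules like these break down quickly on real physics notation (subscripts, integrals, context-dependent readings), which is where a learned model would have to take over.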

dockerd · 2025-03-21T07:14:13 1742541253

What is your use-case here?

setsewerd · 2025-03-21T17:01:46 1742576506

Primarily for reading articles aloud online. I've been trying the latest Siri TTS which is a big improvement (and free), but it's still nowhere near accurate enough for proper nouns or newer terms, which ElevenLabs handles much better.

benjismith · 2025-03-20T18:34:16 1742495656

Same for me :)

forgotpasagain · 2025-03-20T18:45:38 1742496338

Almost everyone is cheaper than ElevenLabs though.

whimsicalism · 2025-03-20T20:05:49 1742501149

Sesame is free and pretty good and you can run it yourself.

kuprel · 2025-03-20T20:27:46 1742502466

They released a crippled model: https://github.com/SesameAILabs/csm/issues/63

hnhn34 · 2025-03-20T20:48:44 1742503724

The good news is Orpheus-3B just made Sesame essentially obsolete.

Foreignborn · 2025-03-20T21:50:57 1742507457

thanks for this, it sounds pretty good.

link for anyone else: https://canopylabs.ai/model-releases

sandspar · 2025-03-20T22:24:42 1742509482

These voices are all annoying, though. The thing about Sesame's Miles is that he's cool.

satvikpendem · 2025-03-21T18:50:24 1742583024

You can clone any voice you want with Orpheus.

whimsicalism · 2025-03-21T17:56:06 1742579766

admittedly i only tested it for a few queries, but it worked fine for me except for lack of consistency in voice

youssefabdelm · 2025-03-20T20:03:44 1742501024

Def prefer the pricing but so far on 4o, no timestamps or diarization sadly

kuprel · 2025-03-20T18:48:18 1742496498

OpenAI doesn’t have voice cloning

dannyw · 2025-03-20T18:59:26 1742497166

They do, they just don’t offer it.

tiahura · 2025-03-20T20:05:33 1742501133

You missed the story:

https://community.openai.com/t/chatgpt-unexpectedly-began-sp...

ChatGPT unexpectedly began speaking in a user’s cloned voice during testing