
If I'm reading the pricing correctly, these models are SIGNIFICANTLY cheaper than ElevenLabs.

https://platform.openai.com/docs/pricing

If these are the "gpt-4o-mini-tts" models, and if the pricing estimate of "$0.015 per minute" of audio is correct, then these prices are 85% cheaper than those of ElevenLabs.

https://elevenlabs.io/pricing

With ElevenLabs, if I choose their most cost-effective "Business" plan for $1100 per month (with annual billing of $13,200, a savings of 17% over monthly billing), then I get 11,000 minutes of TTS, and each minute is billed at 10 cents.

With OpenAI, I could get 11,000 minutes of TTS for $165.

Somebody check my math... Is this right?
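
A quick sanity check of the arithmetic above, as a sketch that uses only the figures quoted in this comment (the prices themselves are not verified here, and the constant names are just illustrative):

```python
# Figures as quoted in the comment above; not independently verified.
OPENAI_PER_MINUTE = 0.015          # USD per minute, "gpt-4o-mini-tts" estimate
ELEVENLABS_PLAN_MONTHLY = 1100.0   # USD per month, "Business" plan, annual billing
ELEVENLABS_PLAN_MINUTES = 11_000   # minutes of TTS included in that plan

# Effective per-minute price on the ElevenLabs plan.
elevenlabs_per_minute = ELEVENLABS_PLAN_MONTHLY / ELEVENLABS_PLAN_MINUTES

# What the same number of minutes would cost at the OpenAI per-minute estimate.
openai_cost = OPENAI_PER_MINUTE * ELEVENLABS_PLAN_MINUTES

# Relative saving per minute.
savings = 1 - OPENAI_PER_MINUTE / elevenlabs_per_minute

print(f"ElevenLabs: ${elevenlabs_per_minute:.3f}/min")
print(f"OpenAI for {ELEVENLABS_PLAN_MINUTES:,} minutes: ${openai_cost:.2f}")
print(f"OpenAI is {savings:.0%} cheaper per minute")
```

With those inputs the numbers in the comment hold: 10 cents per minute, $165 for 11,000 minutes, and an 85% difference.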



It's way cheaper - everyone is, elevenlabs is very expensive. Nobody matches their quality though. Especially if you want something that doesn't sound like a voice assistant/audiobook/podcast/news anchor/tv announcer.

This openai offering is very interesting, it offers valuable features elevenlabs doesn't in emotional control. It also hallucinates though which would need to be fixed for it to be very useful.


It's cheap because everything OpenAI does is subsidized by investors' money. As long as that stupid money flows, all good! Then either they'll go the way of WeWork, or enshittification will happen to make it possible for them to make the books work. I don't see any other option. Unless Softbank decides it has some 150 Billion to squander on buying them off. There's a lot of head-in-the-sand behavior going on around OpenAI fundamentals and I don't understand exactly why it's not more in the open yet.


If you compare with e.g. Deepseek and other hosters, you'll find that OpenAI is actually almost certainly charging very high margins (Deepseek has an 80% profit margin and they're 10x cheaper than openai).

The training/R&D might make OpenAI burn VC cash, but this isn't comparable with companies like WeWork whose products actively burn cash


They said themselves that even inference is losing them money tho, or did I get that wrong?


On their subscriptions, specifically the pro subscription, because it's a flat rate for their most expensive model. The API prices are all much more expensive. It's unclear whether they're losing money on the normal subscriptions, but if so, probably not by much. Though it's definitely closer to what you described, subsidizing it to gain 'mindshare' or whatever.


Well I think there are many cheaper models in terms of bang for buck currently per token and intelligence than gpt4o. Other than OpenAI having very high rate limits and throughput available without a contract done with sales, I don't see much reason to use it currently instead of sonnet 3.5 or 3.7, or Google's Flash 2.0

Perhaps their training cost and their current inference cost is higher, but what you get as a customer is a more expensive product for what it is, IMO.


they for sure lose money on some months for some customers, but I expect that globally most subscribers (including me, having recently cancelled) would be much better off migrating to the API

everyone that I know that has/had a subscription didn't use it very extensively, and that is how it's still profitable in general

I suspect that it's the same for Copilot, especially the business variant. While they definitely lose money on my account, looking at our whole company subscription I wouldn't be surprised if the actual cost is as low as 30% of what we pay


That's not true. ElevenLabs margins are insane and their largest advantage is high quality audio data.


To be fair, ElevenLabs has raised of the order of $300M of VC money as well.


haha, yeah this combo was pretty hilarious and highly inconsistent from reading to reading: https://www.openai.fm/#b2a4c1ca-b15a-44eb-9cd9-377f0e47e5a6


Elevenlabs is an ecosystem play. They have hundreds of different voices, legally licensed from real people who chose to upload their voice. It is a marketplace of voices.

None of the other major players is trying to do that, not sure why.


Going with this would mean AI companies are supposed to pay for something like voices or other training data.

It's far better to just steal it all and ask government for exception.


It looks like they are targeting Google's TTS price point which is $16 per million characters which comes out to $0.015/minute.
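
A rough check of that conversion, as a sketch using only the two figures quoted above: they line up if one minute of speech is taken to be about 937.5 characters. That speaking rate is implied by the numbers rather than stated anywhere in this thread, and the constant names are illustrative.

```python
# Figures as quoted in the comment above; not independently verified.
PRICE_PER_MILLION_CHARS = 16.0   # USD per million characters
PRICE_PER_MINUTE = 0.015         # USD per minute of audio

# Price of a single character.
price_per_char = PRICE_PER_MILLION_CHARS / 1_000_000

# Characters per minute at which the two prices are equivalent.
implied_chars_per_minute = PRICE_PER_MINUTE / price_per_char

print(f"Implied speaking rate: {implied_chars_per_minute:.1f} characters/minute")
```

Faster or denser speech would make the per-character price work out to more per minute, slower speech to less.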


ElevenLabs is the only one offering speech to speech generation where the intonation, prosody, and timing is kept intact. This allows for one expressive voice actor to slip into many other voices.


OpenAI’s Realtime speech to speech is far superior to ElevenLabs.


What ElevenLabs and OpenAI call “speech to speech” are completely different.

ElevenLabs’ takes speech audio as input and maps it to new speech audio that sounds like a different speaker said it, but with the exact same intonation.

OpenAI’s is an end-to-end multimodal conversational model that listens to a user speaking and responds in audio.


I see now. Thank you for clarifying. I thought this was about ElevenLabs' Conversational API.


ElevenLabs is incredibly over-priced and that's how they were able to achieve the MRR that led to their incredible fundraising.

No matter what happens, they'll eventually be undercut and matched in terms of quality. It'll be a race to the bottom for them too.

ElevenLabs is going to have a tough time. They've been way too expensive.


I hope they find a more unique product offering that takes hold. Everybody thinks of them as text-to-speech but I use ElevenLabs exclusively for speech-to-speech for vtubing as my AI character. They're kind of the only game in town for doing super high quality speech-to-speech (unless someone here has an alternative which I'd LOVE to know about). I've tried https://github.com/w-okada/voice-changer which is great because it's real-time but the quality is enough of a step down that actual words I'm saying become unclear and difficult to understand. Also with that I am tied to using my RTX 3090 desktop vs ElevenLabs which I can do in the cloud from my laptop anywhere.

I'm pretty much dependent on ElevenLabs to do my vtubing at this point but I can't imagine speech-to-speech has wide adoption so I don't know if they'll even keep it around.


Are you comfortable sharing the video & lip-sync stack you use? I don't know anything about the space but am curious to check out what's possible these days.


For my last video I used https://github.com/warmshao/FasterLivePortrait with a png of the character on my RTX 3090 desktop and recorded the output of that real-time but in the next video I'm going to spin up a runpod instance and do the FasterLivePortrait in the cloud after the fact because then I can get a smooth 60fps which looks better. I think the only real-time way to do AI vtubing in the cloud is my own GenDJ project (fork of https://github.com/kylemcdonald/i2i-realtime but tweaked for cloud real-time) but that just doesn't look remotely as good as LivePortrait. Somebody needs to rip out and replace insightface in FasterLivePortrait (it's prohibited for commercial use) and fork https://github.com/GenDJ to have the runpod it spins up run the de-insightfaced LivePortrait instead of i2i-realtime. I'll probably get around to doing that in the next few months if nobody else does and nothing else comes along and makes LivePortrait obsolete (both are big ifs).

AIWarper recently released a simpler way to run FasterLivePortrait for vtubing purposes https://huggingface.co/AIWarper/WarpTuber but I haven't tried it yet because I already have my own working setup and as I mentioned I'm shifting my workload for that to the cloud anyways


Do you mind sharing your yt account? If you are okay with linking it to your hn account. I'd quite like to see the results.


I was curious as well.

Not OP but via their website linked in their profile -

https://youtu.be/Tl3pGTYEd2I


you can't be too expensive as a first mover provided you sell your service

whatever capital they've accrued, it won't hurt when the market prices are lower


Yes ElevenLabs is orders of magnitude more expensive than everyone else. Very clever from a business perspective, I think. They are (were?) the best so they know that people will pay a premium for that.


Yeah the way I see it this is where we find the value of customization. We are already seeing its use by YouTube video essay creators who turn their own voice into models. I want to see corporate executives get on board so that we can finally ditch the god awful phone quality in earnings calls.


yes, I think you are right. When I did the math on 11labs million chars I got the same numbers (Pro plan).

I'm super happy about this, since I took a bet that exactly this would happen. I've just been building a consumer TTS app that could only work with significantly cheaper TTS prices per million characters (or self-hosted models)


Kokoro TTS is pretty good for open source. Worth checking out.


Yes, kokoro is great, and the language flexibility is a huge plus too. And the best price per character is for sure when you self-host.


Oh man, they have the "Sky" voice, and it seems to be the same one that OpenAI had but then removed? Not sure how that's possible, but I'm very happy about it.


> Not sure how that's possible

Download a bunch of movies Scarlett Johansson has been in, segment them into audio clips where she talks and train the model :)


Is it actually her? I didn't think it was, but maybe.


Unless there is some leak from OpenAI, I'm not sure we'll ever have it confirmed yes or no. But my brain thought it was Johansson from the first few seconds I heard the voice and I don't seem to be alone with that reaction. The fact that they removed the voice also speaks to it having been trained on her voice.

Listening to it again today with fresher ears (the original OpenAI Sky, not the clones elsewhere), I still hear Johansson as the underlying voice actor for it, but maybe there is some subconscious bias I'm unable to bypass.


Hmm, I never thought it was her, her voice is much more raspy, whereas Sky is a bit lighter. I can hear the similarity, I just don't think they sound exactly alike.

As you say, I'm not sure we'll ever know, although the Sky voice from Kokoro is spot on the Sky voice from OpenAI, so maybe someone from Kokoro knows how they got it.


What does it do?


Convert any file (pdf, epub, txt) to an audiobook, downloadable as mp3, or directly listenable via RSS feed in, say, the Apple Podcasts app.

Basically make one-off audiobooks for yourself or a few friends.


For anyone else reading this, librera reader + sherpaTTS are both FOSS android apps and can read anything librera can open on an ad-hoc basis, with no need to futz with files, just load your ebook bookmark and hit play.

SherpaTTS has a bunch of different models (piper/coqui) with a ton of voices/languages. There's a slight but tolerable delay with piper high models but low is realtime.


Any plans to make a Chrome extension variant? Been looking for a high quality and cheap TTS extension for ages (like ElevenLabs Human Reader, except with less absurd pricing)


I didn't think of that, interesting idea. What I'm focusing on right now is long-form content for more offline-ish listening. Maybe a plugin could work to load longer texts, but I'm not working on a screen reader atm.


Do you know if there's any offerings today that can read math? Like speak an equation the way a human would? It's something I've been thinking about a long time and would be an essential feature for me (the only things i read are physics)


I saw a small model trained on outputting currency aware text from decimals/integers

i wonder if you could make a similar -narrow- lora finetune to train a model to output human readable text from, say, LaTeX formulas, given a good data set to train on


What is your use-case here?


Primarily for reading articles aloud online. I've been trying the latest Siri TTS which is a big improvement (and free), but it's still nowhere near accurate enough for proper nouns or newer terms, which ElevenLabs handles much better.


Same for me :)


Almost everyone is cheaper than ElevenLabs though.


Sesame is free and pretty good and you can run it yourself.


They released a crippled model: https://github.com/SesameAILabs/csm/issues/63


The good news is Orpheus-3B just made Sesame essentially obsolete.


thanks for this, it sounds pretty good.

link for anyone else: https://canopylabs.ai/model-releases


These voices are all annoying, though. The thing about Sesame's Miles is that he's cool.


You can clone any voice you want with Orpheus.


admittedly i only tested it for a few queries, but it worked fine for me except for lack of consistency in voice


Def prefer the pricing but so far on 4o, no timestamps or diarization sadly


OpenAI doesn’t have voice cloning


They do, they just don’t offer it.


You missed the story:

https://community.openai.com/t/chatgpt-unexpectedly-began-sp...

ChatGPT unexpectedly began speaking in a user’s cloned voice during testing



