Depends on what you are trying to get out of a podcast. Most of the podcasts I listen to are because I want to learn something new in an entertaining format. I'm not listening to develop parasocial relationships with the hosts, so removing that element could be a good thing for me.
Of course if you listen to podcasts because you like the parasocial aspect or the celebrity interviews, then yeah... Not really a point.
I don't know that "parasocial relationships" are the primary reason people like having real hosts. I have a huge list of things I've managed to change in my life because I heard some other real person talking about how they were possible. Listening to these people over time and realizing there's nothing about them that's so special that it makes things possible for them that aren't possible for me gets me off my butt to set about the hard work of making the changes I didn't otherwise realize were possible.
> I don't know that "parasocial relationships" are the primary reason people like having real hosts
But it is likely one of the main ones. Me telling you that something is possible doesn't necessarily mean that it is real, but you chose to believe it. Whether the source is human is not necessarily relevant. After all, humans can and do lie all the time.
You called a long term user a bot in the most rude way imaginable. Not only are you bad at spotting bots, but you’re rude about it for no reason. Good for you - you must feel very accomplished.
I listen to a number of podcasts which are reading books, stories, literature, etc. Having a professional actor read a text has appeal (e.g., Selected Shorts), but many are less-than-professional. A sufficiently-competent automated text-to-speech would fit at least some roles.
There are a few podcasts for which I'd have greater interest if the narration were by someone other than the current host....
There are also services such as the National Library for the Blind (UK) and BARD (US) which provide books, including a large number of audiobooks, for the blind. Automated text-to-speech would make a vastly larger library available, particularly of very recent publications, niche publications, and long-since-out-of-print books. Such services do take requests, but tend to focus on works published within the past five years.
I read the first of The Three Body Problem trilogy in print, and then listened to audiobook versions of the second & third books. Only they weren't audiobooks. I downloaded PDFs and then used a mobile app (Librera, I believe) to "read" them to me while I exercised. The benefit is that it allows arbitrary text to be converted to audio, but the downside is that it's only able to use your device's TTS voices, and there aren't any AI smarts built-in, so it was like listening to the Google Assistant read an audiobook. It got the job done, but now I have a somewhat visceral reaction to that Assistant voice having associated it with Chinese sci-fi for several weeks.
Something better would be very much appreciated. It's still not a replacement for high quality, professionally narrated audiobooks, but -- like you said, it's not just books that I'd like to consume this way.
Those are some good use cases. I only really listen to full-length audiobooks and not podcasts. An AI voice is probably sufficient, especially for niche content, but I would MUCH rather listen to a book narrated by a human. There are nuances to pacing, tone, and voice that I don't think AI will ever be able to fully grasp.
I listened to a lot of current AI "podcasting" tools, and while to me the voice is 95% perfect, it does have its issues:
- suddenly speeding up or slowing down
- mispronunciation of non-standard words
- weird pauses
Having listened to a great many podcasts and interviews ... these are all very much problems with human-embodied voices as well.
(The number of SV types who talk as if they're on coke / meth / speed is ... nuts. A certain A-Z lead character comes to mind. Piketty is another. It'd be less problematic if they weren't constantly tripping over their own words, but they are.)
Throughout the piece, the narrator reads "ordnance" as "ordnances" (the word is both singular and plural), and there is some awkward emphasis within phrases along with odd pronunciations ("malevolent" stands out). On the other hand, her pronunciation and accent of German words and place-names is excellent.
"Selected Shorts" is up there. My principal complaint is that episodes remain live for only a month or so. If you happen to catch an episode you like, you'll have to keep it downloaded. All but certainly on account of copyright.
Various non-English pods as well, to maintain / increase fluency. Germany has a good set via Deutschlandfunk. I've found a few in other languages, though tending toward advertising-supported, which is less than ideal.
Searching for stories, literature, children's stories (a surprisingly good way to learn basic vocabulary, grammar, and culture), and history in your target language of choice tends to be a pretty good guide.
That's cool, but I guess I was specifically looking for something more advanced than basic text-to-speech (which most browsers nowadays have built in and can be achieved in 2 lines of JavaScript[0]). I was specifically looking for high-quality, natural-sounding generated speech.
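For reference, the built-in browser route looks roughly like this. It's a minimal sketch, assuming the Web Speech API (`speechSynthesis` and `SpeechSynthesisUtterance`) is available in your browser; `speakText` is a hypothetical helper name I made up, and this isn't necessarily the same snippet as the one linked in [0]. The voice you get is whatever your OS or browser ships, which is exactly the quality limitation being discussed.

```javascript
// Sketch: read a string aloud with the browser's built-in Web Speech API.
// `speakText` is a hypothetical helper name, not part of any library.
function speakText(text) {
  // Outside a browser (e.g. in Node.js) these globals do not exist,
  // so report that nothing was spoken instead of throwing.
  if (typeof speechSynthesis === "undefined" ||
      typeof SpeechSynthesisUtterance === "undefined") {
    return false;
  }
  // The "2 lines" version is essentially just these two statements.
  const utterance = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utterance);
  return true;
}

speakText("Hello from the browser's built-in voice.");
```

Paste it into a browser console to hear it; some browsers only start speaking after a user interaction with the page.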
Cool app. The biggest issue for me is the voice sounds very much like the typical system voice apps, when we are seeing such leaps and bounds in the voice quality. But your interface is simple and nice.
Yeah technical papers probably wouldn't be a good use case. And definitely not what I had in mind. I was thinking about thinkpieces and older books that missed the audiobook trend
I've also been really interested in finding a way to make AI TTS able to read equations. I'm currently pursuing my PhD in physics and I listen to TTS of textbooks in the gym. There just aren't human podcasts on the things I need to learn right now for class, but if that dang TTS could only read equations I'd be set!
A great way to learn something is to listen to a conversation among two to four well informed and articulate people, where each person has a memorable personality and each person has a different perspective about the topic.
This Google Illuminate experiment shows how just listening to two voices discuss a technical paper for three minutes is far more effective than reading a three-minute AI summary of the paper.
Imagine if there were three or four voices, with varied personalities, more humor and sarcasm, different priorities and points of view, and even a little disagreement.
Then imagine you're not just listening to the conversation, but you're participating in it. That seems like a pretty amazing way to learn.
I have a nonfiction draft built on conversations between 4 friends. Started as a regular nonfiction book but quickly realized the desired mainstreet audience would never read it. I created personas (as in UX style goal-directed design personas) to describe each character’s background, POV, goals, expertise, values, concerns and questions. Different than anything else I’ve ever written. Still very rough but rewarding.
Are you doing that with "old-fashioned" TTS, or have you found a good resource for uploading your own docs/epubs and having them read back by one of these higher quality synthesized voices? (I've been looking for the latter, but not having much luck.)
It'll be great when the AI generation gets on device and you won't need to pay per minute of text generated.
ElevenLabs will burn through its investors' money someday, and then they'll stop subsidizing the reader voice generation.
Just old-school TTS from Acapela, a paid voice called Heather.
I got used to it before there was a wide selection on Audible and it's ok.
You can't use audio for serious books or articles, but it works for history, biographies, fiction, and random tech articles bookmarked in Pocket. It's also locally generated, so there's no latency, which is great.
Additionally, when you use a TTS engine, you can see the text and easily copy the things you want to make a note on later. With Audiobooks it's not possible.
I actually quite often wish I could access a condensed version of a few podcasts in text form. Sometimes there are little nuggets of information dropped by hosts or guests that don't make it onto any other medium.
When I do intentionally listen to podcasts (i.e. as opposed to having to, because that's the only available form of some content), I do so because I enjoy the style of the conversation itself.
If it seemed full of annoying product placement, no. If the content and presentation were sufficiently good, yes.
I believe (but then again I also want to believe, so make of this what you will) that I'd be holding the AI to only the same standards I hold humans to. It's not like I'm trying to build a relationship to the speaker in either case.
I subscribed to the audio version of 'The Diff' by Byrne Hobart, and it's auto-generated. There's a few obvious tells, like when describing money - '$3' would be translated to 'dollar three'. But there's also occasional verbal nuances that I wouldn't expect from a TTS system. I don't love it, but I find his thoughts compelling enough to deal with it.
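A tell like "dollar three" usually comes from the text not being normalized before it reaches the voice. Here's a rough sketch of the kind of pre-processing that would avoid it. I have no idea what pipeline The Diff actually uses, and `expandDollarAmounts` is a hypothetical name; it only handles plain amounts like "$3" or "$2.50", not "$3.5 million" or thousands separators.

```javascript
// Sketch: rewrite dollar amounts into words-in-order before handing text to TTS,
// so "$3" is read as "3 dollars" rather than "dollar three".
// `expandDollarAmounts` is a hypothetical helper, not from any real TTS product.
function expandDollarAmounts(text) {
  // Match "$" + whole dollars, optionally followed by exactly two cent digits.
  return text.replace(/\$(\d+)(?:\.(\d{2}))?/g, (match, dollars, cents) => {
    const dollarPart = `${dollars} ${dollars === "1" ? "dollar" : "dollars"}`;
    // No cents (or ".00"): the dollar part alone is enough.
    if (cents === undefined || cents === "00") return dollarPart;
    const centCount = Number(cents);
    return `${dollarPart} and ${centCount} ${centCount === 1 ? "cent" : "cents"}`;
  });
}

console.log(expandDollarAmounts("It costs $3 today."));
// prints "It costs 3 dollars today."
```

A real system would need many more rules like this (dates, units, abbreviations), which is probably why these tells keep slipping through.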
A lot of our customers use us [0] for that, it works pretty well if executed properly. The voiceovers work best as inserts into an existing podcast. If you see the articles of major news orgs like NYT, they often have a (usually) machine narrated voiceover.
I don't know, it depends on whether I or someone else gets to control the auto-generated podcast.
If I get to control it and I can have it draw in enough interesting angles into something, I think it could be fun. I wouldn't replace one of my favorites, but I'd gladly use something that could generate creative new content.
Your attention is your only real resource that you have to give online... giving it to bots on Instagram and Twitter is fairly "low attention" where you give the bot a few seconds of interaction. On YouTube or Spotify you're giving MUCH more attention, on the order of hours.
I wonder about a future where our attention isn't even spent on other people anymore. It's not really an online landscape I would be interested in.
Lex Fridman invites guests to just repeat whatever nonsense they write on their blogs without questioning any of the questionable claims, and plenty of people listen to it. This technology would be perfect for his podcast.