Show HN: Podscripter – Automated Transcription for Podcasters

craigcannon · on June 4, 2018

Hey HN!

Craig from YC here. This project is a follow-up to SpeechBoard, which was a text-based audio editor - https://news.ycombinator.com/item?id=15670827. Thanks for all the feedback there :)

We were surprised to find that many users just wanted transcripts, so Podscripter is an attempt to solve that.

Here's how it works: every time you publish an episode (or give us a file) we run it through a speech-to-text service. Then we split up the speakers by hand, which ends up being a fair bit of work and is why turnaround is 24hrs instead of minutes. Then we email you the transcript.

Before I was podcasting at YC I had my own podcast and couldn't justify paying $1 a minute for transcripts. These machine generated transcripts get you most of the way there for a lot less money :)

Let me know what you think!

raghavtoshniwal · on June 4, 2018

Hey Craig,

I wrote a service (nowhere close to public release) that segments audio based on speakers. You have to identify one speech segment; it is then capable of labelling the others. It uses GMM and MFCC. Is something like this in the works? Cool idea! I consume a fair bit of podcasts, and I can affirm that there is definitely a need for this.
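A minimal sketch of the "identify one segment, label the others" idea, not the actual service. It makes two loud simplifications: each model is a single diagonal Gaussian rather than a full GMM, and the feature frames are toy 2-D vectors standing in for real MFCCs. The function names (`fit_gaussian`, `avg_log_likelihood`, `label_segments`) are hypothetical.

```python
import math

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # The variance floor keeps the log-likelihood finite for near-constant features.
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-3)
        for d in range(dims)
    ]
    return means, variances

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def label_segments(enrolled, background, segments):
    """Return True for each segment that fits the enrolled speaker better
    than a background model fitted on all available audio."""
    speaker = fit_gaussian(enrolled)
    world = fit_gaussian(background)
    return [
        avg_log_likelihood(s, speaker) > avg_log_likelihood(s, world)
        for s in segments
    ]
```

In a real system the frames would be MFCC vectors from an audio library and each model would be a multi-component mixture; the comparison of speaker model against background model stays the same.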

craigcannon · on June 4, 2018

Nice! Let me know when it's ready :)

I've tried what's out there and still haven't found a solution that can consistently diarize well. So I'm doing some experiments on my end too.

braindead_in · on June 4, 2018

Do try out https://scribie.com/transcription/free as well once. Our diarisation system is around 90% accurate on longer paragraphs and will be out this week.

jtbayly · on June 4, 2018

For $10, I'd expect above 99% accuracy, especially since I can get an automated transcript for $5 for 60 minutes that presumably has a similar error rate to what you are offering.

Also, I'd expect it sooner than 24 hours, since I can get automated ones back in under an hour.

Not trying to be cold water. I am actually interested, but what sets you apart from the other cheaper automated solutions? Am I wrong about the error rate I can expect elsewhere?

ghaff · on June 4, 2018

By way of context, human transcriptions cost about $1 to $1.50 per [EDIT: minute] --which is what I use for my podcasts. Accuracy is extremely good, especially if you flag obscure terms when you submit.

mindwork · on June 4, 2018

For podcasts with good audio quality (which is usually the case for podcasts) you can probably use speech recognition tools that will cost you a fraction of this price. They are pretty good nowadays.

ghaff · on June 4, 2018

They're really not--at least for my purposes. And $25-50 for an episode requiring minimal cleanup versus spending 30-60 minutes going back through the audio and fixing things up is a no-brainer for me.

ADDED: Machine transcriptions are pretty good for a lot of things such as search and quickly skimming content. But if you want something that people can read as an alternative to listening to the podcast, you pretty much have to use human transcription or budget a bunch of time to fix up.

seanwilson · on June 4, 2018

Is there a way you can record then edit a podcast but keep track of which microphone different voices are coming from? Seems like you could make speaker identification easier that way?

craigcannon · on June 4, 2018

Yeah, if you record on multiple tracks it's pretty easy but podcasting setups vary a ton.
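A rough sketch of why separate tracks make this easy, assuming one time-aligned track per speaker given as lists of float samples. In each frame the loudest track is treated as the active speaker. The function names, the frame length, and the silence threshold are illustrative assumptions, not anything Podscripter actually runs.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def diarize_tracks(tracks, frame_len, silence_threshold=0.01):
    """tracks maps speaker name -> samples (time-aligned across speakers).
    Returns a list of (start_sample, end_sample, speaker) turns."""
    n = min(len(t) for t in tracks.values())
    turns = []
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        levels = {name: rms(t[start:end]) for name, t in tracks.items()}
        loudest = max(levels, key=levels.get)
        if levels[loudest] < silence_threshold:
            continue  # nobody is speaking in this frame
        if turns and turns[-1][2] == loudest and turns[-1][1] == start:
            # Same speaker as the previous adjacent frame: extend that turn.
            turns[-1] = (turns[-1][0], end, loudest)
        else:
            turns.append((start, end, loudest))
    return turns
```

This breaks down with crosstalk or microphone bleed, since only the loudest track wins each frame, which is one reason varied recording setups are hard to handle.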

craigcannon · on June 4, 2018

That's super cheap! Who do you use?

ghaff · on June 4, 2018

Uggh. Per minute. Sorry. So it costs me about $25-30 per podcast typically. Which is pretty reasonable but obviously adds up for longer transcriptions.

craigcannon · on June 4, 2018

Thanks for the feedback. Ours are diarized accurately. I've yet to see an automated service that does that well.

jtbayly · on June 4, 2018

That's a valuable piece of info that I didn't see mentioned on the site. I'd add it if I were you.

craigcannon · on June 4, 2018

Updated. Thanks for the feedback.

knuththetruth · on June 4, 2018

What automated service do you use?

craigcannon · on June 4, 2018

Google

joelrunyon · on June 4, 2018

Would this integrate directly with zencastr? They record tracks individually - and then mix them - so you'd have direct access to the individual tracks.

Seems like if you did that or made a way for people to upload their unmixed tracks, you could save some time on the whole thing?

craigcannon · on June 4, 2018

Yeah, that's an awesome idea.

I had kicked it around with some potential customers but went with this simpler model just to see if anyone was interested.

Will most likely go for that next because you're right, diarization is insane :)

viridian · on June 4, 2018

Hey Craig, I appreciate the hard work! Is there any way to demo this service for free, or any plans to in the future? The $10 isn't a big hit or anything, but I, as well as many others I'm sure, prefer to try a service before handing over credit card info.

craigcannon · on June 4, 2018

Yeah, we'll definitely have something free in the future. :)

For the launch we wanted to make sure we could handle all the first orders quickly.

ocdtrekkie · on June 4, 2018

I think this is super neat. The podcasts I am on do not make $10 an episode, so sadly, I can't justify it. But if I could, I would. I shared it with my podcasting friends.

craigcannon · on June 4, 2018

Thanks!

lf275 · on June 4, 2018

Would you also do YouTube videos? There are many lectures online and it would be nice if more people transcribed them.

craigcannon · on June 4, 2018

Yeah, we could definitely do videos.

jeena · on June 4, 2018

I'm able to upload one WAV file per speaker, like most podcasters I would presume. Would that make it easier to automate the split and thus make the service cheaper?

craigcannon · on June 4, 2018

I need to test this out more thoroughly. Is your speaker audio totally clean when they're not talking?

jeena · on June 4, 2018

For all practical purposes I'd say yes. I'm using https://auphonic.com/ which uses machine learning to mute the parts where no human speaks. They then send this audio to a third party (like yours) https://auphonic.com/blog/2016/12/02/make-podcasts-searchabl... and get the files back so auphonic can bundle them for download with the rest.

michaelmior · on June 4, 2018

So the real value add is when you have multiple speakers? Any suggestions on a reliable service that can handle a single speaker?

asdsa5325 · on June 4, 2018

You wouldn't need a "service", you can just run it through a TTS program locally.

michaelmior · on June 4, 2018

True. Although I assume you mean speech recognition and not TTS. Good recommendations there would be great :)

asdsa5325 · on June 4, 2018

Oops, yeah, STT.

roadrunner201 · on June 5, 2018

Can your service handle corrupted files? I record on the fly and I have some great interviews that I cannot play.

genieyclo · on June 4, 2018

Are you using AWS Polly?

craigcannon · on June 4, 2018

Google actually

mdocherty · on June 4, 2018

Hello,

Just curious if you are able to do speech-to-text on phone calls? Or if at any point you plan to do so.

Thanks

wenbin · on June 4, 2018

Great project!

My podcast search engine project Listen Notes ( https://www.listennotes.com/ ) does transcription as well.

It's not as accurate as Podscripter, but good enough for in-audio search. Example: https://www.listennotes.com/e/1dae4f4c2c0d4202a1180bd9c9f17d...

Website visitors can request to transcribe episodes on Listen Notes websites.

craigcannon · on June 4, 2018

Listen Notes is awesome. Just found out about it recently.

tomkinson · on June 4, 2018

To drum up business you should just do all the Joe Rogan podcasts for free.

bmelton · on June 4, 2018

Given the hand-curated speaking order, that seems hard.

There are services out there that will do quality transcription that is completely automated (e.g., Bitplatter's FluidData), and IIRC, they're already doing most of the podcasting world's transcriptions for free right now, including Joe Rogan's.

This seems like the more niche market of those who want last-mile, extra-high-quality transcriptions to sell, for which I think they should be charging more than $10.

nibbleshift · on June 4, 2018

FluidDATA has the Joe Rogan podcasts for free at https://fluiddata.com/search?channel_id=9853

Not to mention, FluidDATA has transcribed over 8.2 million podcast episodes from over 230,000 podcast feeds.

jtbayly · on June 4, 2018

I can't find a single transcript at that website, though it's a cool service, kind of like Google for searching within audio (Podcasts).

nibbleshift · on June 4, 2018

FluidDATA definitely has a different model than Podscripter.

FluidDATA doesn't expose the entire transcript. It currently only exposes the ability to search the transcripts of millions of podcasts.

For example, you can find podcasts that talk about SpeechBoard and Craig by searching: "speech board" + "craig canon"

https://fluiddata.com/search?term=%22speech%20board%22%20%2B...

totoglazer · on June 5, 2018

That's a really cool idea. However the search (or maybe the transcript?) is truly terrible. I can't find anything at all, or it's grossly inaccurate.

craigcannon · on June 4, 2018

Only the Joey Diaz episodes :)

pascalxus · on June 4, 2018

I really would love something like this: to transcribe Chinese podcasts into pinyin and characters. This would really help me learn the language better, as listening skills are the hardest to learn when learning a foreign language.

rfreytag · on June 4, 2018

Wouldn't you be concerned about transcription errors interfering with your learning?

A lot of movies have Chinese subtitles. Pick an action movie and the dialog is quite easy.

craigcannon · on June 4, 2018

That's a neat idea!

CharlesW · on June 4, 2018

What distinguishes it from popular competitors like Trint, Temi, etc. who also do speaker identification?

craigcannon · on June 4, 2018

We're hoping to provide better speaker detection and an easy workflow for podcasters.

JasonFruit · on June 4, 2018

This brings up a point that has long puzzled me: why is it so uncommon for podcasters to write out what they intend to say? It seems like it would eliminate a lot of the misspeech, circumlocution, and unclearness that make podcasts so frustrating to listen to for me. It would also eliminate the need for transcription after the fact.

ghaff · on June 4, 2018

That might make sense for one person basically reading a script. (Which, with a few exceptions, isn't a very good format.)

But most podcasts are interviews/conversations. You're not going to get most podcast guests to write out full responses in advance.

I do usually review topics and some potential questions for a few minutes with my guest before we get started and do editing if a question or answer goes off the rails or there's an error. I also do some light editing to cut down on umms, you knows, etc. But a lot of casual podcasts created as sidelines wouldn't make sense if they were going to take a week to put together.

bpicolo · on June 4, 2018

A lot of podcasts are ad hoc conversations between multiple people.

superflyguy · on June 4, 2018

I've never listened to a podcast, but does this work on YouTube and TED talks?

CharlesW · on June 4, 2018

TED talks and YouTube videos already have closed captions/subtitles.

bmelton · on June 4, 2018

YouTube captioning tends to range from bad to horrible. TED talks are probably hand-transcribed, as I remember them as being higher quality.

CharlesW · on June 4, 2018

> YouTube captioning tends to range from bad to horrible.

Yes, and I mentioned YouTube specifically because it's representative of the best machine transcription (which this service is) can offer.

TED talks are indeed transcribed by professionals, and so the quality is an order of magnitude better than what this service can provide. TEDx talks are transcribed by volunteers, so their quality is more variable.[1]

[1] https://www.ted.com/participate/translate/transcribe

craigcannon · on June 4, 2018

Yeah, they most likely pay for professional transcripts. As do orgs like NPR and bigger companies using podcasts for content marketing.

superflyguy · on June 4, 2018

I don't want that.