Craig from YC here. This project is a follow-up to SpeechBoard, which was a text-based audio editor - https://news.ycombinator.com/item?id=15670827. Thanks for all the feedback there :)
We were surprised to find that many users just wanted transcripts, so Podscripter is an attempt to solve that.
Here's how it works: every time you publish an episode (or give us a file) we run it through a speech-to-text service. Then we split up the speakers by hand, which ends up being a fair bit of work and is why it's 24 hours instead of minutes. Then we email you the transcript.
Before I was podcasting at YC I had my own podcast and couldn't justify paying $1 a minute for transcripts. These machine-generated transcripts get you most of the way there for a lot less money :)
I wrote a service (nowhere close to public release) that segments audio based on speakers. You have to identify one speech segment, and it is then capable of labelling the others. It uses GMM and MFCC. Is something like this in the works?
Cool idea! I consume a fair number of podcasts, and I can affirm that there is definitely a need for this.
Do try out https://scribie.com/transcription/free as well. Our diarisation system is around 90% accurate on longer paragraphs and will be out this week.
For $10, I'd expect above 99% accuracy, especially since I can get an automated transcript for $5 for 60 minutes that presumably has a similar error rate to what you are offering.
Also, I'd expect it sooner than 24 hours, since I can get automated ones back in under an hour.
Not trying to throw cold water on this. I am actually interested, but what sets you apart from the other cheaper automated solutions? Am I wrong about the error rate I can expect elsewhere?
By way of context, human transcriptions cost about $1 to $1.50 per [EDIT: minute], which is what I use for my podcasts. Accuracy is extremely good, especially if you flag obscure terms when you submit.
For a podcast with good audio quality (which is usually the case for podcasts) you can probably use speech recognition tools that will cost you a fraction of this price. They are pretty good nowadays.
They're really not, at least for my purposes. And $25-50 for an episode requiring minimal cleanup versus spending 30-60 minutes going back through the audio and fixing things up is a no-brainer for me.
ADDED: Machine transcriptions are pretty good for a lot of things such as search and quickly skimming content. But if you want something that people can read as an alternative to listening to the podcast, you pretty much have to use human transcription or budget a bunch of time to fix it up.
Is there a way you can record then edit a podcast but keep track of which microphone different voices are coming from? Seems like you could make speaker identification easier that way?
Uggh. Per minute. Sorry. So it costs me about $25-30 per podcast typically. Which is pretty reasonable but obviously adds up for longer transcriptions.
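To put the prices quoted in this thread side by side, here is a rough sketch of the arithmetic. The rates come from the comments above ($1 to $1.50 per minute for human transcription, $5 for 60 minutes of automated transcription, $10 for this service). Treating the $10 as a flat per-episode price and pro-rating the automated price by the minute are assumptions, and the function names are made up for illustration.

```python
# Rough cost comparison using the prices quoted in the thread.
# Assumptions: the $10 is a flat per-episode price, and the $5 automated
# price scales linearly with episode length.

FLAT_PER_EPISODE = 10.00  # assumed flat price per episode


def human_cost_range(minutes, low_rate=1.00, high_rate=1.50):
    """Return (low, high) dollars for human transcription billed per minute."""
    return (minutes * low_rate, minutes * high_rate)


def automated_cost(minutes, price_per_hour=5.00):
    """Return dollars for automated transcription, pro-rated by the minute."""
    return minutes / 60 * price_per_hour


if __name__ == "__main__":
    for minutes in (20, 30, 60):
        low, high = human_cost_range(minutes)
        print(
            f"{minutes} min: human ${low:.2f}-${high:.2f}, "
            f"automated ${automated_cost(minutes):.2f}, "
            f"flat ${FLAT_PER_EPISODE:.2f}"
        )
```

At those rates, a bill of $25-30 corresponds to an episode of roughly 20 to 30 minutes.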
Would this integrate directly with Zencastr? They record tracks individually and then mix them, so you'd have direct access to the individual tracks.
Seems like if you did that or made a way for people to upload their unmixed tracks, you could save some time on the whole thing?
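A minimal sketch of how unmixed tracks could make the speaker split automatic, assuming one mono 16-bit WAV per speaker, all the same length and sample rate. For each time window it picks whichever track is loudest. This is not how Podscripter works (they say they split speakers by hand); the function names, window size, and silence threshold are all made up for illustration, and real recordings with microphone bleed or crosstalk would need something more robust.

```python
import math
import struct
import wave


def read_mono_16bit(path):
    """Read a mono 16-bit PCM WAV file; return (sample_rate, samples)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples


def rms(samples):
    """Root-mean-square level of a list of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def label_speakers(tracks, rate, window_s=1.0, silence_rms=500.0):
    """Label each window with the loudest track's speaker.

    tracks maps a speaker name to that speaker's samples.
    Returns a list of (start_seconds, end_seconds, speaker) turns, with
    consecutive windows for the same speaker merged. speaker is None
    when every track is below the (arbitrary) silence threshold.
    """
    window = max(1, int(rate * window_s))
    length = max(len(samples) for samples in tracks.values())
    turns = []
    for start in range(0, length, window):
        levels = {
            name: rms(samples[start:start + window])
            for name, samples in tracks.items()
        }
        speaker, level = max(levels.items(), key=lambda item: item[1])
        if level < silence_rms:
            speaker = None
        start_s = start / rate
        end_s = min(start + window, length) / rate
        if turns and turns[-1][2] == speaker:
            # Same speaker as the previous window: extend that turn.
            turns[-1] = (turns[-1][0], end_s, speaker)
        else:
            turns.append((start_s, end_s, speaker))
    return turns
```

The resulting turns could then be lined up against the timestamps in a machine transcript to attach a speaker name to each passage, assuming the speech-to-text output includes word or segment timings.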
Hey Craig, I appreciate the hard work! Is there any way to demo this service for free, or any plans to in the future? The $10 isn't a big hit or anything, but I, and I'm sure many others, prefer to try a service before handing over credit card info.
I think this is super neat. The podcasts I am on do not make $10 an episode, so sadly, I can't justify it. But if I could, I would. I shared it with my podcasting friends.
I'm able to upload one WAV file per speaker, like most podcasters I would presume. Would that make it easier to automate the split and thus make the service cheaper?
For all practical purposes I'd say yes. I'm using https://auphonic.com/, which uses machine learning to mute the parts where no human speaks. They then send this audio to a third party (like yours) https://auphonic.com/blog/2016/12/02/make-podcasts-searchabl... and get the files back so Auphonic can bundle them for download with the rest.
Given the hand-curated speaking order, that seems hard.
There are services out there that will do quality transcription that is completely automated (e.g., Bitplatter's FluidData), and IIRC, they're already doing most of the podcasting world's transcriptions for free right now, including Joe Rogan's.
This seems like the more niche market of those who want last-mile, extra-high-quality transcriptions to sell, for which I think they should be charging more than $10.
I really would love something like this to transcribe Chinese podcasts into pinyin and characters. This would really help me learn the language better, as listening skills are the hardest to acquire when learning a foreign language.
This brings up a point that has long puzzled me: why is it so uncommon for podcasters to write out what they intend to say? It seems like it would eliminate a lot of the misspeech, circumlocution, and unclearness that make podcasts so frustrating for me to listen to. It would also eliminate the need for transcription after the fact.
That might make sense for one person basically reading a script. (Which, with a few exceptions, isn't a very good format.)
But most podcasts are interviews/conversations. You're not going to get most podcast guests to write out full responses in advance.
I do usually review topics and some potential questions for a few minutes with my guest before we get started and do editing if a question or answer goes off the rails or there's an error. I also do some light editing to cut down on umms, you knows, etc. But a lot of casual podcasts created as sidelines wouldn't make sense if they were going to take a week to put together.
> YouTube captioning tends to range from bad to horrible.
Yes, and I mentioned YouTube specifically because it's representative of the best that machine transcription (which this service is) can offer.
TED talks are indeed transcribed by professionals, and so the quality is an order of magnitude better than what this service can provide. TEDx talks are transcribed by volunteers, so their quality is more variable.[1]
Let me know what you think!