
I'm surprised how real Rogan sounds and how Jobs does not. Why is that?


The issue is Jobs's training data is likely 99% his public "presentation voice" audio -- cadence, inflection, emphasis from public remarks at Apple events, commencement addresses, shareholder meetings, etc -- which OF COURSE sounds unnatural in regular conversation.

Meanwhile Rogan has a million hours of regular conversation audio to learn from.


Not sure if you meant "million hours" as hyperbole, but that'd be about 114 years of non-stop conversation.

If there are ~2000 episodes of his podcast and he's talked in a bunch of other places too, it's probably less than 5000 hours.
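A quick back-of-envelope check (the episode count and average length below are rough guesses, not exact figures):

    # Rough estimate of Rogan's available speech audio.
    episodes = 2000          # approximate number of podcast episodes (assumed)
    avg_hours = 2.5          # rough average episode length in hours (assumed)
    print(f"~{episodes * avg_hours:,.0f} hours of podcast audio")   # ~5,000 hours

    # For comparison, "a million hours" of non-stop talking:
    years = 1_000_000 / (24 * 365)
    print(f"1,000,000 hours ≈ {years:.1f} years")                   # ≈ 114.2 years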


One could hire a _really good_ Steve Jobs voice actor to generate more training data for the AI algorithm?


At that point using them to create the exact audio would be easier


Yeah but what VC is interested in funding /that/?


Humans are expensive though. If you have a lot of speech to record, it might be cheaper to use the human to train the AI and then let the AI finish the rest.
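Purely to illustrate the break-even logic (all the dollar figures and hour counts below are invented):

    # Illustrative break-even: hire an actor per hour vs. record a smaller
    # training set once and synthesize the rest. All figures are invented.
    actor_rate = 200.0        # $/hour of finished recorded speech (assumed)
    training_hours = 20       # hours of actor audio needed to train the model (assumed)
    training_cost = 10_000.0  # one-time model training/engineering cost (assumed)
    synth_cost = 1.0          # $/hour of synthesized speech (assumed)

    def actor_total(hours):
        return actor_rate * hours

    def ai_total(hours):
        return actor_rate * training_hours + training_cost + synth_cost * hours

    # First whole number of hours where the AI route comes out cheaper.
    breakeven = next(h for h in range(1, 100_000) if ai_total(h) < actor_total(h))
    print(f"AI route becomes cheaper after ~{breakeven} hours of needed speech")

With those made-up numbers the crossover lands around 71 hours; below that, just paying the actor wins.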


Then you could just hire the actor to read Jobs' part directly?

Hiring people to train their replacements seems off to me.


Then you'd need to hire the actor for every part. After enough training with the actor, you won't need to hire the actor anymore.


ethically questionable, but financially it makes some sense


There's also Respeecher, which lets you realistically "puppet" someone else's voice.


What non-presentation source material do Steve Jobs voice actors train with? Seems like that same source material can be used to train the AI voice.


Would the fact that Joe's data is more standardized and produced the same way have an effect? Jobs's data is likely a mix of different volumes, echo levels, and processing.
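If it does, part of the usual fix is normalizing the clips to a common loudness before training. A minimal sketch with pydub (the -20 dBFS target and file names are arbitrary placeholders); note this only evens out volume, not echo or processing:

    # Normalize a clip to a target average loudness before training.
    from pydub import AudioSegment

    TARGET_DBFS = -20.0

    def normalize(path_in, path_out):
        clip = AudioSegment.from_file(path_in)
        gain = TARGET_DBFS - clip.dBFS       # dB change needed to hit the target
        clip.apply_gain(gain).export(path_out, format="wav")

    normalize("jobs_keynote_1998.wav", "normalized/jobs_keynote_1998.wav")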


One million hours = 114.2 years


Probably significantly more training data for Rogan than Jobs, and a much wider range thanks to his long-running podcast. I'm not super familiar with Steve Jobs, so I can't think of anything other than his keynotes and some interviews that you would be able to use for him.

Unrelated point... that laugh was incredibly bad and repetitive, to the point it felt like they were playing a laugh.wav file each time they wanted a laugh instead of generating a new laugh of variable pitch and length.
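Even the cheap fix being described here is simple: instead of replaying the identical clip, jitter its pitch and length a little each time. A sketch with librosa (file names and jitter ranges are placeholders):

    import random
    import librosa
    import soundfile as sf

    # Load the canned laugh once, then produce a slightly different take each time.
    y, sr = librosa.load("laugh.wav", sr=None)

    def varied_laugh():
        shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=random.uniform(-1.0, 1.0))
        return librosa.effects.time_stretch(shifted, rate=random.uniform(0.9, 1.1))

    sf.write("laugh_variant.wav", varied_laugh(), sr)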


Exponentially more training data for Joe than Steve, and infinitely more training data in a podcast-episode setting.


Maybe there's more training data available for Rogan. The guy pumps out hundreds of hours of content a year in which he's recorded discussing every topic under the sun. I can't imagine there's a similar quantity of recordings of Jobs's voice - or of almost anyone's voice for that matter.

Edit: four other people replied in the time it took me to type two sentences. I guess the answer is that obvious.


Probably training set size. Joe Rogan talks for a living.


Presumably because we have hours, days, weeks of Joe Rogan speaking - not just on his podcast but as a sports announcer as well. Steve Jobs... we have a few speeches and presentations, but we don't have much data on how he spoke by comparison.


More data to train on?


lol, everyone calls it training data. Here I was thinking I was in the right practice.



