
I'm surprised how real Rogan sounds and how Jobs does not. Why is that?


The issue is Jobs's training data is likely 99% his public "presentation voice" audio -- cadence, inflection, emphasis from public remarks at Apple events, commencement addresses, shareholder meetings, etc -- which OF COURSE sounds unnatural in regular conversation.

Meanwhile Rogan has a million hours of regular conversation audio to learn from.


Not sure if you meant "million hours" as hyperbole, but that'd be about 114 years of non-stop conversation.

If there are ~2000 episodes of his podcast and he's talked in a bunch of other places too, it's probably less than 5000 hours.
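A quick back-of-envelope check (the episode count and average length below are rough guesses, not exact figures):

    # Rough estimate of Rogan's available speech audio.
    episodes = 2000          # approximate number of podcast episodes (assumed)
    avg_hours = 2.5          # rough average episode length in hours (assumed)
    print(f"~{episodes * avg_hours:,.0f} hours of podcast audio")   # ~5,000 hours

    # For comparison, "a million hours" of non-stop talking:
    years = 1_000_000 / (24 * 365)
    print(f"1,000,000 hours ≈ {years:.1f} years")                   # ≈ 114.2 years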


One could hire a _really good_ Steve Jobs voice actor to generate more training data for the AI algorithm?


At that point using them to create the exact audio would be easier


Yeah but what VC is interested in funding /that/?


Humans are expensive though. If you have a lot of speech to record, it might be cheaper to use the human to train the AI and then let the AI finish the rest.
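Purely to illustrate the break-even logic (all the dollar figures and hour counts below are invented):

    # Illustrative break-even: hire an actor per hour vs. record a smaller
    # training set once and synthesize the rest. All figures are invented.
    actor_rate = 200.0        # $/hour of finished recorded speech (assumed)
    training_hours = 20       # hours of actor audio needed to train the model (assumed)
    training_cost = 10_000.0  # one-time model training/engineering cost (assumed)
    synth_cost = 1.0          # $/hour of synthesized speech (assumed)

    def actor_total(hours):
        return actor_rate * hours

    def ai_total(hours):
        return actor_rate * training_hours + training_cost + synth_cost * hours

    # First whole number of hours where the AI route comes out cheaper.
    breakeven = next(h for h in range(1, 100_000) if ai_total(h) < actor_total(h))
    print(f"AI route becomes cheaper after ~{breakeven} hours of needed speech")

With those made-up numbers the crossover lands around 71 hours; below that, just paying the actor wins.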


Then you could just hire the actor to read Jobs' part directly?

Hiring people to train their replacements seems off to me.


Then you'd need to hire the actor for every part. After enough training with the actor, you won't need to hire the actor anymore.


ethically questionable, but financially it makes some sense


There's also Respeecher, which lets you realistically "puppet" someone else's voice.


What non-presentation source material do Steve Jobs voice actors train with? Seems like that same source material can be used to train the AI voice.


Would the fact that Joe's data is more standardized and produced the same way have an effect? Jobs's data is likely a mix of different volumes, echo levels, and processing.
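If it does, part of the usual fix is normalizing the clips to a common loudness before training. A minimal sketch with pydub (the -20 dBFS target and file names are arbitrary placeholders); note this only evens out volume, not echo or processing:

    # Normalize a clip to a target average loudness before training.
    from pydub import AudioSegment

    TARGET_DBFS = -20.0

    def normalize(path_in, path_out):
        clip = AudioSegment.from_file(path_in)
        gain = TARGET_DBFS - clip.dBFS       # dB change needed to hit the target
        clip.apply_gain(gain).export(path_out, format="wav")

    normalize("jobs_keynote_1998.wav", "normalized/jobs_keynote_1998.wav")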


One million hours = 114.2 years


Probably significantly more training data for Rogan than Jobs, and a much wider range thanks to his long-running podcast. I'm not super familiar with Steve Jobs, so I can't think of anything other than his keynotes and some interviews that you would be able to use for him.

Unrelated point... that laugh was incredibly bad and repetitive, to the point it felt like they were playing a laugh.wav file each time they wanted a laugh instead of generating a new laugh of variable pitch and length.
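Even the cheap fix being described here is simple: instead of replaying the identical clip, jitter its pitch and length a little each time. A sketch with librosa (file names and jitter ranges are placeholders):

    import random
    import librosa
    import soundfile as sf

    # Load the canned laugh once, then produce a slightly different take each time.
    y, sr = librosa.load("laugh.wav", sr=None)

    def varied_laugh():
        shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=random.uniform(-1.0, 1.0))
        return librosa.effects.time_stretch(shifted, rate=random.uniform(0.9, 1.1))

    sf.write("laugh_variant.wav", varied_laugh(), sr)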


Exponentially more training data for Joe than Steve, and infinitely more training data in a podcast-episode setting.


Maybe there's more training data available for Rogan. The guy pumps out hundreds of hours of content a year in which he's recorded discussing every topic under the sun. I can't imagine there's a similar quantity of recordings of Jobs's voice - or of almost anyone's voice for that matter.

Edit: four other people replied in the time it took me to type two sentences. I guess the answer is that obvious.


Probably training set size. Joe Rogan talks for a living.


Presumably because we have hours, days, weeks of Joe Rogan speaking - not just on his podcast but as a sports announcer as well. Steve Jobs... we have a few speeches and presentations, but we don't have much data on how he spoke by comparison.


More data to train on?


lol, everyone calls it training data. Here I was thinking I was in the right practice.



