Hacker News

The issue is that Jobs' training data is likely 99% his public "presentation voice" audio -- cadence, inflection, and emphasis from remarks at Apple events, commencement addresses, shareholder meetings, etc. -- which OF COURSE sounds unnatural in regular conversation.

Meanwhile, Rogan has a million hours of regular conversation audio to learn from.



Not sure if you meant "a million hours" as hyperbole, but that'd be about 114 years of non-stop conversation.

If there are ~2,000 episodes of his podcast and he's talked in a bunch of other places too, it's probably less than 5,000 hours.
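A quick sanity check on both figures (the ~2.5 hours of talking per episode is a rough guess, not from the thread):

```python
# Convert one million hours of audio into calendar years.
HOURS_PER_YEAR = 24 * 365.25  # using the Julian year

million_hours_in_years = 1_000_000 / HOURS_PER_YEAR
print(f"{million_hours_in_years:.1f} years")  # ~114.1 years of non-stop audio

# Rough upper bound on actual podcast audio: ~2,000 episodes at an
# assumed average of ~2.5 hours of speech each (hypothetical figure).
episodes = 2_000
hours_per_episode = 2.5
print(episodes * hours_per_episode, "hours")  # on the order of 5,000 hours
```

So the realistic corpus is roughly five orders of magnitude closer to 5,000 hours than to a million.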


One could hire a _really good_ Steve Jobs voice actor to generate more training data for the AI algorithm?


At that point, using them to create the exact audio directly would be easier.


Yeah but what VC is interested in funding /that/?


Humans are expensive though. If you have a lot of speech to record, it might be cheaper to use the human to train the AI and then let the AI finish the rest.


Then you could just hire the actor to read Jobs' part directly?

Hiring people to train their replacements seems off to me.


Then you'd need to hire the actor for every part. After enough training with the actor, you won't need to hire the actor anymore.


ethically questionable, but financially it makes some sense


There's also Respeecher, which lets you realistically "puppet" someone else's voice.


What non-presentation source material do Steve Jobs voice actors train with? Seems like that same source material can be used to train the AI voice.


Would the fact that Joe's data is more standardized and produced the same way have an effect? Jobs' data is likely a mix of different volumes, echo levels, and processing.


One million hours = 114.2 years



