
I agree. I think the most telling sign is the hand gesture for the number 3 when he says "just invested 300 bucks". Don't think Deepfake models can understand intent yet.


There are two ways to think of this as a deepfake:

1 - This is an actual video of the guy in his home, but they changed/synthesized the audio and then worked on the lip movements to make it match.

2 - This is a video of an actor impersonating the guy, possibly to the extent of impersonating his voice (although his timbre might make that a little tricky), and then they just deepfake the face onto the actor. An example of this is deeptomcruise on TikTok, something that you should treat yourself to if you haven't seen it yet - https://www.tiktok.com/@deeptomcruise (first two today aren't great, here's a good one - https://www.tiktok.com/@deeptomcruise/video/7018171271095553... ? )

I'm not even convinced there's any alteration here, but even if there were, either of the above would be possible. Adobe demoed something called VoCo in 2016 that never saw the light of day; I'm not sure if anything approximating it is available today: https://www.youtube.com/watch?v=I3l4XLZ59iw&t=260s


What do you mean? It’s not like a deepfake is a model that produces a new video from scratch in a completely random setting.

It needs a base video to be modified. They need a video of a person talking to do the deepfake.


Exactly, so the point is that any relevant gestures must come from the timing in an existing video. Current deepfake tech can manipulate lips and facial expressions, but it can't make the person in the video lift three fingers at the right moment when "300" is being said.

So this is either an indication of a very elaborate deepfake that managed to surface an amazingly coincidental source video (which should be possible to find in his archives), or it's not a deepfake but a real recording.


Or: it’s a deepfake where the attacker made a video and attached the victim’s face in post-processing.


Or: the base video is well chosen from among the victim’s own videos, the message is crafted so as not to deviate much from the base, and the deepfake is used to have him say other things.


I think the issue is that there would be insufficient audio sources to generate new spoken language.


That is not an issue. You can generate new spoken language with not much more than a word.


Sincerely: I’d love to hear an example of that.

The reason for my skepticism: state-of-the-art speech synthesis from the best this planet has to offer still requires a full phoneme library recorded in studio conditions. A voice sample taken from a user’s Instagram page doesn’t seem like the kind of source material that would be useful for making convincing speech.


Wouldn’t it be a real person making the gesture, with his face deepfaked on?


I think you are right. If you find a similar-looking person with a similar voice (even commission them for $5 on Fiverr) and only deepfake the face in a low-quality video, this is very much achievable with today's state of the art.


Pay attention to the hand. His index finger is turned inwards, which would be a very odd “number 3 gesture”. More likely it's a random hand movement from the video used.



