I agree. I think the most telling sign is the hand gesture for the number 3 when...

jcims · on Nov 28, 2021

There are two ways to think of this as a deepfake:

1 - This is an actual video of the guy in his home, but they changed/synthesized the audio and then worked on the lip movements to make it match.

2 - This is a video of an actor impersonating the guy, possibly to the extent of impersonating his voice (although his timbre might make that a little tricky), and then they just deepfake the face onto the actor. An example of this is deeptomcruise on TikTok, something that you should treat yourself to if you haven't seen it yet - https://www.tiktok.com/@deeptomcruise (the first two today aren't great; here's a good one - https://www.tiktok.com/@deeptomcruise/video/7018171271095553... ? )

I'm not even convinced there's any alteration here, but even if there were, both of the above could be possible. Adobe demoed something called VoCo in 2016 that never saw the light of day; I'm not sure if there is something approximating it available today: https://www.youtube.com/watch?v=I3l4XLZ59iw&t=260s

whoisjuan · on Nov 28, 2021

What do you mean? It’s not like a deepfake is a model that produces a new video from scratch in a completely random setting.

It needs a base video to be modified. They need a video of a person talking to do the deepfake.

PeterisP · on Nov 28, 2021

Exactly, so the point is that any relevant gestures must come from the timing in an existing video. Current deepfake tech can manipulate lips and facial expressions, but it can't make the person in the video lift three fingers at the proper time when "300" is being said.

So this is either an indication of a very elaborate deepfake that managed to surface an amazingly coincidental source video (which should be possible to find in her archives), or it's not a deepfake but a real recording.

jhardy54 · on Nov 28, 2021

Or: it’s a deepfake where the attacker made a video and attached the victim’s face in post-processing.

harperlee · on Nov 28, 2021

Or: the base video is well chosen from among the victim’s, the message is crafted so as not to deviate much from the base, and the deepfake is used to have him say other things.

quitit · on Nov 28, 2021

I think the issue is that there would be insufficient audio sources to generate new spoken language.

pyinstallwoes · on Nov 30, 2021

That is not an issue. You can generate new spoken language with not much more than a word.

quitit · on Nov 30, 2021

Sincerely: I’d love to hear an example of that.

The reason for my skepticism: state-of-the-art speech synthesis from the best this planet has to offer still requires a full phoneme library recorded in studio conditions. A voice sample taken from a user’s Instagram page doesn’t seem like the kind of source material that would be useful for making convincing speech.

idontwantthis · on Nov 28, 2021

Wouldn’t it be a real person making the gesture, with his face deepfaked on?

yonixw · on Nov 28, 2021

I think you are right. If you find a similar person with a similar voice (or even commission one for $5 on Fiverr) and only deepfake the face in a low-quality video, this is very much achievable with today's state of the art.

ricardobeat · on Nov 28, 2021

Pay attention to the hand. His index finger is turned inwards, which would make for a very odd “number 3 gesture”. More likely it's a random hand movement from the video used.