Adobe has announced a so-called revolution in audio editing which, according to Adobe’s presentation, allows the operator to automatically analyze a speaker’s voice and resynthesize it to say anything at all.
Here is, once more, the link to the presentation, including a video of it:
The speaker claimed that Adobe is making sure to build in measures against abuse. But if this technology is possible, we can be sure that the government, and probably many media outlets, already have it without any of the built-in watermarking meant to protect against abuse, although I am sure the public version of the software will have such “safeguards” built in.
According to the presenter in the linked video, the software needs no more than 20 minutes of recordings of a speaker’s voice to create convincing results.
So, know that from this point onward (and very likely for the last few years) you can no longer fully trust audio recordings of any kind.
Now, it’s certainly much harder to do such a thing with a running video where you can see the speaker’s mouth, but Hollywood has been fitting creative translations to actors’ lips in dubbed movies for many years. Unless you trust yourself to reliably pick up on slight sync errors between picture and sound, or can read lips (I can’t), you had better keep a healthy sense of distrust from now on, and ideally apply it retrospectively to the last few years. Your distrust should be even higher if the video or audio is of poor quality, because the loss of quality due to heavy compression can easily be used to hide otherwise apparent glitches that you might at least subconsciously pick up on.
Also, streamed video online tends to ‘lag’: the picture stands still for a fraction of a second or a few seconds. This is perfectly natural, but be aware that it can very easily be faked to hide audio editing of the kind presented by Adobe.
So unless you have very high-quality video and audio without lags or compression artifacts, and a good instinct for lip-reading or something comparable, you have little to go on. But even then, you would probably not pick up on it (I’m not entirely sure here). I saw a video of a psychological experiment in which different, similar-sounding vowels or consonants were dubbed over a video of a moving mouth. As soon as the ear hears the sound, the brain seems to ‘correct’ the visual feed from the eye, and you don’t see anything wrong with it because the mismatch is filtered out. I think it was some well-known psychological effect (possibly the McGurk effect, in which what you see changes what you think you hear), but I can’t remember for certain what it’s called.
If you own a YouTube channel or podcast with many published samples, be prepared for the possibility that you may one day be confronted with a recording of something you supposedly said. It will sound pretty much perfect, and you may start to question your own sanity and memory, Gaslight-style. You could probably argue that the intelligence agencies already have a few telephone recordings of you on file anyway. That would only be mitigated by relatively poor telephone audio quality, but who knows whether telephone audio is truly as bad at the source as it is for the recipient. Are there any engineers here who can comment on that with solid technical knowledge?
Just a heads up. This is coming, and it has likely happened many times already, as I highly doubt this technology is as new as we are meant to believe.