Deepfake Myths: Common Misconceptions About Synthetic Media

June 14, 2019

The House Permanent Select Committee on Intelligence convened yesterday to discuss the challenges posed by AI-manipulated media, better known as deepfakes. There is finally some momentum to “do something” about deepfakes, but crucial misconceptions about deepfakes and their effect on our society may complicate efforts to develop a strategic approach to mitigating their negative impacts.

The term deepfake has many connotations. Colloquially, the word is used to describe the use of artificial intelligence techniques to create a fake video of someone saying or doing something that they never said or did, as in the fake videos of Barack Obama demonstrated by Radiolab and BuzzFeed. But this is only the tip of the iceberg of the types of fabrication enabled by new technology. For example, it is now possible to generate seemingly realistic faces of non-existent people—and this capability is already being abused by bad actors.

Experts generally prefer the more general term synthetic media, which refers to the generation or manipulation of media—video, audio, imagery, and potentially text—that would have been difficult and expensive to create prior to recent technological advances. This relies primarily, but not exclusively, on artificial intelligence.

In responsible hands, this technology could eventually be used to create art, seamlessly translate audio and video across languages, or even create realistic, personalized avatars in digital spaces. In the wrong hands, synthetic media could deepen divisions in society—and in government—as it becomes difficult to tell what’s real and what isn’t. Understanding this technology and its implications is critical to developing a thoughtful policy response that will enable its best features and protect us from its worst ones.

Myth 1: “Deepfakes” currently allow people to easily create fake videos of anyone doing anything.

This is not quite accurate. Current technology enables very specific types of operations on video, and only if you have particular kinds of data. One technique is face swapping, in which a target’s face is used to replace a face in an existing video. Of course, this requires a video of someone else doing whatever one wants the target to do, and it currently leads to unrealistic results if the body type and hair of the source and target don’t match. This was the process behind the first widely known instance of deepfakes, in which Reddit users swapped the faces of celebrities onto the bodies of pornographic actors.

Another operation is called facial reenactment. This means capturing facial expressions from an actor or from existing footage and applying those expressions to video of your unwitting target. Until fairly recently, most implementations of facial reenactment needed significant footage of the person being targeted, and they still generally require a static background and many trial runs to get decent results. Facial reenactment for the whole body and similar operations for audio are also possible. As with video, the best audio results come from professional readers for whom there are many hours of recordings with no background noise. New techniques can also be used to realistically add objects to a video scene, change weather conditions, and even generate fairly realistic text.

This field is rapidly evolving, and these limitations are quickly becoming outdated. But it’s crucial to understand that none of these operations allows you to create a specific video from whole cloth. There is no way (short of a Hollywood-style studio) to create a video of Nancy Pelosi or Mitch McConnell doing backflips on an elephant without building on existing, potentially traceable footage. This will likely remain true for at least the next few years.

That said, there is still significant harm that can come from operations that are possible with current technology. And as more limitations are overcome, producing a very convincing synthesis with no more than a cell phone will become possible. We need to be ready.

Myth 2: Image editing software like Photoshop didn’t cause any harm, so synthetic media won’t either.

Even Adobe Photoshop—and the democratization of image editing more generally—has had significant negative impacts in the hands of malicious actors. In 2017, a photo of American football player Michael Bennett was edited to appear as if he were burning the American flag after he knelt during the national anthem, and fact checkers around the world spend much of their time addressing such simply edited images. Manipulated static images have been a significant boon to misinformation and hate purveyors around the world.

Synthetic media is very different in potential scale, scope, and psychological impact. Video and audio are often more persuasive and have a bigger impact on memory and emotion. Worse, synthetic media tools could become far easier to use than Photoshop as the technology becomes more accessible. This doesn’t mean that Photoshop should be outlawed, but it’s critical to understand that even these “cheap fakes,” created with simple image editing tools, can have significant societal impact. The best defenses against deepfakes are also defenses against cheap fakes, such as ensuring that platforms like Facebook and YouTube avoid rewarding any form of “outrage bait” fakery with attention and revenue.

Myth 3: The most significant harm of synthetic media is that people will be tricked by fakes.

There are many potential impacts of synthetic media, both good and bad, and the direct impacts of fakery are only one side of the coin. Perhaps even more worrying is that people are becoming less willing to believe real media. If any video might be the result of manipulation, there is nothing to stop a politician from disavowing a legitimate but damaging video, for example.

More generally, synthetic media is a challenge to our epistemic capacity—our ability to make sense of the world and make competent decisions. Especially concerning is the growth of reality apathy—where people give up on distinguishing real from fake—and reality sharding—where people selectively choose what to believe, forming deeper and deeper like-minded clusters. These are much broader societal issues, and they could be supercharged by a growing ability to manipulate audio and video.

Just like image editing technology, synthetic media technology holds immense promise, from helping us train safe autonomous cars to bringing history to life for a new generation of students. But there are a number of crucial next steps we can take to minimize the negative impacts and maximize the positive ones.

We need to build a solid foundation if we want to preserve the epistemic capacity needed to run our democracy.

Future articles will outline specific recommendations for what policymakers, civil society, and internet companies can do to minimize the potential harms caused by synthetic media.

This piece is based upon the first part of a keynote previously presented by the author at a convening for Denmark’s Ministry of Foreign Affairs.

The views expressed in GMF publications and commentary are the views of the author alone.