By now most of us are used to interacting with synthetic or artificial voices. Just call a customer service help line or summon a digital personal assistant (like Alexa or Siri) and you expect to hear a computer-generated voice. But what if the synthetic voice sounded exactly like you? Or, worse, what if it were used to say things you would never say?
Several tech companies are making strides in training speech synthesis tools to mimic a particular speaker’s voice. And while this can improve clarity and accessibility for users with physical limitations, there is another, more troubling trend: the prevalence of “voice deepfakes,” i.e., synthetic voices created from unknowing (or unwilling) participants using generative artificial intelligence.
In February, two separate reporters successfully tricked their financial institutions’ identity verification software with AI-generated synthetic voices to gain access to accounts. This attracted the attention of the chairman of the Senate Banking Committee, who noted the “prevalence of video clips publicly available on Instagram, TikTok and YouTube have made it easier than ever for bad actors to replicate the voices of other people.” As a result, anyone with vocal recordings online can be targeted for such attacks.
Given the concern over voice deepfakes, there are several measures companies—and individuals—can take to better protect themselves from the real harm caused by fake voices.
A Case of He Said, It Said?
For those unfamiliar with the term, generative AI (in this context) refers to computer systems that perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages, and that are “generative” because they create new output based on discrete inputs or user prompts. These outputs take several forms: a “voice double” or “voice clone,” a synthetic, adaptable copy of a particular person’s (or licensed character’s) voice; a “composite voice,” a synthetic voice assembled from multiple people’s vocal tracks; and a “fully synthetic voice,” which is not based on any human being at all.
Attack of the Clones
Recently, the issue of voice deepfakes has arisen in the context of professional voice actors discovering that their vocal tracks were surreptitiously reconstituted using generative AI. The resultant synthetic voice was then made to say things for which the voice actors were not paid, and to which they may have moral, professional, or personal objections. In one particularly disturbing example, a videogame modder (i.e., someone who independently creates new scenes for a videogame based on existing content) used a generative-AI tool on a public website to create and distribute deepfake pornographic content using a voice actor’s vocal tracks without her consent. And while the website said it may remove such AI-generated content in response to “a credible complaint,” that policy has not stemmed the tide of such misappropriation.
Given the legal landscape, voice actors may have limited options to protect themselves once the harm has occurred. It is well settled that a voice itself is not copyrightable. About half the states have enacted so-called “right of publicity” laws, which allow a person whose identity has “commercial value” to control the use of that identity, but the protections afforded vary from state to state. Further, while the federal Lanham Act creates causes of action for false association, false advertising, and false endorsement, the Act is nuanced and its application can be challenging.
With greater awareness of these issues, voice actors can at least build certain preconditions, limitations, or compensation structures for the use of AI into their dealings with studios. Actors can also submit takedown demands to the offending websites hosting the synthetic voices as a first (but important) step. And several trade associations, like SAG-AFTRA and the National Association of Voice Actors, can aid concerned voice actors. But people without commercially recognizable voices have even less protection; they would likely have to rely on more traditional tort-based liability should they find themselves the victims of AI-generated voice deepfakes.
Implications
Despite the rise of voice deepfakes, much can still be done, through proactive and preventive steps, to identify the problem early and mitigate the resulting harm.
For companies using voice recognition tools that may be on the receiving end of an improperly cloned synthetic voice, several anti-spoofing measures are being developed. Call centers, for instance, can mitigate the harms caused by voice deepfakes by educating employees about the danger of deepfakes and by adopting callback procedures that let an employee end a suspicious call and redial the customer at a number already on file. Likewise, multifactor authentication and anti-fraud solutions can reduce the risk of a successful voice deepfake attack; such measures include analyzing call metadata for identity verification, performing digital tone analysis, and tracking callers’ key-press behavior. Internationally, China is working on antifraud regulations: according to The New York Times, China recently unveiled rules requiring manipulated material to bear digital signatures or watermarks and to have the subject’s consent. Whether this will curb voice deepfakes, however, is unclear.
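To make the layered approach concrete, the sketch below shows how a call center might combine several of these signals (metadata checks, key-press timing, and a voice-liveness score) into a single routing decision. It is a minimal illustration only: every field name, weight, and threshold here is hypothetical, and real anti-fraud systems rely on vendor-specific signals and trained models rather than hand-set rules.

```python
# Minimal sketch of layered caller verification, as described above.
# All field names, weights, and thresholds are hypothetical assumptions.
from dataclasses import dataclass

@dataclass
class CallSignals:
    ani_matches_account: bool      # caller ID consistent with the account on file
    carrier_metadata_ok: bool      # call routing metadata looks unaltered
    passed_otp_challenge: bool     # second factor (e.g., one-time passcode)
    keypress_interval_ms: float    # average gap between keypad presses
    synthetic_voice_score: float   # 0.0-1.0 from a voice-liveness detector

def assess_call(signals: CallSignals) -> str:
    """Return 'allow', 'step_up', or 'callback' for an inbound call."""
    risk = 0.0
    if not signals.ani_matches_account:
        risk += 0.3
    if not signals.carrier_metadata_ok:
        risk += 0.3
    # Unnaturally rapid or uniform key presses can indicate automation.
    if signals.keypress_interval_ms < 40:
        risk += 0.2
    risk += 0.4 * signals.synthetic_voice_score
    if risk < 0.3 and signals.passed_otp_challenge:
        return "allow"
    if risk < 0.6:
        return "step_up"    # require an additional authentication factor
    return "callback"       # end the call and redial the number on file

print(assess_call(CallSignals(True, True, True, 120.0, 0.1)))   # allow
print(assess_call(CallSignals(False, True, False, 25.0, 0.8)))  # callback
```

The design point is that no single check decides the outcome; a cloned voice that fools the liveness detector can still be caught by inconsistent metadata or a failed callback.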
As for the United States, professional voice actors are the “canary in the coal mine” when it comes to creating rules to combat voice deepfakes, but the path to protecting against this growing threat is uncertain. As some commentators have observed, legislation requiring transparency about the data inputs and models used by AI developers would be a positive step. So too would tying online tracking and reporting systems to a creator’s work, which would help with one of the most challenging aspects of enforcement: identifying the voice clone in the first instance. Few people have the time (or inclination) to perpetually scour the Internet for potential instances of voice misappropriation.
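In principle, such a tracking and reporting system could start from something as simple as a registry of cryptographic hashes of creators’ original recordings, as in the toy sketch below. Everything here, including the registry structure, function names, and the example creator, is hypothetical; and because exact-hash matching only catches verbatim copies, a workable system would need robust audio fingerprinting to flag re-encoded or cloned audio.

```python
# Toy sketch of a voice-recording registry, assuming creators register a
# cryptographic hash of each original recording. Exact-hash matching only
# catches verbatim copies; real systems would need audio fingerprinting.
import hashlib
from typing import Dict, Optional

registry: Dict[str, str] = {}  # SHA-256 hex digest -> registered creator

def register_recording(audio_bytes: bytes, creator: str) -> str:
    """Store the hash of an original recording and return the digest."""
    digest = hashlib.sha256(audio_bytes).hexdigest()
    registry[digest] = creator
    return digest

def check_recording(audio_bytes: bytes) -> Optional[str]:
    """Return the registered creator if this exact recording is known."""
    return registry.get(hashlib.sha256(audio_bytes).hexdigest())

sample = b"...raw audio bytes..."   # placeholder for real audio data
register_recording(sample, "Jane Doe, voice actor")
print(check_recording(sample))           # -> "Jane Doe, voice actor"
print(check_recording(b"other audio"))   # -> None
```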
For now, awareness of the issue may be the first line of defense while the law inevitably struggles to keep up with this cutting-edge technology. In the meantime, we would all do well to keep our eyes and, more importantly, our ears open to AI-generated voice deepfakes.