Production studios and fans alike are turning to generative AI tools to make voice actors say things they never said — and their jobs are on the line.
Voice actor Allegra Clark was scrolling through TikTok when she came across a video featuring Beidou, a swashbuckling ship captain from the video game Genshin Impact whom she’d voiced. But Beidou was participating in a sexually suggestive scene and said things that Clark had never recorded, even though the rugged voice sounded exactly like hers. The video’s creator had taken Clark’s voice and cloned it using a generative AI tool called ElevenLabs, and from there, they made her say whatever they wanted.
Clark, who has voiced more than 100 video game characters and dozens of commercials, said she interpreted the video as a joke, but was concerned her client might see it and think she had participated in it — which could be a violation of her contract, she said.
“Not only can this get us into a lot of trouble if people think we said [these things], but it’s also, frankly, very violating to hear yourself speak when it isn’t really you,” she wrote in an email to ElevenLabs that was reviewed by Forbes. She asked the startup to take down the uploaded audio clip and prevent future cloning of her voice, but the company said it hadn’t determined that the clip was made with its technology. It said it would only take immediate action if the clip was “hate speech or defamatory,” and stated it wasn’t responsible for any violation of copyright. The company never followed up or took any action.
“It sucks that we have no personal ownership of our voices. All we can do is kind of wag our finger at the situation,” Clark told Forbes.
In response to questions about Clark’s experience, ElevenLabs cofounder and CEO Mati Staniszewski told Forbes in an email that its users need the “explicit consent” of the person whose voice they are cloning if the content created could be “damaging or libelous.” Months after Clark’s experience, the company launched a “voice captcha” tool that requires people to record themselves saying a randomly generated word; the recorded voice must match the voice they are trying to clone.
The company, which is valued at about $100 million and backed by Andreessen Horowitz and Google DeepMind cofounder Mustafa Suleyman, is one of the hottest voice AI companies right now. Its technology requires only between 30 seconds and 10 minutes of audio to create what sounds like a near-identical replica of someone’s voice. Along with sites like FakeYou and Voice AI, which offer a free library of digital voices, it’s also at the center of generative AI’s impact on voice actors.
“There’s no legal protection for voice like there is for your face or for your fingerprint.” Jennifer Roberts, voice actor
Interviews with 10 voice actors revealed an already precarious industry on the brink of widespread change as employers begin to experiment with these text-to-speech tools. One voice actor Forbes spoke to said an employer told her she would not be hired to finish narrating a series of audiobooks the day after it announced a partnership with ElevenLabs, leading her to fear she would be replaced with AI. Another said her employer told her that they wanted to use ElevenLabs’ AI to speed up retake sessions, a standard part of recording audio that voice actors are paid for. When she told her employer she didn’t consent to her voice being uploaded to any AI site, the employer agreed, but she said she hasn’t been called in to do any retakes.
The community of voice actors first noticed an influx of AI-generated voices after Apple Books launched digital narration of audiobooks with a suite of soprano and baritone voices in January 2023, said Tim Friedlander, the president of the National Association of Voice Actors (NAVA). Actors began discovering thousands of audio files of familiar voices being uploaded to various sites, mostly by fans, he said. Most recently, famed actor Stephen Fry said that his voice was scraped from his narration of the Harry Potter books and cloned using AI. In a talk at the CogX Festival, Fry said the experience “shocked” him.
In a public spreadsheet, hundreds of voice actors have requested to have their voices purged from AI voice generators Uberduck and FakeYou.ai, which have said they will take voices down from their sites if the voice’s owner reaches out. While FakeYou.ai still provides thousands of popular voices like those of John Cena and Kanye West that anyone can use, Uberduck removed user-contributed voices from its platform in July. Uberduck and FakeYou.ai didn’t respond to multiple requests for comment.
One of the voice actors who has publicly requested his voice be removed from voice generators is Jim Cummings, the voice behind characters like Winnie-the-Pooh and Taz from Looney Tunes. He told Forbes he would only agree to users templating his voice if he and his family received royalties for it. “Keep your paws off my voice,” he said.
A Legal Dilemma
Like striking film actors, who are sounding the alarm about the coming of AI and how it could affect their jobs, voice actors are on the front lines of technological change. But unlike other creative fields, where authors and artists are banding together in class-action lawsuits to push back against their copyrighted work being used to train AI models, voice actors are uniquely vulnerable. Even though voices are inherently distinguishable, they aren’t protected as intellectual property. “There’s no legal protection for voice like there is for your face or for your fingerprint,” said Jennifer Roberts, the voice behind several video game characters. “Our hands are tied.”
A recording of a voice, however, can be copyrighted, and according to Jeanne Hamburg, an attorney for law firm Norris McLaughlin, using a voice for commercial purposes can be protected by “rights of publicity,” which prevent celebrities’ likenesses from being exploited. That’s in theory, though: Most contracts signed by voice actors don’t stop recordings of their voices from being used to train AI systems. For more than a decade, contracts have stated that producers “own the recording in perpetuity, throughout the known universe, in any technology currently existing or to be developed,” said Cissy Jones, a voice actor who is part of the founding team at the National Association of Voice Actors (NAVA), a newly formed union for voice actors.
Those contracts were largely written and signed before the advent of AI systems. “Voice actors have not provided informed consent to the future use of an audio recording and haven’t been fairly compensated for it,” said Scott Mortman, an attorney for NAVA. “And so protections need to be strengthened significantly in the wake of AI.”
That’s why NAVA and the actors union SAG-AFTRA are working to strike language from contracts that allows employers to use an actor’s voice to create a “digital double,” or “synthesize” their voice through machine learning. The organizations have also developed new boilerplate language to add to contracts that would protect voice actors from losing the rights to their voices.
A Myriad of Misuses
Like Clark, numerous voice actors have experienced fans manipulating their voices using generative AI tools to create pornographic, racist and violent content. Even when fans use AI voices to create innocuous memes or other kinds of fan content, voice actors have spoken up on social media, forbidding people from fabricating their voices.
NAVA member Jones, whose voice has been featured in Disney shows and Netflix documentaries, found TikTok videos in which fans had used Uberduck to create clones of her voice saying inappropriate things. “Not only is my voice saying something I would never say, that stuff is out there in the world,” Jones told Forbes. “If potential buyers hear our voices saying that, how will that affect my future work?” After she reached out, Uberduck removed her voice from the platform, Jones said.
AI-generated voices have also become a new medium for harassment. Abbey Veffer, whose voice has been featured in games like Genshin Impact and The Elder Scrolls, said she was doxed in February by someone who had created a clone of her voice. The person created a Twitter account with her address as the username, generated an AI clone of Veffer’s voice and then made the clone say racist and violent things. The anonymous user directly messaged the recording to Veffer and pinned it at the top of the Twitter account. They claimed to have used ElevenLabs’ technology. The experience, Veffer told Forbes, was “intense” and “very upsetting.”
But when Veffer reached out to ElevenLabs with her concerns, the company said that the clone was not created using its software and was part of an “organized smear campaign” against the startup, according to messages reviewed by Forbes. Three days after Veffer reached out to Twitter, the account was suspended and the video was taken down, but her residential address remained on the site for three months, she said.
“Controlling how our voice gets used and where it gets used is very important to us.” Tim Friedlander, president of National Association of Voice Actors
After ElevenLabs rolled out the beta version of its text-to-speech AI tool in January, the startup announced that it was struggling with people misusing its technology. A day later, Vice’s Motherboard found that anonymous 4chan posters had used ElevenLabs’ then-free cloning tool to generate racist, transphobic and violent remarks with the voices of celebrities like Joe Rogan and Emma Watson.
AI’s ability to closely mimic people’s voices has also created opportunities for scammers. The FTC has issued warnings this year that criminals are using AI voice clones to impersonate loved ones as a way to convince their targets to send them money. One journalist was able to use ElevenLabs’ tool to create an AI-generated version of his voice that successfully logged into his own bank account.
ElevenLabs did not comment on any of these specific instances, but CEO Staniszewski said in an email, “If someone is using our tool to clone voices for which they don’t have permission and which contravene fair use cases, we will ban the account and prevent new accounts from being set up with the same details.” Along with a “voice captcha” tool to ensure people have that permission, the company says it has also developed an AI speech classifier that can detect with more than 90% accuracy whether an audio clip that contains AI was made using its tools.
Consent And Control
In response to misuse, voice generation sites are adding restrictive measures to police their technologies. Speechify, which licenses the voices of celebrity narrators like Snoop Dogg and Gwyneth Paltrow (with full permission), doesn’t allow people to upload content to create customized voices without the active participation of the person whose voice they want to use. Similar to ElevenLabs, it presents a unique text that the user, or someone who is physically present with them, has to read aloud in their own voice. “I think it is shortsighted to take shortcuts and my goal is to put content owners in the driver’s seat,” said founder Cliff Weitzman, who first started Speechify to convert his textbooks into audiobooks using machine learning in 2012.
And at Resemble AI, which touts enterprise customers like Netflix and World Bank Group, people can only create a customized AI-generated voice after recording a consent statement in the voice they want to generate. Resemble AI founder and CEO Zohaib Ahmed said that implementing safe ways to deploy the technology has been integral to his startup because he believes the onus of preventing misuse should fall on the vendors making the tools rather than on the end user.
“It sucks that we have no personal ownership of our voices.” Allegra Clark, voice actor
These kinds of verification checks, however, don’t address higher-level ethical questions around consent. Actors, for instance, don’t really have control over how their voices will be used posthumously. Voice actors were enraged when gaming studio Hi-Rez Studios added a contract clause that would allow it to clone a voice using AI after the owner of the voice died (the clause was removed after the uproar). “If an actor passes away, it’s better to replace them with another human than create some artificial performance because it’s not them and it doesn’t bring them back,” said voice actor Clark.
The big concern hovering over all of this is whether there’s a future for voice actors. With employers and fans turning to synthesized voices, many are concerned about finding their next gig or keeping the ones they have. “Controlling how our voice gets used and where it gets used, and how much we’re paid for that usage is very important to us,” said NAVA’s Friedlander.