Meta to Help People Craft More Zuck Deepfakes With ‘Voicebox’ AI

CEO of Facebook Mark Zuckerberg appears on a monitor as he testifies remotely during the Senate Commerce, Science, and Transportation Committee hearing 'Does Section 230's Sweeping Immunity Enable Big Tech Bad Behavior?', on Capitol Hill, October 28, 2020 in Washington, DC.


This screen photo from 2020 is from a real Mark Zuckerberg, we swear... we hope.

Meta has another new AI model on the docket, and this one seems perfectly engineered for the land of tomorrow, if that utopian future is filled with nothing but deepfakes and modified audio. Like AI image generators, Voicebox generates synthetic voices from a simple text prompt, seemingly from scratch but, in actuality, from sound drawn from thousands of audiobooks.

On Friday, Meta announced its new Voicebox AI, which can create voice clips using simple text prompts. In a video CEO Mark Zuckerberg shared on his Facebook and Instagram, he said the Voicebox AI model can take a text prompt and read it in a variety of human, though somewhat digital-sounding, voices. Voicebox can also modify audio to remove unwanted noises from voice clips, like a dog barking in the background. Unlike many other AI voice synthesis models, Meta’s AI can create audio in languages other than English, including French, Spanish, German, Polish, and Portuguese, and the company said the AI can effectively translate any passage from one language to another while keeping the same voice style.


According to Meta, Voicebox can take an audio sample as short as two seconds long and then match that audio style for text-to-speech generation. If true, it’s more sophisticated than other synthesis models like Speechify or ElevenLabs, which normally require a fair bit more data before they can generate a quality synthetic voice.

In Meta’s promotional clip, one of the voices being modified does sound uncannily like Zuckerberg himself. Whatever the model’s true capabilities, hearing Zuck does bring to mind some of the deepfakes modeled after the Meta CEO.

Unlike the company’s many other recent AI releases, Voicebox isn’t going open source upon its debut, which suggests Meta could be restricting its latest AI release because of the potential harms that could result. While some folks online have used similar programs to craft synthesized voice clips of their favorite characters in media for fun, others have used them in harassment campaigns against the voice actors themselves. So Meta could be trying to prevent harm, or it could be saving this potentially lucrative model for some future enterprise.

According to the Voicebox research paper, the system was trained on more than 50,000 hours of unfiltered, unenhanced speech from English audiobooks and another 60,000 hours of speech from multilingual audiobooks. That’s why, in Meta’s video, the synthetic speech sounds less conversational and more like somebody reading a child a bedtime story. The researchers said they would eventually scale the model to include more casual speech.

The model is also limited in that users cannot independently control the kind of voice the AI apes and the emotionality it borrows from a different speech sample.

But what is most concerning is that Meta doesn’t seem to address the elephant in the room with its latest paper. The researchers did not say which audiobooks were used to train the AI or where they came from. It’s unclear whether the tens of thousands of hours of audio would be equivalent to many thousands of audiobooks.

Gizmodo reached out to Meta for more information about which audiobooks were used in the training data. A Meta spokesperson said they were “public domain” audiobooks, though the company declined to say where it downloaded these books.

Voice actors have not been especially happy with the proliferation of AI, and are particularly concerned about contracts that allow companies to synthesize their voices without compensation. Apple has already taken heat for quietly launching a series of books narrated by AI-generated voices. The tech giant has reportedly approached several major audiobook publishers to create these new AI-narrated stories.

Considering how the audiobook market revenue has been growing by double digits year after year, and the way creative industries are salivating at reducing labor costs, this latest model could prove yet another headache for voice professionals.

