Authors allege Meta used copyrighted books for AI training despite warnings

Julia Shapero

December 13, 2023 at 11:10 AM

Sarah Silverman, Michael Chabon, Ta-Nehisi Coates and other authors accused Meta on Monday of using their copyrighted books to train its artificial intelligence (AI) models despite warnings from the company’s legal team.

The complaint, which consolidates two copyright cases against the parent company of Facebook and Instagram, alleges that Meta trained its large language models, Llama 1 and Llama 2, on an online resource that contained the authors’ works without their permission.

Large language models, which can produce human-like responses, require substantial amounts of data for training. In releasing Llama 1, Meta acknowledged it used the Books3 section of The Pile, a publicly available dataset that contains nearly 200,000 books, to train the model.

The authors, whose copyrighted works were included in Books3, allege Meta was aware of potential legal problems with using the dataset, pointing to a series of messages between a Meta AI researcher and researchers affiliated with EleutherAI, the organization that assembled The Pile.

In late 2020, Meta researcher Tim Dettmers expressed interest in using The Pile in a conversation on the EleutherAI public Discord server and asked about “any legal concerns” with using the dataset.

While one researcher with EleutherAI suggested there was “a very strong case for free use,” Dettmers later said Meta’s lawyers “recommended to avoid” using Books3, adding “it seems to be already clear the data cannot be used or models cannot be published if they are trained on that data.”

“At Facebook there are a lot of people interested in working with [T]he [P]ile, including myself, but in its current form, we are unable to use it for legal reasons,” Dettmers added in early 2021, according to Monday’s filing.

However, Meta ultimately used Books3 in its training dataset for Llama 1. The authors also accused the tech giant of using Books3 to train Llama 2, though the company opted not to reveal its training datasets for the latest model “for competitive reasons.”

“This explanation, however, is likely pretextual,” the lawsuit says. “A more plausible explanation for Meta’s decision to conceal its training data is to avoid scrutiny by those whose copyrighted works were copied and ingested during the training process for Llama 2.”

“On information and belief, a key reason Meta chose not to share the training dataset for Llama 2 was to avoid litigation from using copyrighted materials for training that Meta had previously determined to be legally problematic,” it continues.

For the latest news, weather, sports, and streaming video, head to The Hill.
