170,000-plus books used to train AI; authors say they weren’t asked

Lois M. Collins

October 9, 2023 at 5:00 PM·4 min read

An investigation by The Atlantic indicated thousands of e-books are being used to train an artificial intelligence system called Books3. | Adobe Stock

Oops!
Something went wrong.
Please try again later.
Oops!
Something went wrong.
Please try again later.

Authors are upset after tech companies started using their books to train artificial intelligence without letting them know or seeking their permission. They worry about copyright infringement and loss of income, among other issues.

Per CNN, “The system is called Books3, and according to an investigation by The Atlantic, the data set is based on a collection of pirated e-books spanning all genres, from erotic fiction to prose poetry. Books help generative AI systems with learning how to communicate information.”

“The future promised by AI is written with stolen words,” The Atlantic article said.

The article notes that some of the text that’s training AI on how to use language is taken from Wikipedia and other online entries. But “high-quality generative AI requires higher-quality input than is usually found on the internet — that is, it requires the kind found in books.”

Many authors apparently don’t view the use of their books to train artificial intelligence as an honor. Rather, it’s a shortcut that robs them of their due, they say.

CNN reported that Nora Roberts, who writes romantic novels, has 206 books in the database — “second only to William Shakespeare.” She told CNN the database is “all kinds of wrong. We are human beings, we are writers and we are being exploited by people who want to use our work, again without permission or compensation, to ‘write’ books, scripts, essays because it’s cheap and easy,” she said in a statement to CNN.

Per The Atlantic, Sarah Silverman, Richard Kadrey and Christopher Golden filed a lawsuit in California that claims Meta — owner of Facebook — violated their copyrights by using their books to train the company’s large language model LLaMA. That’s an algorithm that competes with OpenAI’s GPT-4 to create its own text by using word patterns it learned from the books and other sources, the article said.

The Atlantic’s Alex Reisner created a stir when he got a list of the books and published a searchable database so that anyone can see if their favorite author’s work is being used to teach AI communication skills. He notes the authors include well-known names like Stephen King, John Kratz and James Patterson, among others. The books apparently came through web-crawling technology that found bootleg PDF copies of the books for free online and they were then packaged into a database called Books3, where different AI companies are using them. Bloomberg said it will not use Books3 in the future as it trains its BloombergGPT.

The Authors Guild on Sept. 27 published a guide on actions authors can take if they learned their books are in the Books3 dataset. “This can be an unsettling revelation, raising concerns about copyright, compensation and the future implications of AI,” the article said.

The guild and 17 authors filed a different class-action suit in New York against OpenAI for copyright infringement. Those authors, per a separate guild article, include David Baldacci, Mary Bly, Michael Connelly, John Grisham, Jodi Picoult, Scott Turow and Rachel Vail, among others.

“The complaint draws attention to the fact that the plaintiffs’ books were downloaded from pirate ebook repositories and then copied into the fabric of GPT 3.5 and GPT 4 which power ChatGPT and thousands of applications and enterprise uses — from which OpenAI expects to earn many billions, the article said.

Reisner also wrote that while Meta is using authors’ books without permission, it employed a “takedown” order against at least one developer who used LLaMA coding after it was leaked a few months ago, on the claim that “no one is authorized to exhibit, reproduce, transmit or otherwise distribute Meta Properties without the express written permission of Meta.” And once it decided to make LLaMA open-source, Meta still requires developers to get a license in order to use it.

Not everyone’s upset, however, by use of their work to train AI. Ian Bogost, author of “Play Anything: The Pleasure of Limits, the Uses of Boredom and the Secret of Games,” among other works, wrote a column for The Atlantic titled “My Books Were Used to Train Meta’s Generative AI. Good.” And he promised “It can have my next one, too.”

Bogost contends that successful art “exceeds its creator’s plans,” noting that an author cannot accurately predict a book’s audience. “Who am I to say what my work is good for, how it might benefit someone — even a near-trillion-dollar company? To bemoan this one unexpected use for my writing is to undermine all of the other unexpected uses for it. Speaking as a writer, that makes me feel bad.”

Yahoo Sports
Former NBA guard Darius Morris dies at 33
Former NBA guard Darius Morris has died at the age of 33. He played for five teams during his four NBA seasons. Morris played college basketball at Michigan.
Yahoo Sports
Caitlin Clark catches fire from 3 in WNBA preseason; Arike Ogunbowale's late heroics send Wings past Fever
Caitlin Clark’s WNBA preseason debut went much like her senior year at Iowa. She hit a bunch of 3s and did so in front of a sold-out crowd.
Yahoo Sports
NFL Power Rankings, draft edition: Did Patriots fix their offensive issues?
Which teams did the best in the NFL Draft?
Yahoo Sports
2024 NFL Draft grades: Denver Broncos earn one of our lowest grades mostly due to one pick
Yahoo Sports' Charles McDonald breaks down the Broncos' 2024 draft.
Yahoo Sports
No one was airing Angel Reese and Kamilla Cardoso's WNBA preseason debuts, so an X user livestreamed it
The quality was choppy, but it was better than what the WNBA had.
Yahoo Sports
Formula 1: Miami Grand Prix sends cease and desist letter to prevent Donald Trump fundraiser during race
Race organizers say they'll revoke a Trump fundraiser's suite license if he holds an event for the former president on Sunday at the race.
Yahoo Sports
Kentucky Derby: Mystik Dan wins in three-horse photo finish, outruns favorite Fierceness in stunning upset
The 150th Kentucky Derby produced yet another magnificent two-minute spectacle.
Yahoo Sports
New details emerge in alleged gambling ring behind Shohei Ohtani-Ippei Mizuhara scandal
It turns out the money was going from Ohtani's bank account to an illegal bookie to ... casinos.
Yahoo Sports
NFL Draft grades for all 32 teams | Zero Blitz
Jason Fitz and Frank Schwab join forces to recap the draft in the best way they know how: letter grades! Fitz and Frank discuss all 32 teams division by division as they give a snapshot of how fans should be feeling heading into the 2024 season. The duo have key debates on the Dallas Cowboys, New York Giants, New Orleans Saints, Los Angeles Rams, New England Patriots, Las Vegas Raiders and more.
Yahoo Sports
The best RBs for 2024 fantasy football according to our analysts
The Yahoo Fantasy football analysts reveal their first running back rankings for the 2024 NFL season.
Yahoo Finance
CVS stock plunges after earnings numbers one analyst 'did not even believe'
CVS warns it could cede Medicare Advantage market share as reimbursement rates pressure the company.
Yahoo Sports
Lakers fire head coach Darvin Ham after just 2 seasons, latest playoff series loss to Nuggets
Despite a trip to the Western Conference finals in his first season with the team, the Lakers are now ready to look for a replacement for Darvin Ham.
Yahoo Sports
Diana Taurasi’s trash-talking, in-your-face ways may be a bit of a shock to new WNBA fans
Caitlin Clark fans beware: You never know what the 20-year veteran might say … or do.
Yahoo Life Shopping
Does castor oil really help with hair growth? We asked the experts, and their answer may surprise you
It's inexpensive, but is it effective? Dermatologists' verdict is in — and it's unanimous.
Yahoo Sports
Wide receiver rankings for fantasy football 2024
The Yahoo Fantasy football analysts reveal their first wide receiver rankings for the 2024 NFL season.
Yahoo Sports
Tight end rankings for 2024 fantasy football 2024
The Yahoo Fantasy football analysts reveal their first tight end rankings for the 2024 NFL season.
Yahoo Sports
WWE Backlash France 2024 results, grades and analysis: Cody Rhodes defeats A.J. Styles in first title defense
The Lyon, France crowd was electric start-to-finish in WWE's first major event since WrestleMania 40
Autoblog
EVs are the most expensive vehicles to operate over 1,000 miles, according to iSeeCars
iSeeCars calculated the cost to operate vehicles with different fuel types, finding that EVs are significantly more expensive than others.
Yahoo Sports
Ex-Florida State QB and 1999 Fiesta Bowl starter Marcus Outzen dies at 46
The Seminoles lost 23-16 to Tennessee in the first-ever BCS title game.
Yahoo News
2nd Boeing whistleblower found dead. Here's a timeline of the company's mounting problems.
This is the second Boeing whistleblower to die in the last two months.

News

Life

Entertainment

Finance

Sports

New on Yahoo

170,000-plus books used to train AI; authors say they weren’t asked

Recommended Stories

Former NBA guard Darius Morris dies at 33

Caitlin Clark catches fire from 3 in WNBA preseason; Arike Ogunbowale's late heroics send Wings past Fever

NFL Power Rankings, draft edition: Did Patriots fix their offensive issues?

2024 NFL Draft grades: Denver Broncos earn one of our lowest grades mostly due to one pick

No one was airing Angel Reese and Kamilla Cardoso's WNBA preseason debuts, so an X user livestreamed it

Formula 1: Miami Grand Prix sends cease and desist letter to prevent Donald Trump fundraiser during race

Kentucky Derby: Mystik Dan wins in three-horse photo finish, outruns favorite Fierceness in stunning upset

New details emerge in alleged gambling ring behind Shohei Ohtani-Ippei Mizuhara scandal

NFL Draft grades for all 32 teams | Zero Blitz

The best RBs for 2024 fantasy football according to our analysts

CVS stock plunges after earnings numbers one analyst 'did not even believe'

Lakers fire head coach Darvin Ham after just 2 seasons, latest playoff series loss to Nuggets

Diana Taurasi’s trash-talking, in-your-face ways may be a bit of a shock to new WNBA fans

Does castor oil really help with hair growth? We asked the experts, and their answer may surprise you

Wide receiver rankings for fantasy football 2024

Tight end rankings for 2024 fantasy football 2024

WWE Backlash France 2024 results, grades and analysis: Cody Rhodes defeats A.J. Styles in first title defense

EVs are the most expensive vehicles to operate over 1,000 miles, according to iSeeCars

Ex-Florida State QB and 1999 Fiesta Bowl starter Marcus Outzen dies at 46

2nd Boeing whistleblower found dead. Here's a timeline of the company's mounting problems.