What is web scraping? Here's what you need to know about the process of automatically collecting data from websites, and its uses

Dave Johnson

Updated January 14, 2021 at 9:53 AM

Web scraping, the process of extracting data en masse from websites, has a variety of practical uses. Cavan Images/Getty Images

Web scraping is the process of using automated software, like bots, to extract structured data from websites.
There are many applications for web scraping, including monitoring product retail prices, lead generation, and analyzing sentiment about products and companies on social media.
Here's a brief overview of web scraping, its applications, and how it works.

Web scraping is the name given to the process of extracting structured data from third-party websites. In other words, it's a way to capture specific information from one or more websites without also copying unwanted or unrelated information. It's a common practice that has a lot of potential applications and a murky legal profile.

What to know about web scraping

Web scraping is usually an automated process, but it doesn't have to be; humans can scrape data from websites manually, though that's slow and inefficient. More commonly, scraping is performed by software designed specifically for the task, generally built from two main components. The crawler is a program that browses the internet and indexes the content of interest, then passes this information on to the scraper.
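To make the crawler's role concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: the `PAGES` dictionary stands in for a real website (a real crawler would fetch each URL over HTTP), and the `shop.example` addresses, the `/p/` product-page convention, and the names `LinkCollector` and `crawl` are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website": URL -> HTML. A real crawler would
# download each page over HTTP instead of reading from this dict.
PAGES = {
    "https://shop.example/": '<a href="/p/1">Widget</a> <a href="/p/2">Gadget</a> <a href="/about">About</a>',
    "https://shop.example/p/1": '<a href="/">Home</a> <span class="price">$9.99</span>',
    "https://shop.example/p/2": '<a href="/p/1">Related</a> <span class="price">$24.50</span>',
    "https://shop.example/about": "<p>About us</p>",
}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, pages):
    """Breadth-first crawl that returns the product pages it finds."""
    seen = {start_url}
    queue = deque([start_url])
    product_pages = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # page not available; skip it
        if "/p/" in url:  # assumed convention: product pages live under /p/
            product_pages.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return product_pages


index = crawl("https://shop.example/", PAGES)
print(index)
```

The list in `index` is what the crawler would hand to the scraper: the pages worth visiting, with navigation and "about" pages filtered out.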

The scraper is designed to locate the relevant structured information using markers called data locators. These locators indicate the presence of the data, which the scraper then extracts and stores offline in a spreadsheet or database for processing or analysis.
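As a rough sketch of how data locators might work, the Python example below treats a locator as a CSS class name and pulls out only the text of matching elements, then writes the result as CSV. The sample HTML, the class names `product-title` and `price`, and the helpers `LocatorScraper`, `scrape`, and `to_csv` are assumptions made for illustration; production scrapers typically use richer selectors and a dedicated parsing library.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product page; only two of its fields are of interest.
SAMPLE_HTML = """
<html><body>
<h1 class="product-title">Blue Widget</h1>
<p class="description">A long description we do not need.</p>
<span class="price">$9.99</span>
</body></html>
"""


class LocatorScraper(HTMLParser):
    """Keeps the first text found inside any element matching a locator."""

    def __init__(self, locators):
        super().__init__()
        self.locators = locators  # field name -> CSS class acting as locator
        self.record = {}
        self._field = None  # field whose text we are currently waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field, css_class in self.locators.items():
            if css_class in classes:
                self._field = field

    def handle_data(self, data):
        if self._field and data.strip():
            self.record[self._field] = data.strip()
            self._field = None


def scrape(html, locators):
    """Return a dict holding only the located fields."""
    scraper = LocatorScraper(locators)
    scraper.feed(html)
    return scraper.record


def to_csv(records, fieldnames):
    """Serialize scraped records as CSV text, ready for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


record = scrape(SAMPLE_HTML, {"name": "product-title", "price": "price"})
print(to_csv([record], ["name", "price"]))
```

The description paragraph is never stored because no locator points at it, which is the sense in which scraping captures specific information without the unrelated content around it.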

One simple example of web scraping: Consider a website that aggregates pricing information for retail products so shoppers can see which retailers have the best prices. A crawler can be programmed to index the product pages at every major retailer; the scraper then visits each page and uses data locators to zero in on just the price field, ignoring all the other data on the page, such as the product description and reviews. The scraper can be run daily to update the site with the latest pricing information from around the web.
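The comparison step in that example could look something like the Python sketch below. It assumes the scraper has already produced rows of (retailer, product, price text); the retailer names, products, and prices in `SCRAPED` are made up, and `parse_price` and `best_prices` are hypothetical helpers rather than part of any real service.

```python
import re
from decimal import Decimal

# Hypothetical output of one daily scraper run: (retailer, product, price text).
SCRAPED = [
    ("RetailerA", "Blue Widget", "$9.99"),
    ("RetailerB", "Blue Widget", "$8.49"),
    ("RetailerA", "Red Gadget", "$1,250.00"),
    ("RetailerB", "Red Gadget", "Out of stock"),
    ("RetailerC", "Red Gadget", "USD 1,199.00"),
]


def parse_price(text):
    """Pull the first number out of a price string, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Decimal avoids floating-point rounding surprises with currency.
    return Decimal(match.group().replace(",", ""))


def best_prices(rows):
    """Map each product to the (retailer, price) pair with the lowest price."""
    best = {}
    for retailer, product, price_text in rows:
        price = parse_price(price_text)
        if price is None:
            continue  # no usable price on this page today
        if product not in best or price < best[product][1]:
            best[product] = (retailer, price)
    return best


for product, (retailer, price) in best_prices(SCRAPED).items():
    print(f"{product}: {retailer} at {price}")
```

Running `best_prices` on each day's fresh scrape is one way the aggregator could keep its listings current; the sketch ignores details such as currencies, shipping costs, and matching the same product across retailers.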

How web scraping is used

Because there is an enormous variety of data online, there is a wide variety of applications for web scraping. Here are some of the most common uses:

Price intelligence: As in the example above, many web scrapers are designed to monitor prices on retail sites. Retailers might use them to track competitors' prices, or the data might feed competitive analysis, trend monitoring, or a service offered to other users.
Real estate: Similarly, web scrapers commonly target real estate sites to monitor rental and sale prices, appraise property values in a given region, and conduct market analysis.
Lead generation: Marketers commonly use web scraping to generate leads by scraping structured data from websites like LinkedIn.
Sentiment analysis: Brands even use web scraping to understand how their products and services are being talked about online. Companies can collect data that mentions their name from social media sites like Facebook and Twitter.
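As an illustration of the last use, here is a deliberately naive Python sketch of what might happen to scraped posts once they are collected. The brand name "Acme", the sample posts, the tiny word lists, and the function `tally_sentiment` are all assumptions made for the example; real sentiment analysis generally relies on trained language models rather than keyword matching.

```python
import re
from collections import Counter

# Tiny, hypothetical word lists; real systems use far richer models.
POSITIVE = {"love", "great", "reliable"}
NEGATIVE = {"broken", "terrible", "slow"}

# Hypothetical posts a scraper might have collected from social media.
POSTS = [
    "I love my new Acme blender, great value",
    "Acme support was terrible and slow",
    "Just bought an Acme toaster",
    "Globex phones are great",
]


def tally_sentiment(posts, brand):
    """Count positive, negative, and neutral posts that mention the brand."""
    counts = Counter()
    for post in posts:
        words = set(re.findall(r"[a-z']+", post.lower()))
        if brand.lower() not in words:
            continue  # post does not mention the brand
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


print(dict(tally_sentiment(POSTS, "Acme")))
```

The post about Globex is skipped because it never mentions the brand, which mirrors the idea of collecting only the data that mentions a company's name.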

The legality of web scraping

There's no easy answer to the question of web scraping's legality. The practice has faced a number of legal challenges dating back to 2000, when online auction site eBay sought an injunction (which the court granted) against a site called Bidder's Edge for scraping its auction data.

In the years since, there have been a number of additional challenges to web scraping, and in 2017 LinkedIn lost a court case against a business that was scraping its content. With precedent in the courts both for and against web scraping, its legal status remains unsettled, but it's currently a common practice across the internet.

Read the original article on Business Insider
