How to Scrape Ebay using Python

Ebay is one of the largest e-commerce marketplaces on the internet, with millions of active listings across thousands of categories. As an open marketplace, Ebay contains a wealth of public data that can be extracted and analyzed using web scraping.

In this comprehensive guide, we'll walk through how to build a Python web scraper to extract key data fields from Ebay listings and search results.

Why Scrape Ebay Data?

Here are some of the main reasons you may want to scrape data from Ebay:

Market Research – Analyze product listings, prices, and seller info to gain insights into market trends and opportunities.
Price Monitoring – Track prices over time for pricing analytics or to snipe deals.
Dropshipping – Source product ideas and inventory from Ebay sellers.
Lead Generation – Discover and extract contact information for high-volume Ebay sellers.
Catalog Enrichment – Match your existing product catalog against Ebay listings.
Machine Learning – Collect structured data to train ML models for tasks like duplicate product detection.
Personalized Alerts – Get notified when relevant new listings matching your interests are posted.

As one of the largest open product catalogs on the web, Ebay is a goldmine for scraping-driven e-commerce analytics.

Available Data Fields to Scrape

Ebay pages contain a wealth of data that we can extract through web scraping. For this guide, we'll focus on scraping the following key fields:

Product URL
Product ID
Title
Description
Variants (for multi-variant listings)
Price(s)
Converted Prices (automatic currency conversion)
Image URLs
Seller Name
Seller URL
Item Conditions
Item Features
Rating
Review Count

And more. The techniques covered can be adapted to extract additional data fields like shipping costs, return policies, item specifics, and so on.

Now let's look at how to extract these fields from Ebay pages.

Setup

We'll use Python for web scraping Ebay. The key packages we need are:

Requests – for retrieving page content
Beautiful Soup – for parsing and extracting data from HTML and XML

Install them via pip:

pip install requests beautifulsoup4

Alternatively, you can use a browser automation tool like Selenium, or a crawling framework like Scrapy, instead of Requests/BeautifulSoup.

Scraping Single-Variant Listings

First, we'll look at scraping listings that only have a single product for sale (no variant options).

For example: https://www.ebay.com/itm/275263444016

Viewing the page source, we can see the HTML elements containing the data we want to extract.

Let's write a Python scraper to extract these elements:

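import requests
from bs4 import BeautifulSoup

URL = "https://www.ebay.com/itm/275263444016"

def scrape_listing(url):
    # A timeout stops the request from hanging forever on a stalled connection
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    soup = BeautifulSoup(response.text, "html.parser")

    # Ebay changes its markup periodically, so verify these selectors
    # against the live page source before relying on them.
    title = soup.select_one("#itemTitle").text.strip()
    price = soup.select_one("#prcIsum").text.strip()
    seller = soup.select_one("a[class*='seller-info']").text.strip()

    # And so on for other fields

    return {
        "title": title,
        "price": price,
        "seller": seller
        #...
    }

data = scrape_listing(URL)
print(data)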

This locates elements by IDs and CSS classes, extracts the inner text or attributes, and returns a Python dictionary containing the scraped data.

The same principle applies for any other data fields you want to extract – inspect the page source to find patterns and locate the elements to extract.
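One caveat with this pattern: select_one returns None when nothing matches, so chaining .text onto it raises AttributeError as soon as a selector goes stale. A minimal sketch of two guard helpers follows. The names first_text and first_attr are hypothetical (not part of Beautiful Soup); they only rely on the select_one, get_text, and get methods that Beautiful Soup elements provide.

```python
def first_text(node, selector, default=None):
    """Return the stripped text of the first match, or `default` if none."""
    element = node.select_one(selector)
    if element is None:
        return default
    return element.get_text(strip=True)


def first_attr(node, selector, attribute, default=None):
    """Return an attribute of the first match, or `default` if none."""
    element = node.select_one(selector)
    if element is None:
        return default
    return element.get(attribute, default)
```

Inside scrape_listing, a call such as first_text(soup, "#itemTitle") would then yield None for a missing title instead of crashing, which makes it easier to spot which selectors need updating.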

Scraping Multi-Variant Listings

Some Ebay listings contain multiple variant products, like different sizes/colors of clothing or models of phones.

For example: https://www.ebay.com/itm/284807601540

The product data like price, quantity, etc. can vary for each variant.

On Ebay's site, this data is rendered by JavaScript from a variable embedded in the page source rather than written directly into the HTML elements. To extract it, we'll need to:

Find the JavaScript variable that stores the variant data array.
Parse the JSON data into a Python data structure.

Here is an example using the re and json modules:

import re
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.ebay.com/itm/284807601540"

def scrape_variants(url):
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Search for the array variable that contains variant data
    pattern = re.compile(r"var modelData = (.*);")
    # `string=` replaces the older `text=` argument (Beautiful Soup 4.4+)
    script = soup.find("script", string=pattern)
    if script is None:
        return {}  # variable not found; the page layout may have changed

    # Extract JSON and parse into a Python data structure
    data = json.loads(pattern.search(script.string).group(1))

    variants = {}

    for v in data:
        variant_id = v["productId"]

        variants[variant_id] = {
            "price": v["price"],
            "available": v["quantityAvailable"],
            # And so on, extract other needed variant fields
        }

    return variants

variants = scrape_variants(URL)
print(variants)

This allows us to extract all pricing and inventory data for every product variant on an Ebay listing page.

The same principle can be applied to scraping other dynamically loaded content from Ebay pages.
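A regular expression that captures everything up to a semicolon can over-match or under-match when the script holds several statements on one line or the JSON itself contains semicolons. One possible alternative, sketched here with the standard library only, is to locate the assignment with a regex and let json.JSONDecoder.raw_decode parse exactly one JSON value from that position. The helper name extract_js_object and the sample script text are illustrative assumptions, not taken from a real Ebay page.

```python
import json
import re


def extract_js_object(script_text, var_name):
    """Return the JSON value assigned to `var <var_name> = ...`, or None."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    try:
        # raw_decode parses a single JSON value starting at the given index
        # and ignores whatever follows it (the semicolon, more JavaScript).
        value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    except json.JSONDecodeError:
        return None
    return value


# Made-up script content for illustration only
sample = 'var other = 1; var modelData = [{"productId": "A1", "price": "9.99"}]; init();'
print(extract_js_object(sample, "modelData"))
# prints [{'productId': 'A1', 'price': '9.99'}]
```

This only works when the embedded value is strict JSON; a JavaScript object literal with unquoted keys or trailing commas would still need a different approach.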

Scraping Ebay Search Results

In addition to scraping individual listings, we can also build scrapers that extract data from Ebay's search results pages.

For example: https://www.ebay.com/sch/i.html?_nkw=laptop

These pages contain a preview card for each search result.

To extract data from these result cards, we can loop over them:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

URL = "https://www.ebay.com/sch/i.html?_nkw=laptop"

def scrape_search_results(url):
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    results = []

    for card in soup.select(".s-item__wrapper"):

        title = card.select_one(".s-item__title")
        price = card.select_one(".s-item__price")
        image = card.select_one("img")
        link = card.select_one(".s-item__link")

        # Skip cards that lack any of the expected elements
        if any(el is None for el in (title, price, image, link)):
            continue

        results.append({
            "title": title.text.strip(),
            "price": price.text.strip(),
            # urljoin resolves relative links and leaves absolute ones unchanged
            "url": urljoin("https://www.ebay.com", link.get("href")),
            "image": image.get("src")
        })

    return results

data = scrape_search_results(URL)

print(data)

This locates each result card, extracts the fields we want, and appends the scraped data to a Python list.

Some key points:

We locate result cards using the .s-item__wrapper class.
We navigate down from the card container to extract inner elements like title, price, etc.
We construct full product URLs by combining each scraped link with Ebay's base URL.

The same approach can be used to build scrapers for Ebay category pages, daily deals, and any other search/listing index pages.
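When moving beyond a single results page, it helps to build the search URLs programmatically rather than by string concatenation. The sketch below uses urllib.parse.urlencode so that spaces and special characters in the query are escaped; _nkw and _pgn are the query and page-number parameters used in this guide's URLs, while the function name build_search_url is an assumption made for illustration.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(query, page=1):
    """Build a search results URL for a query and a 1-based page number."""
    params = {"_nkw": query, "_pgn": page}
    return f"{SEARCH_URL}?{urlencode(params)}"


for page in range(1, 4):
    print(build_search_url("gaming laptop", page))
# prints:
# https://www.ebay.com/sch/i.html?_nkw=gaming+laptop&_pgn=1
# https://www.ebay.com/sch/i.html?_nkw=gaming+laptop&_pgn=2
# https://www.ebay.com/sch/i.html?_nkw=gaming+laptop&_pgn=3
```

Each generated URL can be passed to a search scraper such as scrape_search_results in turn.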

Scraping Strategies to Avoid Blocking

When building scalable scrapers to extract large volumes of data from Ebay, we need to watch out for getting blocked. Here are some tips:

Use Random Delays

Add random delays between requests to mimic human browsing behavior, for example:

import time
import random 

# Random delay between 2-6 seconds
time.sleep(random.uniform(2.0, 6.0))
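In practice the delay usually lives inside the fetch loop rather than as a standalone call. A rough sketch of that idea follows; fetch_politely is a hypothetical helper name, and session is assumed to be a requests.Session (or any object with a compatible get method). The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import random
import time


def fetch_politely(session, urls, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests."""
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            # No need to wait before the very first request
            sleep(random.uniform(min_delay, max_delay))
        responses.append(session.get(url, timeout=30))
    return responses
```

A call might look like fetch_politely(requests.Session(), listing_urls), where listing_urls is whatever list of pages you intend to scrape.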

Rotate User Agents

Spoof different desktop/mobile browsers by rotating user agent strings:

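# Third-party package: pip install fake-useragent
from fake_useragent import UserAgent

ua = UserAgent()

# ua.random returns a different browser user agent string on each access
headers = {'User-Agent': ua.random}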
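If you would rather not add a dependency, a similar effect can be approximated with a small hand-maintained pool and random.choice. This is only a sketch: the user agent strings below are examples that go stale over time, and random_headers is a hypothetical helper name.

```python
import random

# Example strings only; refresh them periodically from real browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def random_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
```

The resulting dictionary is passed per request, for example requests.get(url, headers=random_headers(), timeout=30).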

Use Proxies

Route requests through residential proxy IP addresses to mask scrapers and avoid IP blocks.
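With Requests, a proxy is supplied through the proxies argument as a mapping from URL scheme to proxy address. The sketch below picks one proxy at random per request; the addresses and credentials are placeholders on example.com, and pick_proxies is a hypothetical helper name, so substitute the endpoints your proxy provider gives you.

```python
import random

# Placeholder endpoints; replace with your provider's real addresses.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]


def pick_proxies(pool=PROXY_POOL):
    """Return a Requests-style proxies mapping using one random proxy."""
    proxy = random.choice(pool)
    # The same proxy handles both plain and TLS traffic
    return {"http": proxy, "https": proxy}
```

A request would then be sent as requests.get(url, proxies=pick_proxies(), timeout=30).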

Handle CAPTCHAs

Detect and handle CAPTCHA challenges either manually or using a CAPTCHA solving service.
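Detection has to come before handling. One simple heuristic, sketched below, is to check the HTTP status code and scan the response body for tell-tale phrases. The marker strings and status codes here are assumptions for illustration, not a documented list of what Ebay returns, so tune them against the block pages you actually encounter; looks_blocked is a hypothetical helper name.

```python
# Assumed markers; adjust after inspecting real block pages.
BLOCK_MARKERS = ("captcha", "verify you are a human", "pardon our interruption")
BLOCK_STATUS_CODES = (403, 429, 503)


def looks_blocked(status_code, html):
    """Guess whether a response is a block or CAPTCHA page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

When looks_blocked(response.status_code, response.text) is true, the scraper can pause, switch proxy, or queue the URL for manual review instead of parsing a page that contains no listing data.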

Use Scraping Services

Leverage scraping APIs like ScrapingBee, ScraperAPI or SerpApi to bypass blocks.

Scraping Ebay End-to-End Example

Let's tie the concepts together into one end-to-end web scraper for Ebay data.

It will:

Take a search query as input
Scrape search results page
Extract key data fields
Loop through to scrape each listing
Output structured JSON data

Here is the code:

import json
import random
import time
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup
# Placeholder for whichever third-party scraping API client you use; its
# get() is assumed to return a response object with a .content attribute.
from scrape import Scraper

scraper = Scraper()  # Initialize scraping API client

def scrape_listing(url):
    """Scrape key data fields from a listing page."""

    page = scraper.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    title = soup.select_one("#itemTitle").text.strip()
    price = soup.select_one("#prcIsum").text.strip()

    # And so on...

    return {
        "title": title,
        "price": price,
        #...
    }

def scrape_search(query, pages=1):

    print(f"Scraping Ebay for: {query}")

    base_url = "https://www.ebay.com/sch/i.html"

    results = []

    for page_number in range(1, pages + 1):

        # urlencode escapes the query, e.g. "iphone 12" becomes "iphone+12"
        params = urlencode({"_nkw": query, "_pgn": page_number})
        url = f"{base_url}?{params}"

        # Fetch page using scraping API to avoid blocks
        page = scraper.get(url)
        soup = BeautifulSoup(page.content, "html.parser")

        for card in soup.select(".s-item"):

            link = card.select_one(".s-item__link")
            if link is None or not link.get("href"):
                continue  # skip cards without a listing link

            # urljoin handles both relative and absolute hrefs
            listing_url = urljoin(base_url, link["href"])

            # Scrape each listing page
            data = scrape_listing(listing_url)

            results.append(data)

        # Random delay
        time.sleep(random.uniform(3.0, 6.0))

    return results

data = scrape_search("iphone 12", pages=2)

print(json.dumps(data, indent=2))

This provides a template for building a robust web scraper for Ebay data at scale.
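Printed JSON is fine for inspection, but for analysis in a spreadsheet or a dataframe a CSV file is often more convenient. A minimal sketch using the standard csv module is shown below; write_listings_csv is a hypothetical helper name, and the sample rows are made up to mirror the dictionaries that scrape_listing returns.

```python
import csv
import io


def write_listings_csv(rows, fileobj, fieldnames=("title", "price")):
    """Write scraped listing dictionaries to an open text file as CSV."""
    # extrasaction="ignore" drops keys that are not listed in fieldnames
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows for illustration
sample_rows = [
    {"title": "Example phone", "price": "$199.99"},
    {"title": "Example case", "price": "$9.99", "seller": "not exported"},
]

buffer = io.StringIO()
write_listings_csv(sample_rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the file with open("listings.csv", "w", newline="", encoding="utf-8") and pass the file object in place of buffer.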

The full code is available on GitHub.

Summary

Some key points covered in this tutorial:

We can extract many useful data fields from Ebay pages including pricing, inventory, seller info, ratings and more.
For single-variant listings, extract data using CSS selectors and Beautiful Soup.
To scrape variant data, parse the JavaScript object containing the information.
Build scrapers to extract search results and dynamically paginate through the pages.
Employ strategies like proxies and random delays to avoid getting blocked.
Chain together listing detail scrapers with search scrapers for end-to-end scraping.

The techniques covered provide a blueprint for building robust Ebay web scrapers in Python. The data extracted can power a wide range of e-commerce analytics use cases.

You can find additional examples and patterns in the full repository on GitHub.
