How to Build a Scalable Wayfair Price Tracking System with Python

Dynamic pricing is ubiquitous in ecommerce today. Retailers like Wayfair continuously adjust prices in response to supply, demand, seasonality, and competitor actions. While favorable for sellers, this creates a challenge for consumers trying to find the best deals. A good price tracker levels the playing field.

In this comprehensive guide, you'll learn how to build a custom price tracking system for Wayfair using Python. We'll cover:

  • Scraping Wayfair product listings at scale
  • Storing and analyzing pricing data
  • Visualizing price history and trends
  • Sending smart alerts for price drops
  • Scheduling regular crawls as a background service
  • Deploying and scaling the tracker

By the end, you'll have a hands-on understanding of how to monitor ecommerce sites and make data-driven shopping decisions.

The Need for Ecommerce Price Trackers

First, let's look at why price trackers are so essential:

  • Dynamic Pricing is Pervasive – Many online retailers now use dynamic pricing algorithms that adjust prices by the day or even the hour based on inventory and demand.
  • Lack of Transparency – Retailers don't openly communicate price changes, forcing consumers to check manually.
  • Difficulty Finding Deals – Temporary sales and promotions are easy to miss when shopping across multiple sites.
  • Better Decisions – Seeing price history helps consumers determine whether they're getting a true deal or spot price gouging.

With hundreds of retailers like Wayfair, monitoring all these moving pieces manually is impossible. Scraper bots come to the rescue!

Web Scraping Wayfair Product Listings

The first step is retrieving Wayfair product listings to extract key data points. To scrape any website, our tools of choice are the Requests library to download pages and BeautifulSoup to parse the HTML:

import requests
from bs4 import BeautifulSoup

url = 'https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html'

# A browser-like User-Agent helps avoid basic bot blocking on many sites
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

Requests makes downloading pages trivial. We can pass any URL and get back the full HTML content.

BeautifulSoup then allows searching through the HTML using CSS selectors like a query language. For example, to extract the product title:

title = soup.select_one('h1[data-testid="productTitle"]').text.strip()
print(title)
# Anthonyson 61" Arched Floor Lamp

We can also extract the pricing information:

price = soup.find('span', {'data-testid': 'primaryPrice'}).text
# $99.99

And image URL:

img_url = soup.find('img', {'data-testid': 'image'})['src']
# https://secure.img1-fg.wfcdn.com/im/07181850/resize-h310-w310%5Ecompr-r85/1213/121389057/Anthonyson+61%2527%2527+Arched+Floor+Lamp.jpg

With a bit of CSS selector expertise, we can extract virtually any data from the pages.
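The same pattern extends to fields this guide doesn't scrape. The selector below is purely illustrative (Wayfair's markup changes often, so confirm the real attribute names in your browser's dev tools), and it shows defensive handling for elements that may be missing:

# Illustrative selector only – verify the attribute name on the live page
rating_el = soup.select_one('[data-testid="reviewRating"]')
rating = rating_el.text.strip() if rating_el else None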

Next, let's package this up into a handy scraping function:

import requests
from bs4 import BeautifulSoup

def scrape_product(url):
  # Download the page; a browser-like User-Agent helps avoid basic bot blocking
  headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
  response = requests.get(url, headers=headers)
  response.raise_for_status()

  soup = BeautifulSoup(response.text, 'html.parser')

  # Extract the key product fields
  name = soup.find('h1', {'data-testid': 'productTitle'}).text.strip()
  price = soup.find('span', {'data-testid': 'primaryPrice'}).text
  img_url = soup.find('img', {'data-testid': 'image'})['src']

  return {
    'name': name,
    'price': price,
    'img_url': img_url
  }

We can now easily extract structured data from any product URL:

data = scrape_product('https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html')

print(data['name'])     # Anthonyson 61" Arched Floor Lamp
print(data['price'])    # $99.99
print(data['img_url'])  # https://secure.img1-fg.wfcdn.com/im/07181850/resize-h310-w310%5Ecompr-r85/1213/121389057/Anthonyson+61%2527%2527+Arched+Floor+Lamp.jpg

With robust scraping functions like this, we can systematically retrieve data from thousands of product listing pages.
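For example, a simple batch run over a watchlist might look like the sketch below (the list contents are placeholders; the delay keeps our request rate polite):

import time

# Hypothetical watchlist of product pages to monitor
urls = [
  'https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html',
  # ...more product URLs...
]

results = []
for url in urls:
  results.append(scrape_product(url))
  time.sleep(2)  # pause between requests to avoid hammering the server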

Storing Price Data in Pandas DataFrames

Now that we can gather the product data, we need a way to store it for analysis. Pandas provides an excellent Python data analysis toolkit.

Pandas DataFrames allow storing tabular data in memory or loading it from CSVs, databases, and more.

Let's initialize one to hold our scraped Wayfair data:

import pandas as pd

df = pd.DataFrame(columns=['name', 'url', 'price', 'date'])

We can append rows as we scrape:

data = scrape_product('https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html')

# DataFrame.append was removed in pandas 2.0, so build a row and concat it
row = {
  'name': data['name'],
  'url': 'https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html',
  'price': data['price'],
  'date': '2023-03-01'
}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

Passing ignore_index=True renumbers the index, so each row gets a unique ID.

We can also easily load data from a CSV file:

df = pd.read_csv('wayfair-data.csv')

This gives us a structured dataset that's optimized for analysis and visualization.

With Pandas, we can further analyze item distributions and trends:

# Convert price strings like '$99.99' to floats before doing math
df['price'] = df['price'].str.replace('$', '', regex=False).astype(float)

# Median price per category (assumes a 'category' column has also been scraped)
print(df.groupby('category')['price'].median())

# Top 3 highest prices
print(df['price'].nlargest(3))

As our scraper accumulates more history, the DataFrame becomes a valuable resource for price optimization.

Visualizing Price History and Trends

Understanding pricing trends over time is critical. Python's Matplotlib library makes plotting easy.

For example, we can visualize the price history for a given product:

import matplotlib.pyplot as plt

# Sort by date so the line plots chronologically
product_df = df[df['name'] == 'Foosball Table'].sort_values('date')
plt.plot(product_df['date'], product_df['price'])

plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Foosball Table Price History')

plt.show()

[Figure: Foosball Table price history]

Seeing the graph makes pricing patterns clear at a glance.

We can also plot multiple products for comparison:

for name, group in df.groupby('name'):
  plt.plot(group['date'], group['price'], label=name)

plt.legend(loc='best')

plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Price History')

plt.show()

This charts all products on one plot, with the legend differentiating each line.

[Figure: combined price history for all tracked products]

There are endless possibilities for custom statistical charts to extract insights from the pricing data.
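As one example, a rolling average smooths out day-to-day noise so longer trends stand out. A minimal sketch, assuming df['date'] has already been parsed with pd.to_datetime:

# Smooth daily prices with a 7-day rolling average
product_df = df[df['name'] == 'Foosball Table'].sort_values('date')
rolling = product_df['price'].rolling(window=7, min_periods=1).mean()

plt.plot(product_df['date'], product_df['price'], alpha=0.4, label='Daily price')
plt.plot(product_df['date'], rolling, label='7-day average')
plt.legend(loc='best')
plt.show()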

Sending Alerts for Price Changes

One of the most useful features of a price tracker is getting notified on key changes. Python's standard library makes sending alerts easy.

For example, we can use smtplib to send an email when there‘s a price drop:

import smtplib
from email.message import EmailMessage

# Look up the latest and previous prices (helper functions defined elsewhere)
current_price = get_current_price('wayfair-tracker-db')
old_price = get_old_price('wayfair-tracker-db')

if old_price - current_price > 0:
  # Price dropped! Build a proper email and send it
  # (addresses below are placeholders; Gmail needs TLS and an app password)
  msg = EmailMessage()
  msg['Subject'] = 'Wayfair price drop alert'
  msg['From'] = '[email protected]'
  msg['To'] = '[email protected]'
  msg.set_content(f"Alert! Product X has dropped from {old_price} to {current_price}")

  with smtplib.SMTP('smtp.gmail.com', 587) as smtp:
    smtp.starttls()
    smtp.login('[email protected]', 'app-password')
    smtp.send_message(msg)

We can customize the frequency and threshold based on the user's preferences.
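A minimal sketch of a configurable threshold check (the names and the 10% default here are assumptions, not part of any library):

# Hypothetical user preference: alert only on drops of at least 10%
ALERT_THRESHOLD = 0.10

def should_alert(old_price, current_price, threshold=ALERT_THRESHOLD):
  # Trigger when the relative price drop meets the user's threshold
  return old_price > 0 and (old_price - current_price) / old_price >= threshold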

For mobile notifications, the Slack Web API makes sending messages simple:

import os

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

slack_token = os.environ["SLACK_TOKEN"]
client = WebClient(token=slack_token)

# Reuse the alert text built for the email notification above
message = f"Alert! Product X has dropped from {old_price} to {current_price}"

try:
  response = client.chat_postMessage(channel='#price-alerts', text=message)
except SlackApiError as e:
  print(f"Error posting message: {e}")

Alert integrations like this let users stay effortlessly on top of pricing data.

Scheduling Regular Price Checks

To keep our tracker updated, we need to run it on a schedule. Python's sched module is one option for basic scheduled jobs.

We can set it up like this:

import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def price_check():
  # Run scraper
  # Generate alerts
  # Update database
  print('Finished price check')

  # sched events fire only once, so re-schedule inside the job to repeat hourly
  scheduler.enter(3600, 1, price_check)

scheduler.enter(3600, 1, price_check)  # first run in one hour
scheduler.run()

For more robust production pipelines, Apache Airflow is an excellent workflow scheduler for Python. It has integrations with databases, data warehouses, S3, and messaging queues to run entire ETL processes.
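A minimal sketch of what an hourly Airflow job could look like (the DAG id and schedule are assumptions; price_check is our function from above):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical hourly DAG wrapping the price_check function
with DAG(
  dag_id='wayfair_price_tracker',
  start_date=datetime(2023, 3, 1),
  schedule_interval=timedelta(hours=1),
  catchup=False,
) as dag:
  PythonOperator(task_id='price_check', python_callable=price_check)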

Setting up scheduled jobs ensures we have up-to-date pricing data piped into our dashboards and alert systems.

Storing Data in SQLite Databases

For our small prototype, a Pandas DataFrame works fine. But for larger production trackers, a proper database makes more sense.

SQLite is convenient for serverless local data storage. Let's set up a table to store our Wayfair scraper data:

import sqlite3

conn = sqlite3.connect('wayfair.db')
c = conn.cursor()

c.execute('''
  CREATE TABLE IF NOT EXISTS products
  (id INTEGER PRIMARY KEY, name TEXT, url TEXT, price TEXT, date TEXT)
''')

# Insert a sample row (parameterized queries avoid quoting and injection issues)
c.execute('INSERT INTO products VALUES (?, ?, ?, ?, ?)',
          (1, 'Foosball Table', 'https://www.wayfair.com/xxxx', '$79.99', '2023-03-01'))

conn.commit()
conn.close()

We can wrap DB operations like inserting rows and querying in separate functions for clean code:

def insert_price(name, url, price, date):
  # Insert a new price observation into the products table
  with sqlite3.connect('wayfair.db') as conn:
    conn.execute('INSERT INTO products (name, url, price, date) VALUES (?, ?, ?, ?)',
                 (name, url, price, date))

def get_price_history(product_name):
  # Query the DB and return a dict of date -> price for one product
  with sqlite3.connect('wayfair.db') as conn:
    return dict(conn.execute('SELECT date, price FROM products WHERE name = ?',
                             (product_name,)))

For scalable scrapers across larger catalogs, a managed cloud database like AWS RDS is a better fit than SQLite. The query interface stays nearly identical when using SQLAlchemy.
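For instance, with SQLAlchemy the switch is mostly a connection-string change. A sketch with placeholder credentials:

from sqlalchemy import create_engine, text

# Placeholder RDS connection string; the local SQLite equivalent is commented out
engine = create_engine('postgresql://user:password@my-instance.rds.amazonaws.com:5432/wayfair')
# engine = create_engine('sqlite:///wayfair.db')

with engine.connect() as conn:
  rows = conn.execute(
    text('SELECT date, price FROM products WHERE name = :name'),
    {'name': 'Foosball Table'}
  ).fetchall()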

Building a Price Tracker Web App

To make our price tracker more accessible, we can wrap it in a web application using Flask or Django.

The app could provide:

  • Dashboards visualizing price history
  • Forms for adding new products to track
  • User accounts and alerts
  • Configuration options
  • Public API access to the pricing data

Here is an example view to display price history for a product:

import base64
import io

from flask import Flask
import matplotlib
matplotlib.use('Agg')  # render charts without a display server
import matplotlib.pyplot as plt

app = Flask(__name__)

@app.route('/price-history/<product_id>')
def show_graph(product_id):
  # Look up price history (DB helper defined earlier)
  data = get_price_history(product_id)

  # Plot into an in-memory buffer; plt.savefig needs a file or buffer
  fig, ax = plt.subplots()
  ax.plot(list(data.keys()), list(data.values()))
  buf = io.BytesIO()
  fig.savefig(buf, format='png')
  plt.close(fig)

  # Base64-encode the PNG and embed it in the HTML response
  img_b64 = base64.b64encode(buf.getvalue()).decode()
  return f"<img src='data:image/png;base64,{img_b64}'>"

The scaffolded structure of Flask/Django makes building out web UIs simple. We can retain all our existing scraping and analysis code as helper modules.

Deploying the Price Tracker to Production

For real-world use, we need to deploy our tracker so it runs 24/7:

  • PythonAnywhere – Simple for hosting small Python apps and scheduled tasks
  • Heroku – More robust PaaS, can deploy Django/Flask apps
  • AWS Elastic Beanstalk – Deploy full Python web stacks on EC2 instances
  • Azure App Service – Managed app hosting on Microsoft Azure

These services all provide managed load balancing, autoscaling, monitoring, and deployment pipelines.

Some best practices for reliable production trackers:

  • Use worker queues like Celery to distribute load
  • Store data in managed cloud databases
  • Set up monitoring with Sentry or Datadog
  • Implement retries and error handling (see the sketch below)
  • Enable auto-scaling based on traffic

Investing in production readiness ensures our scraper runs reliably 24/7.
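As a concrete example of the retry point above, requests can be paired with urllib3's Retry to back off on transient errors. A minimal sketch (the retry counts and status codes are illustrative):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures with exponential backoff
session = requests.Session()
retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503])
session.mount('https://', HTTPAdapter(max_retries=retries))

response = session.get('https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html', timeout=10)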

Enhancing the Tracker with Machine Learning

Our basic tracker relies on current and historical pricing data. To take it a step further, we can employ machine learning algorithms to unlock additional insights:

  • Predict future prices based on seasons, holidays, and trends
  • Classify products by categories and filter types automatically
  • Recommend budget-friendly purchases based on user purchase history
  • Detect pricing errors like items drastically under/over-priced
  • Predict optimal price points for seller profit maximization

Libraries like TensorFlow, scikit-learn, and PyTorch make it straightforward to build these machine learning capabilities in Python.
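As a toy illustration of the price-forecasting idea, scikit-learn can fit a simple trend line to a product's history (the numbers below are made-up sample values; a real model would add features for seasonality and holidays):

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up sample history: days since first observation vs. observed price
days = np.array([[0], [7], [14], [21], [28]])
prices = np.array([99.99, 94.99, 94.99, 89.99, 92.99])

model = LinearRegression().fit(days, prices)
print(model.predict([[35]]))  # rough estimate for one week ahead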

Conclusion

In this guide, we built out a robust Wayfair price tracking pipeline in Python leveraging:

  • Web scraping – Harvest product data
  • Data analysis – Manipulate, analyze, and visualize
  • Alerting – Receive notifications on key changes
  • Task scheduling – Run as a background service
  • Cloud deployment – Scale and monitor in production

The same framework extends to any ecommerce site or web data source.

With the democratization of data tools in Python, building custom scrapers is accessible for both businesses and savvy shoppers. The tactics covered in this guide serve as a blueprint to unlock pricing insights. Let me know if you have any other questions!
