Dynamic pricing is ubiquitous in ecommerce today. Retailers like Wayfair continuously adjust prices in response to supply, demand, seasonality, and competitor actions. While favorable for sellers, this creates a challenge for consumers trying to find the best deals. A good price tracker levels the playing field.
In this comprehensive guide, you'll learn how to build a custom price tracking system for Wayfair using Python. We'll cover:
- Scraping Wayfair product listings at scale
- Storing and analyzing pricing data
- Visualizing price history and trends
- Sending smart alerts for price drops
- Scheduling regular crawls as a background service
- Deploying and scaling the tracker
By the end, you'll have a hands-on understanding of how to monitor ecommerce sites and make data-driven shopping decisions.
The Need for Ecommerce Price Trackers
First, let's look at why price trackers are so essential:
- Dynamic Pricing is Pervasive – Over 50% of online retailers now use dynamic pricing algorithms that adjust prices by the day or even the hour based on inventory and demand.
- Lack of Transparency – Retailers don't openly communicate price changes, forcing consumers to manually check.
- Difficulty Finding Deals – Temporary sales and promotions are easy to miss when shopping across multiple sites.
- Better Decisions – Seeing price history helps consumers determine if they're getting a true deal or spot seller price gouging.
With hundreds of retailers like Wayfair, monitoring all these moving pieces manually is impossible. Scraper bots come to the rescue!
Web Scraping Wayfair Product Listings
The first step is retrieving Wayfair product listings to extract key data points. To scrape any website, our tools of choice are the Requests library to download pages and BeautifulSoup to parse the HTML:
import requests
from bs4 import BeautifulSoup

url = 'https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
Requests makes downloading pages trivial. We can pass any URL and get back the full HTML content.
BeautifulSoup then allows searching through the HTML using CSS selectors like a query language. For example, to extract the product title:
title = soup.select_one('h1[data-testid="productTitle"]').text.strip()
print(title)
# Anthonyson 61" Arched Floor Lamp
We can also extract the pricing information:
price = soup.find('span', {'data-testid': 'primaryPrice'}).text
# $99.99
And image URL:
img_url = soup.find('img', {'data-testid': 'image'})['src']
# https://secure.img1-fg.wfcdn.com/im/07181850/resize-h310-w310%5Ecompr-r85/1213/121389057/Anthonyson+61%2527%2527+Arched+Floor+Lamp.jpg
With a bit of CSS selector expertise, we can extract virtually any data from the pages.
Next, let's package this up into a handy scraping function:

import requests
from bs4 import BeautifulSoup

def scrape_product(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    name = soup.find('h1', {'data-testid': 'productTitle'}).text.strip()
    price = soup.find('span', {'data-testid': 'primaryPrice'}).text
    img_url = soup.find('img', {'data-testid': 'image'})['src']

    data = {
        'name': name,
        'price': price,
        'img_url': img_url
    }

    return data
We can now easily extract structured data from any product URL:
data = scrape_product('https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html')

print(data['name'])     # Anthonyson 61" Arched Floor Lamp
print(data['price'])    # $99.99
print(data['img_url'])  # https://secure.img1-fg.wfcdn.com/im/07181850/resize-h310-w310%5Ecompr-r85/1213/121389057/Anthonyson+61%2527%2527+Arched+Floor+Lamp.jpg
With robust scraping functions like this, we can systematically retrieve data from thousands of product listing pages.
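For example, a crawl over many products might look like the minimal sketch below. The URL list and the two-second delay are illustrative assumptions; a polite delay between requests and a try/except around each page help keep a long crawl from being blocked or derailed by one bad response.

import time

# Hypothetical list of product pages to monitor
product_urls = [
    'https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html',
    # ...more product URLs
]

results = []
for url in product_urls:
    try:
        item = scrape_product(url)
        item['url'] = url
        results.append(item)
    except Exception as exc:
        # One failed page shouldn't stop the whole crawl
        print(f'Failed to scrape {url}: {exc}')
    time.sleep(2)  # polite delay between requests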
Storing Price Data in Pandas DataFrames
Now that we can gather the product data, we need a way to store it for analysis. Pandas provides an excellent Python data analysis toolkit.
Pandas DataFrames allow storing tabular data in memory or loading it from CSVs, databases, and more.
Let's initialize one to hold our scraped Wayfair data:
import pandas as pd
df = pd.DataFrame(columns=['name', 'url', 'price', 'date'])
We can add a row each time we scrape:

url = 'https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html'
data = scrape_product(url)

new_row = pd.DataFrame([{
    'name': data['name'],
    'url': url,
    'price': data['price'],
    'date': '2023-03-01'
}])
df = pd.concat([df, new_row], ignore_index=True)

Passing ignore_index=True renumbers the combined rows, so each one gets a unique index. (DataFrame.append was removed in recent pandas releases, so pd.concat is the idiom to use here.)
We can also easily load data from a CSV file:
df = pd.read_csv('wayfair-data.csv')
This gives us a structured dataset that's optimized for analysis and visualization.
With Pandas, we can further analyze item distributions and trends:
# Convert price strings like "$99.99" into numeric values first
df['price'] = df['price'].str.replace('$', '', regex=False).astype(float)

print(df.groupby('name')['price'].median())
# Prints the median recorded price for each product

print(df['price'].nlargest(3))
# Prints the top 3 highest prices
As our scraper accumulates more history, the DataFrame becomes a valuable resource for price optimization.
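One simple way to build that history, sketched below, is to append a dated row to the CSV after each scrape. This assumes the wayfair-data.csv file follows the name, url, price, date column layout used above and reuses our scrape_product helper.

from datetime import date

url = 'https://www.wayfair.com/lighting/pdp/wade-logan-anthonyson-61-arched-floor-lamp-w007189762.html'
data = scrape_product(url)

# Append today's observation to the running history file
pd.DataFrame([{
    'name': data['name'],
    'url': url,
    'price': data['price'],
    'date': date.today().isoformat(),
}]).to_csv('wayfair-data.csv', mode='a', header=False, index=False)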
Visualizing Price Trends with Matplotlib
Understanding pricing trends over time is critical. Python's Matplotlib library makes plotting easy.
For example, we can visualize the price history for a given product:
import matplotlib.pyplot as plt

product_df = df[df['name'] == 'Foosball Table']

plt.plot(product_df['date'], product_df['price'])
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Foosball Table Price History')
plt.show()
Seeing the graph makes pricing patterns clear at a glance.
We can also plot multiple products for comparison:
for name, group in df.groupby('name'):
    plt.plot(group['date'], group['price'], label=name)

plt.legend(loc='best')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Price History')
plt.show()
This charts all products on one plot, with the legend differentiating each line.
There are endless possibilities for custom statistical charts to extract insights from the pricing data.
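For instance, a rolling average smooths out day-to-day noise and makes the underlying trend easier to read. This is a sketch reusing the DataFrame from above; the product name and seven-day window are illustrative.

product_df = df[df['name'] == 'Foosball Table'].sort_values('date').copy()

# 7-day rolling average of the recorded prices
product_df['rolling_avg'] = product_df['price'].rolling(window=7, min_periods=1).mean()

plt.plot(product_df['date'], product_df['price'], label='Recorded price')
plt.plot(product_df['date'], product_df['rolling_avg'], label='7-day average')
plt.legend(loc='best')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Foosball Table Price Trend')
plt.show()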
Sending Alerts for Price Changes
One of the most useful features of a price tracker is getting notified of key changes. Python's standard library makes sending alerts easy.
For example, we can use smtplib to send an email when there's a price drop:
import smtplib

# Look up the latest and the previously recorded price from our tracker database
current_price = get_current_price('wayfair-tracker-db')
old_price = get_old_price('wayfair-tracker-db')

if current_price < old_price:
    # Price dropped! Build a simple email (the Subject header keeps mail clients happy)
    message = f"Subject: Price alert\n\nProduct X has dropped from {old_price} to {current_price}"

    with smtplib.SMTP('smtp.gmail.com', 587) as smtp:
        smtp.starttls()
        smtp.login('[email protected]', 'app-password')  # placeholder credentials
        smtp.sendmail('[email protected]', '[email protected]', message)
We can customize the frequency and threshold based on the user's preferences.
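A small helper keeps that threshold logic in one place. This is a minimal sketch with an assumed five percent default, reusing the old_price and current_price values from the email example above.

def should_alert(old_price, new_price, threshold_pct=5.0):
    # Alert only when the price drops by at least threshold_pct percent
    if old_price <= 0:
        return False
    drop_pct = (old_price - new_price) / old_price * 100
    return drop_pct >= threshold_pct

if should_alert(old_price, current_price):
    # ...send the email as above
    pass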
For mobile notifications, the Slack Web API makes sending messages simple:
import os

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

slack_token = os.environ["SLACK_TOKEN"]
client = WebClient(token=slack_token)

try:
    response = client.chat_postMessage(channel='price-alerts', text=message)
except SlackApiError as e:
    print(f"Error posting message: {e}")
Alert integrations like this let users stay effortlessly on top of pricing data.
Scheduling Regular Price Checks
To keep our tracker updated, we need to run it on a schedule. Python's sched module is one option for basic scheduled jobs.
We can set it up like this:

import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def price_check():
    # Run scraper, generate alerts, update database
    print('Finished price check')
    # Re-schedule so the check repeats every hour
    scheduler.enter(3600, 1, price_check)

scheduler.enter(3600, 1, price_check)  # First run in one hour
scheduler.run()
For more robust production pipelines, Apache Airflow is an excellent workflow scheduler for Python. It has integrations with databases, data warehouses, S3, and messaging queues to run entire ETL processes.
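As a rough sketch, an hourly Airflow job wrapping our price check could look like the following. The DAG id, start date, schedule, and task callable are assumptions for illustration, not a prescribed setup.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_price_check():
    # Run the scraper, update the database, and fire any alerts
    ...

with DAG(
    dag_id='wayfair_price_tracker',
    start_date=datetime(2023, 3, 1),
    schedule_interval='@hourly',
    catchup=False,
) as dag:
    PythonOperator(task_id='price_check', python_callable=run_price_check)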
Setting up scheduled jobs ensures we have up-to-date pricing data piped into our dashboards and alert systems.
Storing Data in SQLite Databases
For our small prototype, a Pandas DataFrame works fine. But for larger production trackers, a proper database makes more sense.
SQLite is convenient for serverless local data storage. Let's set up a table to store our Wayfair scraper data:
import sqlite3

conn = sqlite3.connect('wayfair.db')
c = conn.cursor()

c.execute('''
CREATE TABLE products
(id INTEGER PRIMARY KEY, name TEXT, url TEXT, price TEXT, date TEXT)
''')

# Insert sample row
c.execute("INSERT INTO products VALUES (1, 'Foosball Table', 'https://www.wayfair.com/xxxx', '$79.99', '2023-03-01')")

conn.commit()
conn.close()
We can wrap DB operations like inserting rows and querying in separate functions for clean code:

def insert_price(name, url, price, date):
    # Insert a new price observation into the products table
    with sqlite3.connect('wayfair.db') as conn:
        conn.execute('INSERT INTO products (name, url, price, date) VALUES (?, ?, ?, ?)', (name, url, price, date))

def get_price_history(product_name):
    # Query the DB and return a dict of date: price for one product
    with sqlite3.connect('wayfair.db') as conn:
        return dict(conn.execute('SELECT date, price FROM products WHERE name = ? ORDER BY date', (product_name,)))
For scalable scrapers across larger catalogs, a managed cloud database like AWS RDS is a better fit than SQLite. The query interface stays nearly identical when using SQLAlchemy.
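For example, the same history query against a hypothetical RDS Postgres instance could be run through SQLAlchemy as sketched below; the connection string is a placeholder, not a real endpoint.

from sqlalchemy import create_engine, text

# Placeholder connection string - swap in your real RDS host and credentials
engine = create_engine('postgresql+psycopg2://user:password@my-rds-host:5432/wayfair')

with engine.connect() as conn:
    rows = conn.execute(
        text('SELECT date, price FROM products WHERE name = :name ORDER BY date'),
        {'name': 'Foosball Table'},
    ).fetchall()

print(rows)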
Building a Price Tracker Web App
To make our price tracker more accessible, we can wrap it in a web application using Flask or Django.
The app could provide:
- Dashboards visualizing price history
- Forms for adding new products to track
- User accounts and alerts
- Configuration options
- Public API access to the pricing data
Here is an example view to display price history for a product:
from flask import Flask
import matplotlib
matplotlib.use('Agg')  # render charts without a display
import matplotlib.pyplot as plt
import io
import base64

app = Flask(__name__)

@app.route('/price-history/<product_id>')
def show_graph(product_id):
    # Look up the price history in the DB
    data = get_price_history(product_id)

    # Plot the history and capture the chart as PNG bytes
    fig, ax = plt.subplots()
    ax.plot(list(data.keys()), list(data.values()))
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    img_b64 = base64.b64encode(buf.getvalue()).decode('utf-8')

    # Embed the image into the HTML response
    return f"<img src='data:image/png;base64,{img_b64}'>"
The scaffolded structure of Flask/Django makes building out web UIs simple. We can retain all our existing scraping and analysis code as helper modules.
Deploying the Price Tracker to Production
For real world use, we need to deploy our tracker so it runs 24/7:
- PythonAnywhere – Simple for hosting small Python apps and scheduled tasks
- Heroku – More robust PaaS, can deploy Django/Flask apps
- AWS Elastic Beanstalk – Deploy full Python web stacks on EC2 instances
- Azure App Service – Managed app hosting on Microsoft Azure
These services all provide managed load balancing, autoscaling, monitoring, and deployment pipelines.
Some best practices for reliable production trackers:
- Use worker queues like Celery to distribute load
- Store data in managed cloud databases
- Set up monitoring with Sentry or Datadog
- Implement retries and error handling (see the sketch after this list)
- Enable frequent auto-scaling based on traffic
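For instance, retries at the HTTP layer can be wired in with requests' built-in adapter support. This is a sketch; the retry count, backoff, and status codes are reasonable defaults rather than requirements.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

# Use the session wherever we previously called requests.get directly
response = session.get(url, timeout=30)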
Investing in production readiness ensures our scraper runs reliably 24/7.
Enhancing the Tracker with Machine Learning
Our basic tracker relies on current and historical pricing data. To take it a step further, we can employ machine learning algorithms to unlock additional insights:
- Predict future prices based on seasons, holidays, and trends
- Classify products by categories and filter types automatically
- Recommend budget-friendly purchases based on user purchase history
- Detect pricing errors like items drastically under/over-priced
- Predict optimal price points for seller profit maximization
Python libraries like TensorFlow, scikit-learn, and PyTorch make it straightforward to build these machine learning capabilities.
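As a toy example of the first idea, a simple scikit-learn regression can project a product's price trend forward. This sketch assumes the numeric price column and DataFrame built earlier; real forecasting would need richer features such as seasonality and promotions.

import pandas as pd
from sklearn.linear_model import LinearRegression

product_df = df[df['name'] == 'Foosball Table'].sort_values('date')

# Encode dates as day offsets so the model can fit a simple trend line
dates = pd.to_datetime(product_df['date'])
days = (dates - dates.min()).dt.days
X = days.to_numpy().reshape(-1, 1)
y = product_df['price'].to_numpy()

model = LinearRegression().fit(X, y)
projected = model.predict([[days.max() + 7]])
print(f'Projected price one week out: ${projected[0]:.2f}')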
Conclusion
In this guide, we built out a robust Wayfair price tracking pipeline in Python leveraging:
- Web scraping – Harvest product data
- Data analysis – Manipulate, analyze, and visualize
- Alerting – Receive notifications on key changes
- Task scheduling – Run as a background service
- Cloud deployment – Scale and monitor in production
The same framework extends to any ecommerce site or web data source.
With the democratization of data tools in Python, building custom scrapers is accessible for both businesses and savvy shoppers. The tactics covered in this guide serve as a blueprint to unlock pricing insights. Let me know if you have any other questions!