What exactly is a dynamic page?

Hey there! Dynamic web pages can seem confusing at first glance. My name's [John] and I've been working with web scrapers for over 5 years, so I'm here to break down exactly what dynamic pages are all about.

Whether you're new to web scraping or a seasoned pro, understanding what makes a page "dynamic" is key to scraping it effectively. By the end of this guide, you'll know how to identify dynamic pages, why they matter for web scraping, and how to approach scraping them.

Static vs. Dynamic Pages

To understand what a dynamic page is, it helps to first look at the opposite – a static page.

A static web page has content that is pre-defined and does not change after the initial page load. All HTML, CSS, JavaScript, images, and other assets are loaded from the server when the page first renders.

According to Google's Web Fundamentals documentation, static pages have two key qualities:

They display the same content for all users.
The content doesn't change based on user interaction.

Some examples of static pages:

Simple HTML sites without much interactivity
Old school informational sites like Wikipedia
Classic blogs and websites

A dynamic web page, on the other hand, updates its content after the initial load, without having to completely refresh the page. This is achieved by making additional requests (typically XHR or Fetch requests using JavaScript) to retrieve more data from the backend on-demand.

That new data can then be used to modify the DOM and update what you see on the page.
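To make that concrete, here is a minimal sketch of what a scraper sees on such a page. The HTML shell, the JSON payload, and the `feed` id are made-up stand-ins for illustration, not taken from any real site.

```python
import json
from html.parser import HTMLParser

# Hypothetical initial HTML: an empty container that JavaScript fills in later.
INITIAL_HTML = (
    '<html><body><ul id="feed"></ul>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical JSON that a later XHR/Fetch request might return.
API_RESPONSE = (
    '{"posts": [{"id": 1, "text": "First post"},'
    ' {"id": 2, "text": "Second post"}]}'
)


class ListItemCounter(HTMLParser):
    """Counts <li> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1


def count_list_items(html):
    parser = ListItemCounter()
    parser.feed(html)
    return parser.count


items_in_html = count_list_items(INITIAL_HTML)
posts_in_json = json.loads(API_RESPONSE)["posts"]

print(items_in_html)       # list items present in the initial HTML
print(len(posts_in_json))  # items that only arrive via the follow-up request
```

The point of the sketch: the data exists, just not in the first HTML response, so a parser pointed only at that response comes back empty-handed.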

According to the MDN Web Docs, some signs of a dynamic page are:

The page content changes without reloading the page
DOM elements are added, removed, or manipulated
Additional data is downloaded and interacted with

Common examples of dynamic pages:

Modern interactive web apps like Twitter, Facebook, etc.
Sites with infinite scrolling feeds like Instagram, Reddit, Pinterest
Ecommerce product listings that lazy load items
Pages built with JavaScript frameworks like React, Vue, Angular

So in summary, a dynamic page can update its content and DOM after the initial load by making additional requests to retrieve data from the backend. Pretty much any modern web app will have some level of dynamic loading.

Identifying Dynamic Pages

Now that you understand the difference between static and dynamic pages conceptually, let's go over some clear signs that indicate a page is dynamic:

1. Built with a JavaScript framework

If a site is built using React, Vue, Angular, Svelte, or any other frontend JavaScript framework, it's most likely a dynamic page.

This 2021 survey from State of JS showed the popularity of frontend frameworks:

Framework	Usage
React	88.4%
Vue.js	55.1%
Angular	50.3%

You can check whether a site uses React by installing the React Developer Tools browser extension. Its toolbar icon lights up on pages where React is detected, and it adds a Components tab to the developer tools for inspecting the component tree.
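If you would rather check from a script, one rough option is to search the page source for attributes that these frameworks commonly leave behind. The marker list below is my own heuristic, not an official detection method, and sites can strip or rename any of these, so treat a match as a hint only.

```python
import re

# Heuristic markers only: a missing marker does not prove the framework is absent.
FRAMEWORK_MARKERS = {
    "React": [r"data-reactroot", r"__NEXT_DATA__"],   # legacy React root, Next.js data script
    "Vue": [r"data-v-[0-9a-f]{8}", r'id="__nuxt"'],   # scoped-style attributes, Nuxt root
    "Angular": [r"ng-version="],                      # attribute set on the Angular root element
}


def guess_frameworks(html):
    """Return the sorted names of frameworks whose markers appear in the HTML."""
    found = []
    for name, patterns in FRAMEWORK_MARKERS.items():
        if any(re.search(pattern, html) for pattern in patterns):
            found.append(name)
    return sorted(found)


sample = '<app-root ng-version="17.0.0"></app-root>'
print(guess_frameworks(sample))
```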

2. Lazy loading content

If a page only loads assets like images, videos, or elements like posts and comments when they are scrolled into view, that's a form of dynamic loading.

The content is only retrieved from the backend when needed, as the user is viewing that part of the page. Lazy loading improves performance.
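Lazy-loaded images often show up in the markup itself. As a small sketch, the helper below looks for the native `loading="lazy"` attribute and for `data-src`, a common convention among JavaScript lazy-loading libraries. The sample HTML is invented, and real sites may use other attribute names.

```python
from html.parser import HTMLParser


class LazyImageFinder(HTMLParser):
    """Collects the URLs of <img> tags that look lazy loaded."""

    def __init__(self):
        super().__init__()
        self.lazy_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # loading="lazy" is the native browser hint; data-src usually holds
        # the real URL until a script copies it into src.
        if attributes.get("loading") == "lazy" or "data-src" in attributes:
            self.lazy_images.append(attributes.get("data-src") or attributes.get("src"))


def find_lazy_images(html):
    finder = LazyImageFinder()
    finder.feed(html)
    return finder.lazy_images


sample_html = (
    '<img src="a.jpg">'
    '<img src="b.jpg" loading="lazy">'
    '<img data-src="c.jpg" src="placeholder.gif">'
)
print(find_lazy_images(sample_html))
```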

3. New network requests on interaction

Open up the Network tab in Chrome Developer Tools. Reload the page and watch the requests – if additional XHR or Fetch requests are made when you scroll, click buttons, hover over elements, etc. then dynamic content is likely being loaded.

For example, here are the new requests made when scrolling on Twitter:

[Image: Twitter network requests demo]

Each additional request returns more tweets to be rendered on the page.
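You can also review those requests offline. Chrome's Network tab can export a session as a HAR file, which is plain JSON. The sketch below lists the entries whose response was JSON, which is usually where the dynamic data lives. The sample HAR is trimmed and made up, and the `example.com` URLs are placeholders.

```python
import json


def json_requests_from_har(har_text):
    """Return (method, url) pairs for HAR entries whose response was JSON."""
    har = json.loads(har_text)
    results = []
    for entry in har["log"]["entries"]:
        mime_type = entry["response"]["content"].get("mimeType", "")
        if "json" in mime_type:
            results.append((entry["request"]["method"], entry["request"]["url"]))
    return results


# Trimmed, made-up HAR content; a real export has many more fields per entry.
sample_har = json.dumps({
    "log": {"entries": [
        {"request": {"method": "GET", "url": "https://example.com/"},
         "response": {"content": {"mimeType": "text/html"}}},
        {"request": {"method": "GET", "url": "https://example.com/api/posts?page=2"},
         "response": {"content": {"mimeType": "application/json"}}},
    ]}
})

data_requests = json_requests_from_har(sample_har)
print(data_requests)
```

To run it on a real export, read the file's text and pass it to `json_requests_from_har`. The URLs it returns are good candidates to inspect before reaching for a full browser.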

4. Updating DOM elements

Inspect elements on the page before and after interacting with it. If the DOM changes at all – new elements are added, element attributes change, nodes are removed – that indicates dynamic modifications.

For example, here is the Twitter DOM updating as I scroll through my feed:

[Image: Twitter DOM changes demo]

New tweets are injected into the DOM dynamically as they are fetched from the backend.
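One way to make this comparison less eyeball-driven is to save the page's HTML before and after interacting (for example with "Copy outerHTML" in the Elements panel) and diff the tag counts. This is a rough sketch with invented snapshots. It only counts opening tags, so it will miss attribute or text changes.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times each opening tag appears."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def count_tags(html):
    counter = TagCounter()
    counter.feed(html)
    return counter.tags


def added_tags(before_html, after_html):
    """Return the tags that became more numerous between two DOM snapshots."""
    before, after = count_tags(before_html), count_tags(after_html)
    # Counter subtraction keeps only the positive differences.
    return dict(after - before)


before = "<ul><li>1</li><li>2</li></ul>"
after = "<ul><li>1</li><li>2</li><li>3</li><li>4</li></ul>"
print(added_tags(before, after))
```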

5. Disabling JavaScript

An easy way to test if a page relies on JavaScript for dynamic loading is to temporarily disable JS in your browser (using an extension like Quick JavaScript Switcher) and reload.

If content goes missing or the page changes significantly with JavaScript off, that's a sign it uses JS to load content dynamically.
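The same test can be scripted, since a plain HTTP client such as Requests never executes JavaScript: what it downloads is effectively the page with JS turned off. Here is a small sketch that checks whether text you can see in the browser is present in that raw HTML. The function name and the sample markup are my own, chosen for illustration.

```python
def missing_from_raw_html(raw_html, expected_snippets):
    """Return the snippets that do not appear in the unrendered HTML."""
    lowered = raw_html.lower()
    return [snippet for snippet in expected_snippets if snippet.lower() not in lowered]


# Made-up response body: a page title plus an empty container for the app.
raw_html = '<html><head><title>My Feed</title></head><body><div id="root"></div></body></html>'

# Text copied from what the browser shows after the page has fully loaded.
seen_in_browser = ["My Feed", "First post", "Second post"]

missing = missing_from_raw_html(raw_html, seen_in_browser)
print(missing)
```

In practice `raw_html` would be the `.text` of a `requests.get(...)` response. Anything that ends up in the missing list was probably added by JavaScript after the initial load.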

Examples of Dynamic Pages

Let's look at some real world examples to make identifying dynamic pages more concrete.

Twitter

We already used Twitter as an example earlier in the article. It's a quintessential dynamic web app, with infinite scrolling, lazy loading content, and DOM updates.

Some clear signs Twitter is dynamic:

Built with React
Lazy loads images/videos
New network requests when scrolling
DOM updates with new tweets

Here is a recap video highlighting the dynamic behavior:

[Video: Twitter Dynamic Page Loading Demo]

Facebook

Facebook is another prime example of a dynamic web app:

Infinite scrolling feeds
Lazy loading images/videos
Additional network requests when scrolling
DOM updates with new posts

Here is a video showing the dynamic loading on Facebook:

[Video: Facebook Dynamic Loading Demo]

Reddit

Reddit is an extremely dynamic site, being one of the pioneers of infinite scrolling feeds. Some signs Reddit is dynamic:

Built with React
Loads new posts constantly as you scroll
Images lazy load
Tons of new network requests when scrolling

You can see the dynamic loading yourself by opening Reddit in your browser.

Amazon

Even Amazon's product listings have dynamic loading behavior:

Additional products load when scrolling
Network requests fetch the new products
DOM updates with the new HTML

Here is an example – note how the scrollbar jumps back up when new products are loaded:

[Image: Amazon Dynamic Product Loading]

This improves performance by only loading products as they are viewed.

Medium

Blogging platforms like Medium have dynamic loading so articles load faster:

Images lazy load as you scroll
The body text loads in chunks
Additional network requests retrieve more text

You can observe this behavior on any article on Medium.

Pinterest

Pinterest is of course full of infinite scrolling and dynamic loading:

Pins lazy load as you scroll
New network requests fetch more pins
The DOM updates with the new pins

Very similar behavior to other social media feeds.

Why It Matters for Web Scraping

Now that you know how to recognize dynamic web pages, let's discuss why it matters when scraping.

Understanding if a page is static vs dynamic affects the tools and techniques required to build an effective scraper.

Scraping Static Pages

For static web pages, we can simply:

Make a single HTTP request to fetch the full HTML
Parse the HTML content in our programming language of choice
Extract any data we need with a parsing library like Beautiful Soup in Python or Cheerio in Node.js

For example, here is a simple Python scraper using Requests and Beautiful Soup to scrape a static Wikipedia page:

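import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Hippopotamus'

# Identify the client; Wikipedia may reject requests without a descriptive User-Agent.
# The value below is a placeholder, so replace it with your own.
headers = {'User-Agent': 'static-page-demo/0.1 (educational example)'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on HTTP errors

soup = BeautifulSoup(response.text, 'html.parser')

# Both the page title and the summary infobox are present in the raw HTML
title = soup.find('h1').text
infobox = soup.find('table', class_='infobox')

print(title)
print(infobox)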

Because Wikipedia serves the article text, tables, and image references in the initial HTML, we can parse the full page content in a simple script.

No need for a headless browser or simulation of user interactions. The static HTML contains all the data we need.

This simple scraping approach works well for many static sites and pages.

Scraping Dynamic Pages

Scraping dynamic pages requires a different approach, because:

The initial HTML does not contain all the content – more is loaded dynamically
We need to simulate user interactions like scrolling to trigger content loading
JavaScript execution is required to handle the dynamic requests

This means we typically need a browser automation tool like Puppeteer, Playwright, or Selenium, driving a headless browser, to:

Load the full JavaScript + CSS to render the initial page correctly
Scroll, click, hover, etc. to trigger dynamic content loading
Wait for network requests to complete and page to update
Extract updated HTML containing dynamically loaded content

For example, here is how to scrape infinite scrolling Twitter feeds with Python + Playwright:

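from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()  # headless by default
    page = browser.new_page()
    # Note: the home timeline requires a logged-in session
    page.goto('https://twitter.com/home')
    page.wait_for_load_state('networkidle')  # wait for the initial requests to settle

    # Playwright has no scroll_to_bottom(); use the mouse wheel to trigger dynamic loading
    page.mouse.wheel(0, 10000)
    page.wait_for_timeout(2000)  # give the follow-up requests time to finish

    html = page.content()  # get the full html, including the newly loaded content
    # parse html to extract tweets...

    browser.close()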

The key differences from static scraping:

Launching a full Chromium browser
Executing scroll commands to trigger content loading
Waiting for the network to idle and DOM to update
Getting updated HTML after dynamic loading finishes

This browser automation is required to fully render dynamic pages before scraping.
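A single scroll rarely reaches the end of an infinite feed. A common pattern is to keep scrolling until the page height stops growing. The sketch below keeps that loop independent of any particular browser library by taking three callables. The function name, the `FakePage` stand-in, and the `max_rounds` default are my own choices for illustration.

```python
def scroll_until_stable(get_height, scroll_down, wait, max_rounds=20):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls that loaded new content.
    """
    productive_scrolls = 0
    last_height = get_height()
    for _ in range(max_rounds):
        scroll_down()
        wait()
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so assume we hit the end
        productive_scrolls += 1
        last_height = new_height
    return productive_scrolls


class FakePage:
    """Stand-in for a real page: the height grows three times, then stops."""

    def __init__(self):
        self.heights = [1000, 2000, 3000, 4000, 4000]
        self.index = 0

    def height(self):
        return self.heights[self.index]

    def scroll(self):
        self.index = min(self.index + 1, len(self.heights) - 1)


fake = FakePage()
rounds = scroll_until_stable(fake.height, fake.scroll, wait=lambda: None)
print(rounds)
```

With Playwright's sync API, the three callables could be `lambda: page.evaluate("document.body.scrollHeight")`, `lambda: page.mouse.wheel(0, 10000)`, and `lambda: page.wait_for_timeout(1000)`. The cap on rounds matters because some feeds never run out of content.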

Static vs. Dynamic Scraping

Here is a comparison of static versus dynamic page scraping:

	Static Page Scraping	Dynamic Page Scraping
Tools	Requests, Beautiful Soup, Cheerio	Puppeteer, Playwright, Selenium
Methods	Single HTTP request, parse HTML	Browser automation, interaction simulation, wait for network idle
Performance	Very fast, minimal overhead	Slower due to browser load and action delays
JavaScript Needed?	No	Yes, required to execute dynamic actions

Understanding these key differences will help you choose the right approach for each scraping project.

Tools for Scraping Dynamic Pages

Now let's discuss some of the most popular tools for scraping dynamic JavaScript-heavy sites:

Puppeteer

Puppeteer is a Node.js library developed by Google for controlling headless Chrome. It allows us to:

Launch a browser instance
Load pages
Interact by scrolling, clicking, filling forms
Execute JavaScript on the pages
Access the DOM and extract HTML

With over 67k stars on GitHub, Puppeteer is one of the most used and beginner-friendly options.

Playwright

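Playwright is the new kid on the block, developed by Microsoft for controlling Chromium, Firefox, and WebKit. Key features:

Supports Chromium, Firefox, and WebKit (the engine behind Safari)
Often faster and more reliable than Puppeteer in practice, though this depends on the workload
Easy to install, with official bindings for JavaScript/TypeScript, Python, Java, and .NET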
Powerful built-in wait helpers and assertions
19k GitHub stars

We've used Playwright in this guide's code examples because of its great APIs.

Selenium

Selenium has been around for over 15 years, and is the most widely used browser automation suite.

Pros:

Supports all major browsers
Official bindings for Java, Python, C#, Ruby, and JavaScript
Very mature and battle-tested
Large community and ecosystem

Downsides:

API is more cumbersome than Puppeteer or Playwright

Selenium is indispensable for cross-browser testing, but Puppeteer and Playwright are often easier for web scraping.

Custom Browser Scripts

You can also control browsers directly using raw browser APIs and languages like JavaScript. This gives the most flexibility but requires more effort than leveraging libraries like Puppeteer.

Libraries such as chrome-remote-interface and Chromeless have emerged to simplify working directly with the Chrome DevTools Protocol.
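To give a feel for what "working directly" means: Chrome DevTools Protocol commands are JSON messages with an `id`, a `method`, and `params`, sent over a WebSocket to a browser started with remote debugging enabled. The sketch below only builds those messages. The `cdp_command` helper is hypothetical, and actually sending the messages needs a WebSocket client, which is not shown here.

```python
import itertools
import json

# Each command needs a unique id so responses can be matched to requests.
_message_ids = itertools.count(1)


def cdp_command(method, **params):
    """Serialize one Chrome DevTools Protocol command as a JSON string."""
    return json.dumps({"id": next(_message_ids), "method": method, "params": params})


# Page.navigate and Runtime.evaluate are standard protocol methods.
navigate = cdp_command("Page.navigate", url="https://example.com")
evaluate = cdp_command("Runtime.evaluate", expression="document.title")

print(navigate)
print(evaluate)
```

Libraries like Puppeteer generate and track these messages for you, which is most of the effort they save.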

Headless Browser Services

Services like Apify, Diffbot, and ProxyCrawl provide cloud-based headless solutions that handle dynamic scraping for you.

The benefit is reduced DevOps and infrastructure overhead, since they manage the browsers and proxies. The trade-off is less control compared to running your own scripts.

Key Takeaways

Let's recap what we covered:

Dynamic pages update content after the initial load by fetching data from the backend
Static pages serve all content in the first HTML response
Identifying dynamic pages:
- Built with JavaScript frameworks
- Lazy loading content
- New network requests on interaction
- DOM elements updating
- Disabling JavaScript changes the page
Scraping static pages is simple, but dynamic pages require browser automation
Puppeteer, Playwright, Selenium, and the Chrome DevTools Protocol are commonly used for scraping dynamic JavaScript sites
Understanding if a page is static or dynamic determines the best scraping approach

I hope this guide gave you a comprehensive understanding of dynamic web pages and how they affect web scraping. Let me know if you have any other questions!

Happy scraping!