What exactly is a dynamic page?

Hey there! Dynamic web pages can seem confusing at first glance. My name's [John] and I've been working with web scrapers for over 5 years, so I'm here to break down exactly what dynamic pages are all about.

Whether you're new to web scraping or a seasoned pro, understanding what makes a page "dynamic" is key to scraping it effectively. By the end of this guide, you'll know how to identify dynamic pages, why they matter for web scraping, and how to approach scraping them.

Static vs. Dynamic Pages

To understand what a dynamic page is, it helps to first look at the opposite: a static page.

A static web page has content that is pre-defined and does not change after the initial page load. All HTML, CSS, JavaScript, images, and other assets are loaded from the server when the page first renders.

According to Google's Web Fundamentals documentation, static pages have two key qualities:

  • They display the same content for all users.
  • The content doesn't change based on user interaction.

Some examples of static pages:

  • Simple HTML sites without much interactivity
  • Old school informational sites like Wikipedia
  • Classic blogs and websites

A dynamic web page, on the other hand, updates its content after the initial load, without having to completely refresh the page. This is achieved by making additional requests (typically XHR or Fetch requests using JavaScript) to retrieve more data from the backend on-demand.

That new data can then be used to modify the DOM and update what you see on the page.
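To make that concrete, here is a minimal Python sketch of the kind of request a dynamic page fires behind the scenes. The /api/posts endpoint and its parameters are hypothetical; on a real site you would find the actual endpoint in the browser's Network tab:

import requests

# Hypothetical JSON endpoint of the kind a dynamic page calls via XHR/Fetch.
url = 'https://example.com/api/posts'
params = {'page': 2, 'limit': 20}  # typical pagination parameters

response = requests.get(url, params=params)
data = response.json()  # the page's JavaScript would merge this into the DOM

for post in data.get('posts', []):
    print(post.get('title'))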

According to the MDN Web Docs, some signs of a dynamic page are:

  • The page content changes without reloading the page
  • DOM elements are added, removed, or manipulated
  • Additional data is fetched from the backend on demand

Common examples of dynamic pages:

  • Modern interactive web apps like Twitter, Facebook, etc.
  • Sites with infinite scrolling feeds like Instagram, Reddit, Pinterest
  • Ecommerce product listings that lazy load items
  • Pages built with JavaScript frameworks like React, Vue, Angular

So in summary, a dynamic page can update its content and DOM after the initial load by making additional requests to retrieve data from the backend. Pretty much any modern web app will have some level of dynamic loading.

Identifying Dynamic Pages

Now that you understand the difference between static and dynamic pages conceptually, let's go over some clear signs that indicate a page is dynamic:

1. Built with a JavaScript framework

If a site is built using React, Vue, Angular, Svelte, or any other frontend JavaScript framework, it's most likely a dynamic page.

This 2021 survey from State of JS showed the popularity of frontend frameworks:

Framework    Usage
React        88.4%
Vue.js       55.1%
Angular      50.3%

You can identify if a site uses React by installing the React Developer Tools browser extension, which will highlight React components on the page.
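If you'd rather check programmatically, a rough heuristic is to look for framework fingerprints in the served HTML. The marker strings below are common but by no means exhaustive, so treat this as a quick first pass rather than a definitive test:

import requests

# Common framework fingerprints left in served HTML (heuristic, not foolproof).
MARKERS = {
    'React': ['data-reactroot', '__NEXT_DATA__'],
    'Vue.js': ['data-v-', '__NUXT__'],
    'Angular': ['ng-version'],
}

html = requests.get('https://example.com').text

for framework, needles in MARKERS.items():
    if any(needle in html for needle in needles):
        print(f'Possible {framework} app detected')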

2. Lazy loading content

If a page only loads assets like images, videos, or elements like posts and comments when they are scrolled into view, that's a form of dynamic loading.

The content is only retrieved from the backend when needed, as the user scrolls to that part of the page. This improves initial load performance.
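One quick way to spot lazy loading is to count loaded images before and after scrolling. Here is a minimal Playwright sketch (the URL is a placeholder); a noticeable jump in the count suggests images are fetched on demand:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com')

    # Count images with a populated src before and after scrolling.
    before = page.evaluate("document.querySelectorAll('img[src]').length")
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    page.wait_for_timeout(2000)  # crude wait; give lazy content time to arrive
    after = page.evaluate("document.querySelectorAll('img[src]').length")

    print(f'Images with src: {before} -> {after}')
    browser.close()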

3. New network requests on interaction

Open up the Network tab in Chrome Developer Tools, reload the page, and watch the requests. If additional XHR or Fetch requests are made when you scroll, click buttons, or hover over elements, then dynamic content is likely being loaded.

For example, here are the new requests made when scrolling on Twitter:

Twitter network requests demo

Each additional request returns more tweets to be rendered on the page.
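You can also capture these requests programmatically. Here is a small Playwright sketch (placeholder URL) that logs every XHR/Fetch request a page makes while you scroll:

from playwright.sync_api import sync_playwright

def log_dynamic_request(request):
    # XHR and fetch requests are the usual carriers of dynamic content.
    if request.resource_type in ('xhr', 'fetch'):
        print(request.method, request.url)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on('request', log_dynamic_request)
    page.goto('https://example.com')

    # Scroll to trigger any on-demand loading, then give requests time to fire.
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    page.wait_for_timeout(2000)
    browser.close()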

4. Updating DOM elements

Inspect elements on the page before and after interacting with it. If the DOM changes at all (new elements added, element attributes changed, nodes removed), that indicates dynamic modifications.

For example, here is the Twitter DOM updating as I scroll through my feed:

Twitter DOM changes demo

New tweets are injected into the DOM dynamically as they are fetched from the backend.
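To measure this without eyeballing the inspector, you can install a MutationObserver from your scraping script and count how many nodes get added as you scroll. A rough Playwright sketch (placeholder URL):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com')

    # Tally every node added anywhere under <body>.
    page.evaluate("""() => {
        window.__added = 0;
        new MutationObserver(mutations => {
            for (const m of mutations) window.__added += m.addedNodes.length;
        }).observe(document.body, { childList: true, subtree: true });
    }""")

    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    page.wait_for_timeout(2000)

    print('DOM nodes added after scrolling:', page.evaluate('window.__added'))
    browser.close()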

5. Disabling JavaScript

An easy way to test if a page relies on JavaScript for dynamic loading is to temporarily disable JS in your browser (using an extension like Quick JavaScript Switcher) and reload.

If the page content changes significantly with JavaScript off, that's a sign it uses JS to load content dynamically.
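You can automate this check too. The sketch below (placeholder URL) loads the same page twice with Playwright, once with JavaScript enabled and once without, and compares how much visible text each version renders:

from playwright.sync_api import sync_playwright

def visible_text_length(js_enabled):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(java_script_enabled=js_enabled)
        page = context.new_page()
        page.goto('https://example.com')
        length = len(page.inner_text('body'))
        browser.close()
        return length

# A large gap between the two suggests the page relies on JS to render content.
print('With JS:   ', visible_text_length(True))
print('Without JS:', visible_text_length(False))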

Examples of Dynamic Pages

Let's look at some real-world examples to make identifying dynamic pages more concrete.

Twitter

We already used Twitter as an example earlier in the article. It's a quintessential dynamic web app, with infinite scrolling, lazy loading content, and DOM updates.

Some clear signs Twitter is dynamic:

  • Built with React
  • Lazy loads images/videos
  • New network requests when scrolling
  • DOM updates with new tweets

Here is a recap video highlighting the dynamic behavior:

Twitter Dynamic Page Loading Demo

Facebook

Facebook is another prime example of a dynamic web app:

  • Infinite scrolling feeds
  • Lazy loading images/videos
  • Additional network requests when scrolling
  • DOM updates with new posts

Here is a video showing the dynamic loading on Facebook:

Facebook Dynamic Loading Demo

Reddit

Reddit is an extremely dynamic site and was one of the pioneers of the infinite scrolling feed. Some signs Reddit is dynamic:

  • Built with React
  • Loads new posts constantly as you scroll
  • Images lazy load
  • Tons of new network requests when scrolling

You can see the dynamic loading yourself by opening Reddit in your browser.

Amazon

Even Amazon's product listings have dynamic loading behavior:

  • Additional products load when scrolling
  • Network requests fetch the new products
  • DOM updates with the new HTML

Here is an example – note how the scrollbar jumps back up when new products are loaded:

Amazon Dynamic Product Loading

This improves performance by only loading products as they are viewed.

Medium

Blogging platforms like Medium have dynamic loading so articles load faster:

  • Images lazy load as you scroll
  • The body text loads in chunks
  • Additional network requests retrieve more text

You can observe this behavior on any article on Medium.

Pinterest

Pinterest is, of course, built around infinite scrolling and dynamic loading:

  • Pins lazy load as you scroll
  • New network requests fetch more pins
  • The DOM updates with the new pins

Very similar behavior to other social media feeds.

Why It Matters for Web Scraping

Now that you know how to recognize dynamic web pages, let's discuss why it matters when scraping.

Whether a page is static or dynamic determines the tools and techniques required to build an effective scraper.

Scraping Static Pages

For static web pages, we can simply:

  1. Make a single HTTP request to fetch the full HTML
  2. Parse the HTML content in our programming language of choice
  3. Extract any data we need with a parsing library like Beautiful Soup in Python or Cheerio in Node.js

For example, here is a simple Python scraper using Requests and Beautiful Soup to scrape a static Wikipedia page:

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Hippopotamus'

# One request retrieves the complete page; no browser needed.
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

title = soup.find('h1').text  # the article heading
infobox = soup.find('table', class_='infobox')  # the species infobox table

print(title)
print(infobox)

Because Wikipedia serves all content including text, images, tables, etc. in the initial HTML, we can parse the full page content in a simple script.

No need for a headless browser or simulation of user interactions. The static HTML contains all the data we need.

This simple scraping approach works well for many static sites and pages.

Scraping Dynamic Pages

Scraping dynamic pages requires a different approach, because:

  • The initial HTML does not contain all the content – more is loaded dynamically
  • We need to simulate user interactions like scrolling to trigger content loading
  • JavaScript execution is required to handle the dynamic requests

This means we must use a headless browser like Puppeteer, Playwright, or Selenium to:

  1. Load the full JavaScript + CSS to render the initial page correctly
  2. Scroll, click, hover, etc. to trigger dynamic content loading
  3. Wait for network requests to complete and page to update
  4. Extract updated HTML containing dynamically loaded content

For example, here is how to scrape infinite scrolling Twitter feeds with Python + Playwright:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://twitter.com/home')

    # Scroll to the bottom to trigger dynamic loading
    # (Playwright has no scroll_to_bottom helper, so run JavaScript directly)
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    page.wait_for_load_state('networkidle')

    html = page.content()  # get the full HTML after updates
    # parse html to extract tweets...

    browser.close()

The key differences from static scraping:

  • Launching a full Chrome browser
  • Executing scroll commands to trigger content loading
  • Waiting for the network to idle and DOM to update
  • Getting updated HTML after dynamic loading finishes

This browser automation is required to fully render dynamic pages before scraping.
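In practice, a single scroll is rarely enough for an infinite feed. A common pattern, sketched below with a placeholder URL, is to keep scrolling until the page height stops growing:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com/feed')  # hypothetical infinite-scroll page

    previous_height = 0
    while True:
        page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        page.wait_for_timeout(1500)  # in real code, wait for a specific selector
        height = page.evaluate('document.body.scrollHeight')
        if height == previous_height:  # nothing new arrived; we hit the end
            break
        previous_height = height

    html = page.content()  # includes everything loaded while scrolling
    browser.close()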

Static vs. Dynamic Scraping

Here is a comparison of static versus dynamic page scraping:

                     Static Page Scraping                Dynamic Page Scraping
Tools                Requests, Beautiful Soup, Cheerio   Puppeteer, Playwright, Selenium
Methods              Single HTTP request, parse HTML     Browser automation, interaction simulation, wait for network idle
Performance          Very fast, minimal overhead         Slower due to browser load and action delays
JavaScript needed?   No                                  Yes, required to execute dynamic actions

Understanding these key differences will help you choose the right approach for each scraping project.

Tools for Scraping Dynamic Pages

Now let's discuss some of the most popular tools for scraping dynamic JavaScript-heavy sites:

Puppeteer

Puppeteer is a Node.js library developed by Google for controlling headless Chrome. It allows us to:

  • Launch a browser instance
  • Load pages
  • Interact by scrolling, clicking, filling forms
  • Execute JavaScript on the pages
  • Access the DOM and extract HTML

With over 67k stars on GitHub, Puppeteer is one of the most used and beginner-friendly options.

Playwright

Playwright is the newer kid on the block: a library from Microsoft for controlling Chromium, Firefox, and WebKit. Key features:

  • Supports Chrome, Firefox, and Safari
  • Often faster and more reliable than Puppeteer
  • Easy to install and use across languages
  • Powerful built-in wait helpers and assertions
  • 19k GitHub stars

We've used Playwright in this guide's code examples because of its great APIs.
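Those wait helpers are a big part of the appeal. Instead of fixed sleeps, you can block until a specific element exists or in-flight requests settle (the selector and URL below are placeholders):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com')

    # Block until the element appears and the network goes quiet.
    page.wait_for_selector('article')
    page.wait_for_load_state('networkidle')

    print(page.locator('article').first.inner_text())
    browser.close()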

Selenium

Selenium has been around for over 15 years, and is the most widely used browser automation suite.

Pros:

  • Supports all major browsers
  • Bindings for many languages (Java, Python, C#, Ruby, JavaScript)
  • Very mature and battle-tested
  • Large community and ecosystem

Downsides:

  • API is more cumbersome than Puppeteer or Playwright
  • Generally slower, with more setup overhead

Selenium is indispensable for cross-browser testing, but Puppeteer and Playwright are often easier for web scraping.
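For comparison, here is a minimal Selenium sketch in Python mirroring the Playwright example from earlier (placeholder URL; assumes a local Chrome install):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless=new')  # run Chrome without a visible window

driver = webdriver.Chrome(options=options)
driver.get('https://example.com')

# Selenium has no built-in scroll helper; run JavaScript directly.
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

html = driver.page_source  # HTML after any dynamic updates so far
driver.quit()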

Custom Browser Scripts

You can also control browsers directly using raw browser APIs and languages like JavaScript. This gives the most flexibility but requires more effort than leveraging libraries like Puppeteer.

Projects like Puppeteer itself and Chromeless emerged to simplify working directly with the Chrome DevTools Protocol.

Headless Browser Services

Services like Apify, Diffbot, and ProxyCrawl provide cloud-based headless solutions that handle dynamic scraping for you.

The benefit is reduced DevOps and infrastructure overhead, since they manage the browsers and proxies for you. The trade-off is less control compared to running your own scripts.

Key Takeaways

Let's recap what we covered:

  • Dynamic pages update content after the initial load by fetching data from the backend

  • Static pages serve all content in the first HTML response

  • Identifying dynamic pages:

    • Built with JavaScript frameworks
    • Lazy loading content
    • New network requests on interaction
    • DOM elements updating
    • Disabling JavaScript changes the page
  • Scraping static pages is simple, but dynamic pages require browser automation

  • Puppeteer, Playwright, Selenium, and Chrome DevTools are commonly used for scraping dynamic JavaScript sites

  • Understanding if a page is static or dynamic determines the best scraping approach

I hope this guide gave you a comprehensive understanding of dynamic web pages and how they affect web scraping. Let me know if you have any other questions!

Happy scraping!
