Screen scraping, also known as screen grabbing or UI scraping, is a powerful technique for extracting visual data from websites and applications. Rather than parsing underlying HTML or accessing structured data via an API, screen scraping works by capturing the visible pixels on the screen, just like taking a screenshot.
This approach opens up new possibilities compared to traditional web scraping. With screen scraping, you can:
- Visually detect changes and updates to web pages
- Extract text and tables from legacy systems that don't have an API
- Automate interactions with websites and software UIs
- Monitor graphical dashboards and metrics
- Verify that content appears correctly to end users
Though screen scraping has many applications, it can be difficult to implement robustly on your own. Capturing high-fidelity screenshots requires running browsers in a realistic environment. You also have to handle things like varying screen resolutions, graphical glitches, anti-bot measures, and unreliable website availability.
Fortunately, there are tools that handle the hard parts of screen scraping for you. In this tutorial, you'll learn how to do screen scraping the easy way using ScrapingBee, a web scraping API that takes care of browsers, proxies, retries, and CAPTCHAs behind the scenes. With just a few lines of code, you'll be able to capture screenshots of entire web pages, specific page sections, and more.
Let's get started!
Creating a ScrapingBee Account
To begin, head over to ScrapingBee and sign up for a free account. You can register using your email address, Google account, or GitHub.
Once you've signed up, check your inbox and click the confirmation link in the welcome email from ScrapingBee. This will take you to your dashboard, where you can view your API key and current usage.
ScrapingBee gives you 1,000 free API credits to start, which is more than enough for experimenting with screen scraping. Keep the dashboard open so you can copy your API key, which you'll need to authenticate your requests later.
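Since the API key is a secret, one option is to keep it out of your scripts and read it from an environment variable instead. The snippet below is a minimal sketch of that idea using only the standard library. The variable name SCRAPINGBEE_API_KEY is an assumption made for this example, not something ScrapingBee requires.

```python
import os

# Hypothetical variable name chosen for this example; use whatever fits your project.
API_KEY_ENV_VAR = "SCRAPINGBEE_API_KEY"


def load_api_key(environ=os.environ):
    """Return the API key stored in the environment, or raise a clear error."""
    key = environ.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first")
    return key
```

Later in the tutorial, wherever a key is passed to ScrapingBeeClient, you could pass load_api_key() in place of the hard-coded string.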
Setting Up the ScrapingBee SDK
ScrapingBee provides SDKs for many popular programming languages, including Python, Node.js, Ruby, and more. For this tutorial we'll use the Python SDK, but the same functionality is available in all of them.
First, make sure you have Python 3.6 or higher installed. You can check your Python version by running:
python --version
Next, install the scrapingbee package using pip:
pip install scrapingbee
Now open up a Python shell or create a new .py file and import the package to verify that the installation worked:
from scrapingbee import ScrapingBeeClient
If you don't see any errors, you're ready to move on to taking some screenshots!
Capturing Page Screenshots
ScrapingBee makes programmatic screen scraping a breeze by offloading the browser automation work to its API. To take a screenshot, all you need to do is send an HTTP request with the URL of the page you want to capture.
Here's a minimal example that takes a screenshot of the Wikipedia article on data scraping:
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

response = client.get(
    'https://en.wikipedia.org/wiki/Data_scraping',
    params={
        'screenshot': True,  # Enable screenshots
    }
)

# Write the binary image data from the response body to disk
with open('screenshot.png', 'wb') as f:
    f.write(response.content)
Make sure to replace 'YOUR_API_KEY' with your actual API key from the ScrapingBee dashboard.
This code does a few simple things:
- Creates a ScrapingBeeClient instance and authenticates with your API key
- Sends an HTTP GET request to the Wikipedia article URL, with the screenshot parameter set to True
- Saves the binary screenshot data from the response to a file named screenshot.png
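The example writes whatever comes back straight to disk, so a failed request would leave you with a screenshot.png that isn't actually an image. Here is a small sketch of a more defensive save step. It assumes the object returned by client.get behaves like a requests Response (with ok, status_code, and content attributes), and save_screenshot is a helper name made up for this example.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def save_screenshot(response, path):
    """Write response.content to path only if it looks like a successful PNG."""
    if not response.ok:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    if not response.content.startswith(PNG_SIGNATURE):
        raise ValueError("Response body does not look like a PNG image")
    with open(path, "wb") as f:
        f.write(response.content)
    return path


# Usage, with the response from the example above:
# save_screenshot(response, "screenshot.png")
```

Checking the file signature is a cheap way to catch error pages or JSON error bodies before they end up in your image folder.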
The resulting screenshot.png shows the article as it appears in a browser window. By default, ScrapingBee captures the visible portion of the web page as rendered in a browser viewport of 1920×1080 pixels, so content further down the page is left out.
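If you need a different viewport, for example to see how a page looks on a smaller screen, the request parameters can carry the window size. The sketch below builds such a params dict. The parameter names window_width and window_height are assumptions here, so confirm them against the ScrapingBee documentation before relying on them.

```python
def screenshot_params(width=1920, height=1080):
    """Build a params dict for a viewport screenshot at the given size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {
        "screenshot": True,
        "window_width": width,    # assumed parameter name, check the docs
        "window_height": height,  # assumed parameter name, check the docs
    }


# Usage, with the client created earlier:
# response = client.get(url, params=screenshot_params(1280, 720))
```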
Screenshotting Specific Page Sections
In many cases, you only care about screenshotting part of a page rather than the whole thing. ScrapingBee lets you target specific page sections using CSS selectors, which is a huge time saver.
For example, let's say you want to take a screenshot of just the main content area of the Wikipedia article, without the navigation header, sidebars, etc. Using your browser's developer tools, you can inspect the page and find the CSS selector for the content div.
In this case, the selector is '#bodyContent'. You can pass this to ScrapingBee using the screenshot_selector parameter:
response = client.get(
    'https://en.wikipedia.org/wiki/Data_scraping',
    params={
        'screenshot': True,
        'screenshot_selector': '#bodyContent',
    }
)
Now the screenshot will be tightly cropped to just the main article text.
This technique is super handy for focusing on the parts of the page that are relevant to your project. It also keeps the size of the screenshots manageable if you're capturing a large number of them.
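When you want several sections of the same page, it can help to generate the request parameters and output filenames in one place. The following is a rough sketch of that. selector_jobs is a helper invented for this example, and the filename scheme is just one possible choice.

```python
import re


def selector_jobs(selectors):
    """Pair each CSS selector with request params and an output filename."""
    jobs = []
    for selector in selectors:
        # Turn "#bodyContent" into "bodyContent"; fall back to "element" if nothing is left.
        stem = re.sub(r"[^A-Za-z0-9]+", "_", selector).strip("_") or "element"
        params = {"screenshot": True, "screenshot_selector": selector}
        jobs.append((params, f"{stem}.png"))
    return jobs


# Usage, with the client created earlier:
# for params, filename in selector_jobs(["#bodyContent"]):
#     response = client.get("https://en.wikipedia.org/wiki/Data_scraping", params=params)
#     with open(filename, "wb") as f:
#         f.write(response.content)
```

Each selector still costs one API request, so the loop makes as many calls as there are selectors.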
Screenshotting the Full Scrolling Page
The default screenshot only grabs the currently visible area of the page. But often, the most interesting content is "below the fold" and requires scrolling to see.
ScrapingBee makes it easy to extend the screenshot to the full scrollable height of the page. Just add the screenshot_full_page option:
response = client.get(
    'https://en.wikipedia.org/wiki/Data_scraping',
    params={
        'screenshot': True,
        'screenshot_full_page': True,
    }
)
The resulting screenshot will include the entire article from top to bottom.
Note that full page screenshots can get quite tall for long pages! In this case, the dimensions are 1920×11818 pixels. Make sure you have enough storage space if you're capturing many full-length screenshots.
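If you want to check the dimensions of a screenshot yourself without installing an imaging library, you can read them straight from the PNG header. This sketch relies on the PNG file layout, where the width and height are stored in the IHDR chunk right after the 8-byte signature. png_dimensions is a helper name made up for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Bytes 16-23 hold width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Usage, with a response from one of the examples above:
# width, height = png_dimensions(response.content)
```

Logging these values alongside the file size makes it easier to spot unusually tall captures before they fill up your disk.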
Benefits of Screen Scraping with ScrapingBee
As you can see, ScrapingBee's screenshot capabilities are robust and flexible. But there are many other good reasons to use ScrapingBee for screen scraping projects:
Built-in Browser Rendering
ScrapingBee takes care of spinning up headless browsers to render JavaScript-heavy pages and Single Page Apps. You don't have to install or configure tools like Puppeteer or Selenium.
Handles Anti-Bot Measures
Many sites try to block web scrapers using CAPTCHAs, bot detection scripts, IP rate limiting, and other measures. ScrapingBee is designed to work around these roadblocks for you, so your screen scrapes keep working.
Automatic Retries and Proxying
Unreliable websites, network issues, and anti-bot measures can cause requests to fail unpredictably. ScrapingBee automatically retries failed requests and rotates IP addresses to maintain high availability.
Pay Only for Successful Screenshots
ScrapingBee only charges you credits for successful 2XX responses, not for failed requests or non-2XX status codes. This keeps costs down as you're developing and debugging your screen scraping pipelines.
Supports Programmatic Interactions
Beyond screenshots, ScrapingBee lets you interact with pages by clicking, typing, scrolling, and waiting for elements to appear. You can automate multi-step workflows involving forms, searches, pagination, and more.
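To give a feel for what such an interaction could look like, here is a hedged sketch that describes a search workflow as data and attaches it to a screenshot request. The js_scenario parameter name and the fill, click, and wait_for instruction keys are assumptions to verify against the ScrapingBee documentation, and every CSS selector below is hypothetical.

```python
import json


def search_scenario(query):
    """Build a scenario that fills a search box, submits it, and waits for results."""
    return {
        "instructions": [
            {"fill": ["#search-input", query]},  # hypothetical selector
            {"click": "#search-button"},         # hypothetical selector
            {"wait_for": "#results"},            # hypothetical selector
        ]
    }


params = {
    "screenshot": True,
    "js_scenario": search_scenario("data scraping"),  # assumed parameter name
}

# Print the scenario as JSON to inspect what would be sent.
print(json.dumps(params["js_scenario"], indent=2))
```

Depending on the SDK version, the scenario may need to be passed as a JSON string rather than a dict, so check how your client serializes it.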
Next Steps
In this tutorial, you learned the basics of capturing screenshots with ScrapingBee. But there's a lot more you can do by combining screenshots with ScrapingBee's other features:
- Run screenshots on a schedule to track visual changes to pages over time
- Crop and manipulate screenshot images using the screenshot_options parameter
- Link screenshots to structured data extraction jobs for more flexible scraping
- Sign up for a paid plan to get more monthly credits and concurrent requests
- Explore the ScrapingBee docs to learn about cookies, AJAX rendering, geotargeting, and more
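As a starting point for the first idea in the list, tracking visual changes over time, you could compare a fingerprint of each new screenshot with the previous one. This is a minimal sketch using a SHA-256 hash from the standard library. fingerprint and has_changed are names invented for this example.

```python
import hashlib


def fingerprint(image_bytes):
    """Return a hex digest that identifies these exact screenshot bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def has_changed(previous_bytes, current_bytes):
    """Report whether two screenshots differ at the byte level."""
    return fingerprint(previous_bytes) != fingerprint(current_bytes)


# Usage, comparing yesterday's file with a fresh response:
# with open("screenshot.png", "rb") as f:
#     if has_changed(f.read(), response.content):
#         print("The page looks different from the last capture")
```

Keep in mind that a hash comparison is exact: a rotating ad, a timestamp, or even a different image encoding will count as a change. For fuzzier comparisons you would need a pixel-level or perceptual diff instead.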
If you're building a serious screen scraping pipeline, ScrapingBee is the way to go. It provides all the benefits of browser automation without the hassles of maintaining your own scraping infrastructure.
To dive deeper, check out the ScrapingBee documentation and try out your first 1,000 free API credits. Happy scraping!