How to block image loading in Selenium? | ScrapingBee

How to Block Image Loading in Selenium for Faster Web Scraping

When scraping websites with Selenium, one of the most effective ways to reduce page load times is to block images from loading. Images typically account for around half of the average web page's total payload, so skipping them can cut data transfer and rendering time significantly.

Consider these statistics on the impact of images on web page speed and size:

  • Images comprise 54% of the average web page's total size (source: HTTP Archive)
  • Pages with a large number of images or high-resolution graphics can easily exceed 5MB in size
  • The average page makes over 50 HTTP requests for images alone (source: KeyCDN)
  • Blocking images can reduce page load times by up to 65% (source: Shopzilla case study)

By eliminating all those image requests and data transfers, your Selenium scraper can load and process pages much faster, saving valuable time and bandwidth.
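To put the figures above in perspective, here is a rough back-of-the-envelope sketch. The page size and crawl size are hypothetical numbers chosen for illustration, and the 54% image share is simply the HTTP Archive figure quoted above; real savings depend on the sites you scrape.

```python
def estimate_image_savings(page_bytes: int, image_share: float, pages: int) -> int:
    """Return the approximate number of bytes not downloaded when images are blocked."""
    if not 0.0 <= image_share <= 1.0:
        raise ValueError("image_share must be between 0 and 1")
    return round(page_bytes * image_share * pages)


# Hypothetical crawl: 10,000 pages of 2 MB each, with images at 54% of page weight
saved = estimate_image_savings(page_bytes=2_000_000, image_share=0.54, pages=10_000)
print(f"Approximate transfer avoided: {saved / 1e9:.1f} GB")
```

Under these assumptions the crawl avoids roughly 10.8 GB of image traffic, which is why the setting matters most for large or bandwidth-constrained jobs.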

The good news is disabling images in Selenium is a relatively straightforward process. You just need to configure the underlying browser driver to not download and render images when loading web pages. The exact steps vary slightly depending on which browser you are automating with Selenium.

Blocking Images in Chrome with Selenium

Google Chrome is one of the most popular browsers for web scraping due to its speed, customizability, and extensive developer tools. To block images when scraping with Chrome and Selenium:

  1. Create a new instance of the ChromeOptions class to specify custom browser settings
  2. Add a "prefs" experimental option that sets the "profile.managed_default_content_settings.images" preference to 2, which disables all images
  3. Pass the configured ChromeOptions object when initializing the Chrome WebDriver

Here's the code to launch Chrome with images disabled:

from selenium import webdriver

# Configure Chrome to block images
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)

# Create a new Chrome WebDriver instance with the image-blocking options 
driver = webdriver.Chrome(options=chrome_options)

Now when you use the driver to navigate to a URL, Chrome will block all images from loading:

driver.get("https://example.com")  # Load page without images

Blocking Images in Firefox with Selenium

Mozilla Firefox is another versatile browser choice for Selenium automation. The process for disabling images in Firefox is similar to Chrome, with a few differences in the options configuration syntax:

  1. Create an instance of the FirefoxOptions class
  2. Use the set_preference() method to set the "permissions.default.image" preference to 2, which blocks all images
  3. Pass the configured FirefoxOptions when creating the Firefox WebDriver

from selenium import webdriver

# Set Firefox to block images
firefox_options = webdriver.FirefoxOptions()
firefox_options.set_preference("permissions.default.image", 2)

# Launch Firefox with the custom image-blocking options
driver = webdriver.Firefox(options=firefox_options)

Then use the driver's get() method as usual to load pages without images:

driver.get("https://example.com")
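If your scraper supports both browsers, the two configurations above can be wrapped in one small helper. This is only a sketch: the function name apply_image_blocking and the constants are made up for illustration, and the preference names and values are the ones used in the Chrome and Firefox examples above.

```python
# Preference names and values taken from the Chrome and Firefox examples above
CHROME_IMAGE_PREF = "profile.managed_default_content_settings.images"
FIREFOX_IMAGE_PREF = "permissions.default.image"
BLOCK = 2  # in both browsers, 2 means "block all images"


def apply_image_blocking(options, browser: str):
    """Apply the image-blocking preference to an options object and return it.

    `options` is expected to be a ChromeOptions or FirefoxOptions instance.
    """
    name = browser.lower()
    if name == "chrome":
        # Note: this replaces any "prefs" dictionary set earlier on the options
        options.add_experimental_option("prefs", {CHROME_IMAGE_PREF: BLOCK})
    elif name == "firefox":
        options.set_preference(FIREFOX_IMAGE_PREF, BLOCK)
    else:
        raise ValueError(f"no image-blocking preference known for {browser!r}")
    return options
```

You would pass the returned object to the driver as before, for example webdriver.Firefox(options=apply_image_blocking(webdriver.FirefoxOptions(), "firefox")).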

Blocking Images in Safari with Selenium

For Mac users, Safari is another option for Selenium web automation. Unlike Chrome and Firefox, however, Safari does not appear to expose an image-loading preference through Selenium. In the Python bindings, the Safari options class has no set_preference() method, and safaridriver does not document a capability for disabling images, so the preference-based approach used for the other browsers does not carry over:

  1. Create a Safari options instance
  2. Include the options when launching the Safari WebDriver
  3. Expect images to load as normal, since no blocking preference is applied

from selenium import webdriver
from selenium.webdriver.safari.options import Options as SafariOptions

# Safari exposes no documented image-blocking preference through Selenium,
# so this session starts with default settings and images will still load
safari_options = SafariOptions()
driver = webdriver.Safari(options=safari_options)

If blocking images matters for your scraper, prefer Chrome or Firefox, where the preferences shown above are supported.

Risks and Downsides of Blocking Images

While disabling images can provide a substantial speed boost, there are certain tradeoffs and potential issues to be aware of:

  • Some pages may render incorrectly or break without images, particularly if key UI elements like buttons, navigation, or headings are implemented as graphics
  • JavaScript that relies on image loading events may fail, causing dependent functionality to break
  • Essential images that contain content you need to scrape will obviously not be available

If you encounter a page that does not function properly with images disabled, you can mitigate the issue by:

  • Increasing the page load timeout to allow more time for the page to render and stabilize before interacting with it. Use WebDriver's set_page_load_timeout() method:

driver.set_page_load_timeout(30)  # Wait up to 30 seconds for pages to load

  • Explicitly waiting for elements to be present and visible before interacting with them, using Selenium's WebDriverWait and expected_conditions:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the element to be in the DOM and visible
wait = WebDriverWait(driver, 10)
element = wait.until(EC.visibility_of_element_located((By.ID, "my-element")))

This ensures the element you need is present and visible before your script proceeds, even if the rest of the page is still settling.
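To make the idea behind explicit waits concrete, here is a simplified sketch of the polling loop such a wait performs. It is an illustration only, not Selenium's actual implementation: the function name wait_until and its parameters are assumptions, and WebDriverWait additionally handles details such as ignoring selected exceptions.

```python
import time


def wait_until(condition, timeout: float = 10.0, poll_interval: float = 0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition produced
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)  # pause before checking again
```

The key point is that the script proceeds as soon as the condition holds, rather than sleeping for a fixed worst-case duration.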

Image Blocking vs. Headless Mode

Another popular technique for speeding up Selenium is running browsers in headless mode. Headless mode works by launching an invisible browser instance without any UI rendering. This typically makes pages load faster, but with some key differences from blocking images:

Blocking Images | Headless Mode
Requests for images are never sent, reducing network traffic | All resources, including images, are still downloaded
Preference-based blocking is fixed at launch; in Chrome, URL-based blocking through the DevTools Protocol can be changed per page | Mode cannot be changed once the browser is launched
UI still renders, but may be missing visual elements | No visible window is drawn, which can reduce CPU/memory usage
In Chrome, images can be blocked selectively by URL pattern (for example, by file extension) | All page resources are loaded, with no filtering options

So which one should you use? It depends on your needs and scraping environment. If you're dealing with image-heavy pages and limited bandwidth, blocking images may be more effective. If you're running many concurrent scraper threads, headless mode's lower resource usage can help.

You can also combine both approaches for the best performance. Launch browsers in headless mode and block images for the ultimate speed and efficiency.

To use headless mode, add the --headless argument to your WebDriver options:

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
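Combining both approaches might look like the sketch below. The helper names build_chrome_config and launch_chrome are hypothetical; the argument and preference are the ones shown earlier in this article. Whether preferences are honored in headless mode can depend on the Chrome version, so treat this as a starting point and verify the result on your own setup.

```python
def build_chrome_config(headless: bool = True, block_images: bool = True):
    """Return (arguments, prefs) describing the desired Chrome session."""
    arguments = []
    prefs = {}
    if headless:
        arguments.append("--headless")
    if block_images:
        prefs["profile.managed_default_content_settings.images"] = 2
    return arguments, prefs


def launch_chrome(headless: bool = True, block_images: bool = True):
    """Start Chrome with the chosen settings (requires Selenium and Chrome)."""
    from selenium import webdriver  # imported lazily; only needed at launch

    arguments, prefs = build_chrome_config(headless, block_images)
    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    if prefs:
        options.add_experimental_option("prefs", prefs)
    return webdriver.Chrome(options=options)
```

Keeping the configuration in a plain function makes it easy to log or unit-test the settings without starting a browser.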

Advanced Image Blocking Techniques

For greater control over which images and graphics are loaded, you can fine-tune your image blocking:

  • In Firefox, block only third-party images from external domains by setting the "permissions.default.image" preference to 3
  • In Chrome, block images by URL pattern (for example, by file extension or CDN host) using the DevTools Protocol command Network.setBlockedURLs
  • Filtering by file size or allowing only specific domains has no documented Chrome preference; this generally requires intercepting requests, for example through a proxy

Here's how you might block only common image file types in Chrome. The patterns are examples, and the command's parameters may change between Chrome versions:

from selenium import webdriver

driver = webdriver.Chrome()
# Block requests whose URLs match common image extensions (Chromium-based drivers only)
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd("Network.setBlockedURLs", {"urls": ["*.jpg", "*.jpeg", "*.png", "*.gif", "*.webp"]})

Verifying Images Are Blocked

To confirm your image blocking settings are working as intended, use your browser's Developer Tools to inspect the network activity when loading a page with Selenium:

  1. Open the Network panel and filter by the Img tab to view only image requests
  2. Load the page using Selenium with image blocking enabled
  3. Check that image requests are absent or shown as blocked, with no bytes transferred

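You can also check from Selenium itself. Note that blocking images does not strip the src attribute from img tags: the URL stays in the HTML, but the image is never fetched. A more reliable signal is the element's naturalWidth property, which is 0 for an image that did not load:

images = driver.find_elements(By.TAG_NAME, "img")
for image in images:
    # naturalWidth is 0 when the browser did not load the image
    print(image.get_attribute("src"), image.get_property("naturalWidth"))

If the images report a naturalWidth of 0, your blocking is working.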
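For an automated check, you can count how many images on the page actually loaded. The sketch below relies on the browser's naturalWidth property, which is 0 for an image that has not loaded; the function name count_unloaded_images is hypothetical, and driver is assumed to be an active WebDriver session.

```python
def count_unloaded_images(driver):
    """Return (unloaded, total) for the <img> elements on the current page."""
    widths = driver.execute_script(
        "return Array.from(document.images).map(img => img.naturalWidth);"
    )
    # An image that was never fetched reports a natural width of 0
    unloaded = sum(1 for width in widths if width == 0)
    return unloaded, len(widths)
```

With blocking enabled you would expect the two numbers to match, or nearly so if the page embeds some images inline as data URIs.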

Remember that these preferences are applied when the browser launches, so they cannot be switched off for a single page or element in a running session. If you need images again, start a new WebDriver session without the image-blocking preference:

driver.quit()  # Close the image-blocking session
chrome_options = webdriver.ChromeOptions()  # No image prefs set, so defaults apply
driver = webdriver.Chrome(options=chrome_options)

Other Web Scraping Optimization Tips

While blocking images is one of the most impactful ways to speed up your Selenium scrapers, there are many other techniques to improve performance and reliability:

  • Use explicit waits instead of hardcoded pauses to avoid long delays and ensure elements are ready to be interacted with
  • Disable browser extensions, plugins, and animations which can bog down automation and cause unexpected behavior
  • Choose the fastest and most stable WebDriver for your browser and operating system (e.g. ChromeDriver for Chrome on Windows)
  • Use efficient element locator strategies like CSS selectors and IDs over slower XPath queries
  • Implement parallel scraping by launching multiple browser instances to scrape pages concurrently
  • Optimize your scraping code to transfer only the essential data you need between the browser and your script
  • Set a page load timeout to avoid hanging when encountering unresponsive or slow pages
  • Monitor and log metrics like page load times, resource usage, and error rates to identify performance bottlenecks
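As a starting point for the last tip, here is a minimal sketch for measuring page load times so you can compare runs with and without image blocking. The helper names timed_get and average_load_time are hypothetical, and driver is assumed to be any active WebDriver session.

```python
import time


def timed_get(driver, url: str) -> float:
    """Load `url` with the given driver and return the elapsed time in seconds."""
    start = time.perf_counter()
    driver.get(url)
    return time.perf_counter() - start


def average_load_time(driver, urls) -> float:
    """Return the mean load time across `urls` (raises ValueError if empty)."""
    timings = [timed_get(driver, url) for url in urls]
    if not timings:
        raise ValueError("urls must not be empty")
    return sum(timings) / len(timings)
```

Running the same URL list through two drivers, one with images blocked and one without, gives a direct measurement of the speedup on your target sites.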

By combining image blocking with headless mode and these other web scraping best practices, you can dramatically reduce the run time and resource consumption of your Selenium spiders. Faster scrapers allow you to extract data more efficiently at greater scale.

So try implementing image blocking in your scrapers today and see just how much it speeds things up! With a little configuration and experimentation, you'll be amazed at the performance gains possible.
