
Introduction

If you're doing any kind of web scraping or automated interaction with websites, chances are you'll need to use proxies at some point. A proxy acts as an intermediary between your computer and the internet, making requests on your behalf. There are several key benefits to routing your requests through a proxy server:

  • Anonymity – The target website will see the proxy's IP address instead of yours, helping keep your identity private.
  • Security – Proxies provide an additional layer of protection between your machine and the internet.
  • Bypassing restrictions – If a website has blocked your IP address, you can use proxies from a different location to regain access. Proxies are also useful for circumventing regional content blocks and censorship.

In this guide, we'll walk through how to use proxies with the popular Python requests library, including how to rotate through multiple proxy IP addresses to avoid detection and bans while scraping. Let's get started!


Prerequisites

Before we dive in, make sure you have the following:

  • Python 3 installed on your local machine
  • The requests library installed

You can check if requests is already installed by opening a terminal and running:

pip freeze

Look through the list of packages to see if requests is there. If not, you can install it by running:

pip install requests

Using a Proxy with Python Requests

Now let's see how to actually use a proxy when making HTTP requests with Python. First, make sure to import the requests library at the top of your script:

import requests

Next, we need to define the proxy servers we want to route our requests through. Create a proxies dictionary that maps protocols to proxy URLs like this:

proxies = {
    'http': 'http://user:pass@proxy1.example.com:8080',
    'https': 'http://user:pass@proxy2.example.com:8080'
}

Here we're specifying different proxy servers for HTTP and HTTPS connections, along with the credentials needed to authenticate. The URL format is:

protocol://user:password@host:port

If your proxy doesn't require authentication, you can omit the user:pass portion.
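
For instance, a proxies dictionary for an unauthenticated proxy might look like this (the address below is just a placeholder to swap for your own):

# Hypothetical unauthenticated proxy - no user:pass portion needed
proxies = {
    'http': 'http://203.0.113.5:3128',
    'https': 'http://203.0.113.5:3128'
}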

To use these proxies, simply pass the proxies argument when making a request:

response = requests.get('http://example.com', proxies=proxies)

This uses the proxies defined in the proxies dict based on the protocol of the target URL. All the standard request methods are supported:


response = requests.get(url, proxies=proxies)  
response = requests.post(url, data=payload, proxies=proxies)
response = requests.put(url, data=payload, proxies=proxies) 
response = requests.patch(url, data=payload, proxies=proxies)
response = requests.delete(url, proxies=proxies)

If you find yourself making many requests using the same proxies, you can avoid repetition by using a Session object. Sessions allow you to persist certain parameters across requests, like cookies and proxies:


session = requests.Session()
session.proxies = {
    'http': 'http://user:pass@proxy1.example.com:8080',
    'https': 'http://user:pass@proxy2.example.com:8080'
}

# subsequent requests will use the proxies defined on the session
response = session.get('http://example.com')

For convenience, you can also set your proxy URLs as environment variables:


import os

os.environ['HTTP_PROXY'] = 'http://user:pass@proxy1.example.com:8080'
os.environ['HTTPS_PROXY'] = 'http://user:pass@proxy2.example.com:8080'

Then you can omit the proxies argument when making requests, and requests will automatically pick up the proxies from these environment variables.
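
For example, once those variables are set, a plain call is routed through the proxy without any extra arguments:

# No proxies argument needed - requests reads HTTP_PROXY / HTTPS_PROXY
# from the environment automatically
response = requests.get('http://example.com')
print(response.status_code)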

Finally, to access the response data from your proxied request:


response.text  # response body as string
response.content  # response body as bytes 
response.json()  # parse response body as JSON

Rotating Proxies

When scraping a website, using the same proxy repeatedly can quickly get your IP address blocked. To circumvent this, you can rotate through a pool of proxy servers, making each request from a different IP address.

Here's a basic script to randomly select a proxy from a list for each request:


import requests
import random

# Pool of proxies to rotate through - replace these placeholders with your own
proxies = [
    {'http': 'http://user:pass@proxy1.example.com:8080', 'https': 'http://user:pass@proxy1.example.com:8080'},
    {'http': 'http://user:pass@proxy2.example.com:8080', 'https': 'http://user:pass@proxy2.example.com:8080'},
    {'http': 'http://user:pass@proxy3.example.com:8080', 'https': 'http://user:pass@proxy3.example.com:8080'}
]

def random_proxy():
    return random.choice(proxies)

for i in range(10):
    proxy = random_proxy()
    try:
        print(f'Request #{i}, using proxy {proxy}')
        response = requests.get('http://httpbin.org/ip', proxies=proxy, timeout=5)
        print(response.json())
    except Exception as e:
        print(f'Request failed: {e}')

This selects a random proxy from the list for each request. The timeout argument specifies the number of seconds to wait for a response before giving up, which is useful when some proxies in your pool may be unresponsive. We wrap each request in a try/except to catch any errors that may occur.
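
If you would rather retry a failed request through a different proxy than skip it, one possible extension is sketched below. It reuses the random_proxy helper from the script above; the attempt count is arbitrary.

def get_with_retries(url, max_attempts=3):
    # Try the URL through up to max_attempts different randomly chosen proxies
    for attempt in range(max_attempts):
        proxy = random_proxy()
        try:
            return requests.get(url, proxies=proxy, timeout=5)
        except requests.RequestException as e:
            print(f'Attempt {attempt + 1} via {proxy} failed: {e}')
    return None  # every attempt failed

response = get_with_retries('http://httpbin.org/ip')
if response is not None:
    print(response.json())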

Keep in mind that free proxy lists often contain many outdated or non-functional proxies. For production scraping, it's usually worth paying for a private proxy service that offers a large, reliable pool of IP addresses to maximize your success rate.

Using ScrapingBee's Proxy Mode

If you don't want to deal with the hassle of finding and configuring proxies yourself, ScrapingBee's Proxy Mode provides an easy alternative. It's a proxy frontend for the ScrapingBee API that allows you to funnel requests through their proxy servers.

You'll first need to sign up for a free ScrapingBee account to get an API key. Then you can make proxied requests by specifying your API key in the proxy URL:


import requests

proxies = {
    'http': 'http://YOUR_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8886',
    'https': 'http://YOUR_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8887'
}

response = requests.get('http://httpbin.org/ip', proxies=proxies, verify=False)
print(response.text)

The render_js and premium_proxy parameters are optional API flags. See the ScrapingBee API documentation for the full list of available options.

Note the verify=False argument to disable SSL verification, which is required when using ScrapingBee's proxies.
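
Because verify=False normally makes requests emit an InsecureRequestWarning on every call, you may also want to silence that warning. A small sketch using urllib3 (which requests uses under the hood):

import urllib3

# Suppress the InsecureRequestWarning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)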

With ScrapingBee's Proxy Mode, you get access to a large pool of reliable, fast proxies managed by their service, with 1000 free API calls to start. This allows you to offload the complexities of proxy rotation and focus on your scraping logic.

Conclusion

You should now have a solid understanding of how to use proxies with Python's requests library for anonymous and efficient web scraping. A few key takeaways:

  • Proxies help keep your scraping undetected by masking your true IP address.
  • Rotating proxies further reduces the chance of your scrapers getting blocked.
  • Elite anonymous proxies are best for avoiding detection, while transparent proxies should generally be avoided.
  • Using a managed proxy service like ScrapingBee can save a lot of time and hassle versus maintaining your own proxy pools.

I encourage you to try applying these techniques to your own scraping projects. Start by making a few test requests through different proxy servers and verifying the IP address. Then set up a basic rotation script to cycle through all your available proxies.
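
As a quick sanity check, you can compare your direct IP address with the one seen through a proxy. A minimal sketch, using httpbin.org/ip (which echoes back the requesting address) and a placeholder proxy:

import requests

# Placeholder proxy - substitute one from your own pool
proxy = {
    'http': 'http://user:pass@proxy.example.com:8080',
    'https': 'http://user:pass@proxy.example.com:8080'
}

print('Direct IP: ', requests.get('http://httpbin.org/ip', timeout=5).json())
print('Proxied IP:', requests.get('http://httpbin.org/ip', proxies=proxy, timeout=5).json())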

With proxies in your toolkit, you'll be able to scrape larger volumes of data from more sources without triggering bans or CAPTCHAs. The next step is learning how to inspect response headers and handle different types of authentication. But you're now well on your way to becoming a professional web scraper!

Happy scraping!
