If you've done any amount of web scraping with Python, chances are you've encountered the dreaded TooManyRedirects error when making HTTP requests with the popular requests library. This error can be frustrating to debug, but once you understand what causes it and how to handle it, you'll be able to scrape even the most stubbornly redirecting websites.
In this guide, we'll take an in-depth look at the TooManyRedirects error: what it means, common reasons it occurs, and several solutions you can use in your own web scraping projects to avoid or handle it gracefully. We'll walk through detailed code examples for each solution so you can easily apply them yourself.
What is the TooManyRedirects Error?
First, let's clarify what this error actually means. As the name suggests, the TooManyRedirects error is raised by requests when it encounters too many HTTP redirects in a row while trying to fetch a URL.
Specifically, if requests gets caught in a chain where the URL it's trying to access keeps redirecting to other URLs, and this redirection goes on for more than 30 hops, requests gives up, stops following the redirects, and raises a TooManyRedirects exception like this:
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
The key points here are:
- requests has a built-in limit of following a maximum of 30 redirects per request (you can confirm the exact limit on your installation, as shown below)
- If a request hits that limit and still hasn't arrived at a final destination URL, requests stops and raises TooManyRedirects
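If you want to check the limit on your own installation, you can inspect it directly. This is just a quick sanity check, assuming a recent version of requests, where the default lives in requests.models.DEFAULT_REDIRECT_LIMIT and is copied onto each new Session:
import requests

# The library-wide default (30 in current releases of requests)
print(requests.models.DEFAULT_REDIRECT_LIMIT)

# Each Session starts with the same limit in its max_redirects attribute
print(requests.Session().max_redirects)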
So in a nutshell, this error means the URL you're trying to access is redirecting too many times. But the question is: why? Let's look at some common culprits.
Why You're Seeing TooManyRedirects
There are a few different reasons you might be getting the TooManyRedirects error in your web scraping code:
1. The website has a redirection bug
One possibility is that the website you're trying to scrape simply has a bug in how it handles redirects. It may be stuck in an infinite redirection loop due to a coding error.
This is actually pretty rare – most websites resolve redirects in just a few hops. So if you're consistently hitting TooManyRedirects, this probably isn't the reason, but it's worth mentioning as a possibility. If you suspect this is the case, you might want to let the website owner know about the bug!
2. The website is intentionally redirecting in a loop
Another possibility is the website has intentionally set up a redirection loop in order to block automated requests, like those coming from a web scraper.
Some websites don't want their content to be scraped, so they implement mechanisms to detect and block suspicious traffic, such as:
- Checking request headers like User-Agent to determine whether traffic is coming from a real browser or an automated script
- Limiting the number or rate of requests coming from a single IP address
- Using CAPTCHAs or JavaScript challenges that are hard for scrapers to solve
If a website suspects a request is coming from a bot, a common tactic is to send it into an infinite redirection loop to frustrate the bot and prevent it from accessing the real content. This could explain why your scraper is hitting TooManyRedirects.
3. You're making too many valid redirects
The last common reason is that your scraper is simply following too many valid redirects and hitting the default limit of 30.
For example, maybe the website you're scraping has a series of related pages that each link to the next in a chain, and you need to follow that whole chain to get to the final destination page.
Or perhaps the site uses redirects to track clicks, and you need to follow them to avoid being detected as a bot.
In cases like these, 30 redirects may not be enough, so even though the redirects are valid and not infinite loops, requests still hits its limit and raises TooManyRedirects.
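If you're unsure whether you're facing a genuine loop or just a long but finite chain, it can help to count the hops requests actually follows. Here is a small sketch using response.history, which holds one Response object per redirect that was followed; the URL is a placeholder for whatever page you're scraping:
import requests
from requests.exceptions import TooManyRedirects

url = 'https://example.com'  # placeholder: substitute the page you are scraping

try:
    response = requests.get(url)
    # Each entry in response.history is one redirect that was followed
    print(f"Followed {len(response.history)} redirects, landed on {response.url}")
    for hop in response.history:
        print(hop.status_code, hop.url)
except TooManyRedirects:
    print("More than 30 redirects - probably a loop or a very long chain")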
How to Handle TooManyRedirects in Python Requests
Now that we know why TooManyRedirects might be happening, let's go over some ways to solve it in Python using requests.
Solution 1: Check for bugs or blocking
The first thing to do is simply check whether the site you're trying to scrape has any obvious redirection bugs or is intentionally sending you in circles to block your scraper.
Open the URL in your regular web browser and see what happens. If you immediately get stuck in a redirect loop, there's probably a website bug. If you don't, the site may be detecting your scraper and blocking it with redirects.
One way to check is to change your scraper's User-Agent header to mimic a normal web browser. For example:
import requests

# Present a browser-like User-Agent so the site treats the request like normal traffic
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
}

response = requests.get('https://example.com', headers=headers)
If setting a browser-like User-Agent stops the TooManyRedirects error, you'll know the site was blocking your scraper. In that case, you may need more advanced techniques to avoid detection, like rotating User-Agent strings and IP addresses (a simple sketch follows below).
If you still get TooManyRedirects even when mimicking a browser, the site likely has a real redirection bug that you can't do much about. You'll have to find workarounds, like only scraping the pre-redirection URLs.
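As a starting point for that kind of rotation, here is a minimal sketch that picks a different User-Agent string for each request. The pool of strings and the URL are illustrative placeholders; a real scraper would usually also rotate proxies and pace its requests:
import random
import requests

# Illustrative pool of browser User-Agent strings; swap in whatever set you maintain
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0',
]

def fetch(url):
    # Pick a random User-Agent for every request so traffic looks less uniform
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers)

response = fetch('https://example.com')  # placeholder URL
print(response.status_code)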
Solution 2: Increase max_redirects
If you believe the redirects you're encountering are valid and you simply need to follow more than 30 of them to reach the destination pages, you can increase the max_redirects limit.
To do this, use a requests.Session() instead of the top-level requests.get() function, and set its max_redirects attribute to a higher number:
import requests

session = requests.Session()
session.max_redirects = 50  # Follow up to 50 redirects per request

response = session.get('https://example.com')
By using a Session, you can customize settings like max_redirects that persist across requests. The code above allows following up to 50 valid redirects before the exception is raised.
Solution 3: Examine redirect URLs for patterns
If increasing max_redirects still isn't enough, or you want to avoid wasting time following pointless redirect chains, you can inspect the URLs you're being redirected to and see if there are any patterns you can extract.
For example, let's say the redirect URLs contain an increasing 'page' parameter like this:
https://example.com/article?page=1
https://example.com/article?page=2
https://example.com/article?page=3
...
Instead of following each redirect, you could detect that page pattern and generate the later URLs yourself, skipping straight to the end of the chain (a small sketch of that idea follows below).
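Here is a rough sketch of that idea, based on the hypothetical page-numbered URLs shown above. The target page number (10 here) is an assumption; in practice you would work it out from the site you're scraping:
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def jump_to_page(redirect_url, target_page):
    # Rewrite the 'page' query parameter so we can skip the intermediate hops
    parts = urlparse(redirect_url)
    query = parse_qs(parts.query)
    query['page'] = [str(target_page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

first_redirect = 'https://example.com/article?page=1'
print(jump_to_page(first_redirect, 10))  # https://example.com/article?page=10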
To inspect the redirect URLs, use a Session again and pass allow_redirects=False. This lets you see the intermediate URL requests would have been redirected to:
import requests

session = requests.Session()

# With allow_redirects=False, requests returns the first response instead of following it
response = session.get('https://example.com', allow_redirects=False)

# The Location header holds the URL we would have been redirected to (if any)
redirect_url = response.headers.get('Location')
print(redirect_url)
The Location header of the response contains the URL requests would have redirected to next if allow_redirects were True. You can examine that URL, extract any patterns, and use them to optimize your requests.
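If you'd rather walk the whole chain yourself while inspecting every hop, you can loop on the Location header manually. This is a rough sketch with its own hop cap rather than a built-in requests feature; urljoin is used because Location values can be relative:
from urllib.parse import urljoin
import requests

session = requests.Session()
url = 'https://example.com'  # placeholder starting URL
max_hops = 50  # our own cap so a redirect loop cannot run forever

for hop in range(max_hops):
    response = session.get(url, allow_redirects=False)
    if not response.is_redirect:
        break  # not a redirect response, so this is the real content
    # Resolve the next URL against the current one in case Location is relative
    url = urljoin(url, response.headers['Location'])
    print(f"Hop {hop + 1}: redirected to {url}")

print(response.status_code, url)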
Solution 4: Disable redirects entirely
If the redirects are completely unnecessary for your scraping goals, you can avoid TooManyRedirects by not following redirects at all.
Again using a Session, set allow_redirects=False as shown above. This causes requests to return the very first response it receives, without following any redirection instructions.
The response status code will likely be a redirection code in the 3xx range, like 301 Moved Permanently or 302 Found, but the response body will contain whatever content the initial pre-redirection URL returned.
If that initial content is all you need, this is an easy way to avoid redirect issues entirely.
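A minimal example of this approach. Note that allow_redirects is a per-request option, so it works with a plain requests.get as well as with a Session; the URL is a placeholder:
import requests

response = requests.get('https://example.com', allow_redirects=False)

print(response.status_code)   # likely 301 or 302 if the page normally redirects
print(response.is_redirect)   # True when the response is a redirect we chose not to follow
print(response.text[:200])    # whatever body the pre-redirection URL returned, if any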
Solution 5: Catch the TooManyRedirects exception
Finally, if you still want to follow redirects but aren't sure how many there will be, you can simply attempt to follow them all and catch the TooManyRedirects exception if it happens.
Wrap your request in a try/except block to catch the exception:
import requests
from requests.exceptions import TooManyRedirects

try:
    response = requests.get('https://example.com')
except TooManyRedirects:
    # The request exceeded the redirect limit; handle it however makes sense for your scraper
    print("Hit TooManyRedirects limit!")
In the except block, you can choose to fail gracefully or retry the request with different parameters, such as a higher max_redirects limit.
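For instance, here is a hedged sketch of that retry pattern, falling back to a Session with a larger limit when the default of 30 is exceeded; the URL and the fallback limit of 100 are placeholders:
import requests
from requests.exceptions import TooManyRedirects

url = 'https://example.com'  # placeholder

try:
    response = requests.get(url)
except TooManyRedirects:
    # Retry once with a more generous limit; a genuine redirect loop will still fail
    session = requests.Session()
    session.max_redirects = 100
    response = session.get(url)

print(response.status_code)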
Conclusion
In this guide, we took a deep dive into the TooManyRedirects error in Python requests. We covered:
- What it means and when it happens
- Common causes like website bugs, anti-scraping measures, and valid long redirect chains
- Five solutions to avoid or handle it:
  - Check for bugs or blocking
  - Increase max_redirects
  - Analyze redirect URLs for patterns
  - Disable redirects with allow_redirects=False
  - Catch the TooManyRedirects exception
Armed with this knowledge, you should be well-equipped to scrape even the most redirect-heavy websites without your code getting trapped in endless loops.
The key lessons are to use a requests.Session() for persistent configuration, adjust max_redirects and allow_redirects as needed, inspect redirect URLs for optimizable patterns, and don't be afraid to catch exceptions when all else fails.
I hope this guide has been helpful for understanding and resolving TooManyRedirects in your web scraping projects. Happy scraping!