If you've done any amount of web scraping with Python, chances are you've encountered the dreaded TooManyRedirects error when making HTTP requests with the popular requests library. This error can be frustrating to debug, but once you understand what causes it and how to handle it, you'll be able to scrape even the most stubbornly redirecting websites.
In this guide, we'll take an in-depth look at the TooManyRedirects error: what it means, common reasons it occurs, and several solutions you can use in your own web scraping projects to avoid or handle it gracefully. We'll walk through detailed code examples for each solution so you can easily apply them yourself.
What is the TooManyRedirects Error?
First, let's clarify what this error actually means. As the name suggests, the TooManyRedirects error is raised by requests when it encounters too many HTTP redirects in a row while trying to fetch a URL.
Specifically, if requests gets caught in a chain where the URL it's trying to access keeps redirecting to other URLs, and this redirection goes on for more than 30 hops, requests gives up, stops following the redirects, and raises a TooManyRedirects exception like this:
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
The key points here are:
- requests has a built-in limit of following a maximum of 30 redirects per request (you can confirm the exact limit on your installation, as shown below)
- If a request hits that limit and still hasn't arrived at a final destination URL, requests stops and raises TooManyRedirects
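If you want to check the limit on your own installation, you can inspect it directly. This is just a quick sanity check, assuming a recent version of requests, where the default lives in requests.models.DEFAULT_REDIRECT_LIMIT and is copied onto each new Session:
import requests

# The library-wide default (30 in current releases of requests)
print(requests.models.DEFAULT_REDIRECT_LIMIT)

# Each Session starts with the same limit in its max_redirects attribute
print(requests.Session().max_redirects)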
So in a nutshell, this error means the URL you're trying to access is redirecting too many times. But the question is: why? Let's look at some common culprits.
Why You're Seeing TooManyRedirects
There are a few different reasons you might be getting the TooManyRedirects error in your web scraping code:
1. The website has a redirection bug
One possibility is that the website you're trying to scrape simply has a bug in how it handles redirects. It may be stuck in an infinite redirection loop due to a coding error.
This is actually pretty rare – most websites resolve redirects in just a few hops. So if you're consistently hitting TooManyRedirects, this probably isn't the reason, but it's worth mentioning as a possibility. If you suspect this is the case, you might want to let the website owner know about the bug!
2. The website is intentionally redirecting in a loop
Another possibility is the website has intentionally set up a redirection loop in order to block automated requests, like those coming from a web scraper.
Some websites don't want their content to be scraped, so they implement mechanisms to detect and block suspicious traffic, such as:
- Checking request headers like User-Agent to determine whether traffic is coming from a real browser or an automated script
- Limiting the number or rate of requests coming from a single IP address
- Using CAPTCHAs or JavaScript challenges that are hard for scrapers to solve
If a website suspects a request is coming from a bot, a common tactic is to send it into an infinite redirection loop to frustrate the bot and prevent it from accessing the real content. This could explain why your scraper is hitting TooManyRedirects.
3. You're making too many valid redirects
The last common reason is that your scraper is simply following too many valid redirects and hitting the default limit of 30.
For example, maybe the website you're scraping has a series of related pages that each link to the next in a chain, and you need to follow that whole chain to get to the final destination page.
Or perhaps the site uses redirects to track clicks, and you need to follow them to avoid being detected as a bot.
In cases like these, 30 redirects may not be enough, so even though the redirects are valid and not infinite loops, requests still hits its limit and raises TooManyRedirects.
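If you're unsure whether you're facing a genuine loop or just a long but finite chain, it can help to count the hops requests actually follows. Here is a small sketch using response.history, which holds one Response object per redirect that was followed; the URL is a placeholder for whatever page you're scraping:
import requests
from requests.exceptions import TooManyRedirects

url = 'https://example.com'  # placeholder: substitute the page you are scraping

try:
    response = requests.get(url)
    # Each entry in response.history is one redirect that was followed
    print(f"Followed {len(response.history)} redirects, landed on {response.url}")
    for hop in response.history:
        print(hop.status_code, hop.url)
except TooManyRedirects:
    print("More than 30 redirects - probably a loop or a very long chain")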
How to Handle TooManyRedirects in Python Requests
Now that we know why TooManyRedirects might be happening, let's go over some ways to solve it in Python using requests.
Solution 1: Check for bugs or blocking
The first thing to do is simply check whether the site you're trying to scrape has any obvious redirection bugs or is intentionally sending you in circles to block your scraper.
Open the URL in your regular web browser and see what happens. If you immediately get stuck in a redirect loop, there's probably a website bug. If you don't, the site may be detecting your scraper and blocking it with redirects.
One way to check is to change your scraper's User-Agent header to mimic a normal web browser. For example:
import requests

# Present a browser-like User-Agent so the site treats the request like normal traffic
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
}

response = requests.get('https://example.com', headers=headers)
If setting a browser-like User-Agent stops the TooManyRedirects error, you'll know the site was blocking your scraper. In that case, you may need more advanced techniques to avoid detection, like rotating User-Agent strings and IP addresses (a simple sketch follows below).
If you still get TooManyRedirects even when mimicking a browser, the site likely has a real redirection bug that you can't do much about. You'll have to find workarounds, like only scraping the pre-redirection URLs.
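As a starting point for that kind of rotation, here is a minimal sketch that picks a different User-Agent string for each request. The pool of strings and the URL are illustrative placeholders; a real scraper would usually also rotate proxies and pace its requests:
import random
import requests

# Illustrative pool of browser User-Agent strings; swap in whatever set you maintain
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0',
]

def fetch(url):
    # Pick a random User-Agent for every request so traffic looks less uniform
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers)

response = fetch('https://example.com')  # placeholder URL
print(response.status_code)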
Solution 2: Increase max_redirects
If you believe the redirects you're encountering are valid and you simply need to follow more than 30 of them to reach the destination pages, you can increase the max_redirects limit.
To do this, use a requests.Session() instead of the top-level requests.get() function, and set its max_redirects attribute to a higher number:
import requests

session = requests.Session()
session.max_redirects = 50  # Follow up to 50 redirects per request

response = session.get('https://example.com')
By using a Session, you can customize settings like max_redirects that persist across requests. The code above allows following up to 50 valid redirects before the exception is raised.
Solution 3: Examine redirect URLs for patterns
If increasing max_redirects still isn't enough, or you want to avoid wasting time following pointless redirect chains, you can inspect the URLs you're being redirected to and see if there are any patterns you can extract.
For example, let's say the redirect URLs contain an increasing 'page' parameter like this:
https://example.com/article?page=1
https://example.com/article?page=2
https://example.com/article?page=3
...
Instead of following each redirect, you could detect that page pattern and generate the later URLs yourself, skipping straight to the end of the chain (a small sketch of that idea follows below).
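Here is a rough sketch of that idea, based on the hypothetical page-numbered URLs shown above. The target page number (10 here) is an assumption; in practice you would work it out from the site you're scraping:
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def jump_to_page(redirect_url, target_page):
    # Rewrite the 'page' query parameter so we can skip the intermediate hops
    parts = urlparse(redirect_url)
    query = parse_qs(parts.query)
    query['page'] = [str(target_page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

first_redirect = 'https://example.com/article?page=1'
print(jump_to_page(first_redirect, 10))  # https://example.com/article?page=10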
To inspect the redirect URLs, use a Session again and pass allow_redirects=False. This lets you see the intermediate URL requests would have been redirected to:
import requests

session = requests.Session()

# With allow_redirects=False, requests returns the first response instead of following it
response = session.get('https://example.com', allow_redirects=False)

# The Location header holds the URL we would have been redirected to (if any)
redirect_url = response.headers.get('Location')
print(redirect_url)
The Location header of the response contains the URL requests would have redirected to next if allow_redirects were True. You can examine that URL, extract any patterns, and use them to optimize your requests.
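If you'd rather walk the whole chain yourself while inspecting every hop, you can loop on the Location header manually. This is a rough sketch with its own hop cap rather than a built-in requests feature; urljoin is used because Location values can be relative:
from urllib.parse import urljoin
import requests

session = requests.Session()
url = 'https://example.com'  # placeholder starting URL
max_hops = 50  # our own cap so a redirect loop cannot run forever

for hop in range(max_hops):
    response = session.get(url, allow_redirects=False)
    if not response.is_redirect:
        break  # not a redirect response, so this is the real content
    # Resolve the next URL against the current one in case Location is relative
    url = urljoin(url, response.headers['Location'])
    print(f"Hop {hop + 1}: redirected to {url}")

print(response.status_code, url)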
Solution 4: Disable redirects entirely
If the redirects are completely unnecessary for your scraping goals, you can avoid TooManyRedirects by not following redirects at all.
Again using a Session, set allow_redirects=False as shown above. This causes requests to return the very first response it receives, without following any redirection instructions.
The response status code will likely be a redirection code in the 3xx range, like 301 Moved Permanently or 302 Found, but the response body will contain whatever content the initial pre-redirection URL returned.
If that initial content is all you need, this is an easy way to avoid redirect issues entirely.
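A minimal example of this approach. Note that allow_redirects is a per-request option, so it works with a plain requests.get as well as with a Session; the URL is a placeholder:
import requests

response = requests.get('https://example.com', allow_redirects=False)

print(response.status_code)   # likely 301 or 302 if the page normally redirects
print(response.is_redirect)   # True when the response is a redirect we chose not to follow
print(response.text[:200])    # whatever body the pre-redirection URL returned, if any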
Solution 5: Catch the TooManyRedirects exception
Finally, if you still want to follow redirects but aren't sure how many there will be, you can simply attempt to follow them all and catch the TooManyRedirects exception if it happens.
Wrap your request in a try/except block to catch the exception:
import requests
from requests.exceptions import TooManyRedirects

try:
    response = requests.get('https://example.com')
except TooManyRedirects:
    # The request exceeded the redirect limit; handle it however makes sense for your scraper
    print("Hit TooManyRedirects limit!")
In the except block, you can choose to fail gracefully or retry the request with different parameters, such as a higher max_redirects limit.
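For instance, here is a hedged sketch of that retry pattern, falling back to a Session with a larger limit when the default of 30 is exceeded; the URL and the fallback limit of 100 are placeholders:
import requests
from requests.exceptions import TooManyRedirects

url = 'https://example.com'  # placeholder

try:
    response = requests.get(url)
except TooManyRedirects:
    # Retry once with a more generous limit; a genuine redirect loop will still fail
    session = requests.Session()
    session.max_redirects = 100
    response = session.get(url)

print(response.status_code)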
Conclusion
In this guide, we took a deep dive into the TooManyRedirects error in Python requests. We covered:
- What it means and when it happens
- Common causes like website bugs, anti-scraping measures, and valid long redirect chains
- Five solutions to avoid or handle it:
  - Check for bugs or blocking
  - Increase max_redirects
  - Analyze redirect URLs for patterns
  - Disable redirects with allow_redirects=False
  - Catch the TooManyRedirects exception
Armed with this knowledge, you should be well-equipped to scrape even the most redirect-heavy websites without your code getting trapped in endless loops.
The key lessons are to use a requests.Session() for persistent configuration, adjust max_redirects and allow_redirects as needed, inspect redirect URLs for optimizable patterns, and don't be afraid to catch exceptions when all else fails.
I hope this guide has been helpful for understanding and resolving TooManyRedirects in your web scraping projects. Happy scraping!