How to Follow Redirects using cURL: The Ultimate Guide
Introduction
If you've ever used the command line to interact with websites and APIs, you've probably used cURL. cURL is a powerful tool for transferring data using various network protocols. It's especially handy for testing APIs, debugging network issues, and automating tasks.
One of cURL's many useful features is its ability to automatically follow HTTP redirects. This means if you request a URL that redirects to another location, cURL can detect this and follow the redirection for you. In this guide, we'll take an in-depth look at using cURL to handle redirects.
HTTP Redirect Overview
Before we dive into the specifics of cURL, let's review what HTTP redirects are and why they're used.
An HTTP redirect is a way for a server to tell a client (like your web browser or cURL) that a resource is available at a different URL. When a client requests a URL that has been redirected, the server responds with a 3xx status code and the new URL in a Location header.
The main types of redirects are:
- 301 Moved Permanently: Indicates that the resource has been permanently moved to a new URL. Clients should update their links/bookmarks.
- 302 Found: A temporary redirect. The resource is temporarily located at a different URL, and the client should continue to use the original URL in the future.
- 307 Temporary Redirect: Similar to 302, but specifically states that the client should not change the HTTP method (POST, GET, etc.) when following the redirect.
- 308 Permanent Redirect: Like 301, but also specifies the HTTP method should not change.
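The method rule matters most when a POST is redirected. As a rough sketch of cURL's documented default with -L (a POST becomes a GET after 301, 302, or 303, and is kept after 307 or 308), the helper below prints the method used for the follow-up request. The name post_method_after is hypothetical, and the sketch assumes the original request is a POST and that none of the --post301, --post302, or --post303 options are set.

```shell
# post_method_after is a hypothetical helper name used only for this sketch.
# Given a redirect status code, print the method that cURL's default -L
# behavior would use for the next request when the original was a POST.
post_method_after() {
  case "$1" in
    301|302|303) echo "GET" ;;   # POST is converted to GET
    307|308)     echo "POST" ;;  # method is preserved
    *)           echo "unknown status: $1" >&2; return 1 ;;
  esac
}

post_method_after 302
post_method_after 307
```

Run as written, this prints GET and then POST.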
Some common scenarios where redirects are used include:
- Moving a website to a new domain
- Forcing the use of HTTPS for security
- Redirecting mobile users to a mobile-optimized version of a site
- URL shortening services
Using cURL to Follow Redirects
By default, if you make a cURL request to a URL that returns a redirect, cURL does not follow it. It simply outputs the body of the redirect response, which is often empty or a short "moved" notice; the status code and Location header are only shown if you add -i or -v.
To instruct cURL to follow redirects, use the -L or --location command-line option. Here's an example:
curl -L http://example.com
With the -L option, cURL will detect any redirects and follow them automatically, eventually returning the content of the final URL in the redirect chain.
If we run the command with -v for verbose output, we can see the redirects happening:
curl -v -L http://httpbin.org/redirect/2
The response will include lines like:
< HTTP/1.1 302 FOUND
< Location: /redirect/1
< HTTP/1.1 302 FOUND
< Location: /get
< HTTP/1.1 200 OK
This shows that cURL first got a 302 redirect to /redirect/1, then another 302 to /get, which returned a final 200 OK response.
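When the chain is long, reading verbose output by eye gets tedious. The sketch below condenses a header dump into one line per hop. It assumes the input is plain response headers, such as those printed by curl -sIL (not the "< "-prefixed lines from -v), and redirect_chain is a hypothetical helper name.

```shell
# redirect_chain is a hypothetical helper: it reads response headers on
# standard input and prints "STATUS -> LOCATION" for every redirect hop.
redirect_chain() {
  # Strip carriage returns, remember the latest status line, and print it
  # next to each Location header that follows.
  tr -d '\r' | awk '
    /^HTTP\// { status = $2 }
    tolower($1) == "location:" { print status " -> " $2 }
  '
}

# Offline demonstration using the sample headers shown earlier.
printf 'HTTP/1.1 302 FOUND\r\nLocation: /redirect/1\r\n\r\nHTTP/1.1 302 FOUND\r\nLocation: /get\r\n\r\nHTTP/1.1 200 OK\r\n\r\n' | redirect_chain

# Against a live server (requires network access):
#   curl -sIL http://httpbin.org/redirect/2 | redirect_chain
```

For the sample headers, this prints "302 -> /redirect/1" followed by "302 -> /get".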
Configuring cURL's Redirect Behavior
cURL provides a number of options to customize how it handles redirects:
--max-redirs NUM: Sets the maximum number of redirects that cURL will follow. The default is 50; use -1 for no limit.
--proto-redir PROTOCOLS: Limits which protocols cURL may be redirected to. By default, recent versions of cURL (7.65.2 and later) only follow redirects to HTTP, HTTPS, FTP, and FTPS URLs.
--post302: Tells cURL to keep the POST method after a 302 redirect instead of switching to GET. Helpful if you want to continue making a POST after redirects. The --post301 and --post303 options do the same for 301 and 303 responses.
For example, to allow a maximum of 5 redirects and only allow redirects to HTTPS URLs (the = prefix replaces the default protocol list rather than adding to it):
curl --max-redirs 5 --proto-redir =https -L http://example.com
Handling Cookies with Redirects
One potential complication with redirects is handling cookies. If a server sets cookies on the initial response or a redirect response, you often need to store those cookies and include them in the redirected request.
With cURL, you can use the -c option to specify a file to store cookies, and the -b option to pass those cookies on the next request:
curl -c cookies.txt -L http://example.com
curl -b cookies.txt -L http://example.com
The first command will store any cookies in cookies.txt. The second command will read the cookies from that file and include them in the request.
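To check what actually ended up in the cookie file, it can help to print it in a compact form. The sketch below assumes the Netscape cookie file format that cURL writes: seven tab-separated fields per cookie, with HttpOnly cookies prefixed by #HttpOnly_. The helper name list_cookies is hypothetical.

```shell
# list_cookies is a hypothetical helper: it prints "DOMAIN<TAB>NAME=VALUE"
# for each cookie in a Netscape-format cookie file such as cookies.txt.
list_cookies() {
  awk -F '\t' '
    /^#HttpOnly_/ { sub(/^#HttpOnly_/, "") }  # keep HttpOnly cookies
    /^#/ || NF < 7 { next }                   # skip comments and blank lines
    { print $1 "\t" $6 "=" $7 }
  ' "$1"
}

# Usage, after running the first command above:
#   list_cookies cookies.txt
```

If the list is empty after a redirecting request, the server may not have set any cookies, or may have set them for a path or domain you did not expect.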
Debugging and Troubleshooting
When you're working with complicated redirection scenarios, it can be helpful to get more details on what cURL is doing under the hood. We already saw the -v option for verbose output.
For even more details, you can use --trace or --trace-ascii to log a full trace of the request/response data:
curl --trace-ascii trace.log -L http://example.com
This will create a trace.log file with extensive debugging information about the request and response headers and data.
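A full trace is sometimes more than you need. As a lighter alternative, the sketch below uses cURL's -w (--write-out) option to print a one-line summary: the final status code, how many redirects were followed, and the time spent on them. The helper name redirect_summary is hypothetical, and the response body is discarded with -o /dev/null.

```shell
# redirect_summary is a hypothetical helper: it follows redirects for the
# given URL, discards the body, and prints a one-line summary built from
# cURL's write-out variables.
redirect_summary() {
  curl -sS -L -o /dev/null \
    -w 'status=%{http_code} redirects=%{num_redirects} time_redirect=%{time_redirect}s\n' \
    "$1"
}

# Usage (requires network access):
#   redirect_summary http://httpbin.org/redirect/2
```

A redirect count that is higher than expected is a good cue to fall back to -v or --trace-ascii for the full picture.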
Real-World Examples
Let's walk through a few more practical examples of using cURL with redirects.
Following a redirect to the mobile/desktop version of a site:
# Desktop user-agent
curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0" -L https://example.com
# Mobile user-agent
curl -A "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1" -L https://example.com
Many sites will redirect mobile or desktop users to different versions of the site. By setting the user-agent string with -A, we can simulate different devices.
Handling redirects when submitting a login form:
curl -c cookies.txt -L -d "username=user&password=pass" https://example.com/login
curl -b cookies.txt -L https://example.com/profile
Here we submit a login form and store the session cookies. We then use those cookies to access a profile page that requires authentication.
Following redirects through URL shorteners:
curl -IL https://bit.ly/3kd8dwz
URL shorteners work by redirecting to the full, long URL. By using -I to make a HEAD request and -L to follow redirects, we can see the headers for each hop, with the final destination in the last Location header.
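If only the destination matters, the headers can be skipped entirely. The sketch below, using a hypothetical helper named expand_url, combines -I and -L with the %{url_effective} write-out variable so that only the last URL cURL requested is printed.

```shell
# expand_url is a hypothetical helper: it follows redirects with HEAD
# requests and prints only the final URL that cURL ended up requesting.
expand_url() {
  curl -sS -I -L -o /dev/null -w '%{url_effective}\n' "$1"
}

# Usage (requires network access):
#   expand_url https://bit.ly/3kd8dwz
```

Some servers respond differently to HEAD than to GET; if the result looks wrong, dropping -I makes cURL send a normal GET request instead, at the cost of downloading the body.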
Redirect Considerations for Web Scraping
Redirects can sometimes complicate web scraping tasks. It's important to be aware that websites may use redirects for various reasons:
- Redirecting to a new version of the page
- A/B testing different variations of content
- Anti-bot measures that redirect suspicious traffic
- Paywalls or login walls
To successfully scrape content behind redirects, you'll need to ensure your scraping tool (whether that's cURL or a headless browser) is configured to follow redirects so that you get the final page content.
Another thing to watch out for is redirect loops or long redirect chains. Some sites may intentionally redirect in a loop to frustrate scrapers. Setting a maximum redirect limit can help avoid getting stuck.
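One way to apply such a limit in a script is sketched below. It relies on cURL exiting with code 47 when the redirect limit is exceeded; the helper name fetch_limited and the default limit of 5 are assumptions chosen for illustration.

```shell
# fetch_limited is a hypothetical helper: it fetches a URL while following
# at most LIMIT redirects (default 5, an arbitrary choice for this sketch)
# and reports when cURL gives up because the limit was exceeded.
fetch_limited() {
  url=$1
  limit=${2:-5}
  rc=0
  curl -sS -L --max-redirs "$limit" "$url" || rc=$?
  # Exit code 47 is cURL's "too many redirects" error.
  if [ "$rc" -eq 47 ]; then
    echo "skipping $url: exceeded redirect limit of $limit" >&2
  fi
  return "$rc"
}

# Usage (requires network access):
#   fetch_limited http://httpbin.org/redirect/10 3 || echo "fetch failed"
```

Returning cURL's own exit code lets the calling script decide whether to retry, skip the page, or stop.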
Conclusion
We've covered a lot of ground in this guide to using cURL with redirects. To recap the key points:
- HTTP redirects are a way for servers to tell clients a resource has moved
- cURL can automatically follow redirects with the -L or --location option
- You can limit and control cURL's redirect behavior with additional options
- Proper cookie handling is often needed when dealing with redirects
- Verbose output and tracing provide helpful debugging details
- Real-world redirect scenarios include handling mobile/desktop redirects, login forms, and URL shorteners
- Redirects are an important consideration when web scraping
I encourage you to try out the examples in this guide and refer to the cURL man pages to learn even more about its capabilities. With the ability to follow redirects and all its other features, cURL is an indispensable tool for anyone working with websites and APIs.