Introduction
If you've ever needed to retrieve files or mirror websites, you're probably familiar with wget – the powerful command-line utility for downloading content via HTTP, HTTPS, and FTP. But did you know that wget also supports downloading through proxy servers?
In this ultimate guide, we'll dive deep into using wget with a proxy. You'll learn what proxies are, why you might need one, and exactly how to configure wget to route its traffic through a proxy server. Whether you're a DevOps engineer setting up automated scripts, or just need to download some files behind a corporate firewall, by the end of this guide you'll be a master of proxied wget.
But before we jump into configuring proxies, let's start with a quick refresher on what wget is and what makes it so darn useful…
Meet wget: The Ultimate File Downloading Swiss-Army Knife
GNU wget (or just "wget" for short) is a free command-line utility for downloading files using HTTP, HTTPS, and FTP protocols. Part of the GNU Project, it was created by Hrvoje Nikšić and first released back in 1996.
So what makes wget so great? Put simply, it's fast, powerful, and rock-solid reliable. Some key features include:
- Support for downloading via HTTP, HTTPS, and FTP
- Ability to resume partial downloads
- Recursive downloading to mirror entire websites
- Built-in timestamp checking to only retrieve new/changed files
- Highly configurable via command-line options and config files
Thanks to this awesome feature set, wget has become the go-to tool for many sysadmins, developers, and power users who need a dependable way to download files. It's endlessly scriptable, has great documentation, and comes preinstalled on many Linux distributions (on other systems it is usually a quick package install away).
With those wget basics out of the way, let's turn our attention to proxies, and why you might need to use one with wget.
Proxy Servers: Your Gateway to the Web
Before we get into configuring wget, let's take a moment to discuss proxy servers and how they work.
A proxy server acts as an intermediary between your device and the internet. Instead of requesting a resource like a webpage directly, your request first gets sent to the proxy server, which then forwards it to the destination server. The destination server sends the response back to the proxy, which relays it back to you.
There are a few reasons you might want (or need) to use a proxy server:
- To access websites blocked by your ISP or government
- To bypass firewalls on corporate or school networks
- To mask your real IP address and location for privacy
- To route traffic through a different country to bypass geo-blocking
- To cache content and improve performance on networks with limited bandwidth
If any of those use cases apply to you, then read on to learn how to use wget with a proxy!
Configuring wget to Use a Proxy Server
So how do you actually tell wget to use a proxy server? There are two main ways:
- Setting environment variables
- Specifying settings in the wget config file
Let's go through each of those methods in detail.
Setting Environment Variables
The easiest way to make wget use a proxy is by setting a few special environment variables:
- http_proxy / https_proxy – the URL of the proxy server to use for HTTP/HTTPS requests
- ftp_proxy – the URL of the proxy to use for FTP requests (usually the same as http_proxy)
- no_proxy – a comma-separated list of hostnames or domains that shouldn't go through the proxy
Here's an example of what those variables might look like:
export http_proxy=http://proxy.example.com:3128
export https_proxy=$http_proxy
export ftp_proxy=$http_proxy
Once those are set, wget will automatically route its requests through the specified proxy server. Easy!
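Here's a slightly fuller sketch of the same idea that also sets no_proxy and then prints what wget will inherit. The proxy address and the bypass list are placeholder assumptions, so swap in your own values.

```shell
# Hypothetical proxy address; replace with your own.
proxy_url="http://proxy.example.com:3128"

# Export for the current shell session and its child processes.
export http_proxy="$proxy_url"
export https_proxy="$proxy_url"
export ftp_proxy="$proxy_url"

# Hosts and domain suffixes that should bypass the proxy (assumed values).
export no_proxy="localhost,127.0.0.1,.internal.example.com"

# Confirm what wget will inherit from the environment.
env | grep -i '_proxy=' | sort
```

These exports only last for the current shell session. To make them stick, you would typically add them to your shell's startup file.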
Using the wget Config File
For a more permanent solution, you can save your proxy settings in the wget configuration file. wget will look for its config in a couple of places:
- /etc/wgetrc – the global config file for all users on the system
- $HOME/.wgetrc – a user-specific config file
The format is the same for both files. To set your proxy settings, just add the following lines:
https_proxy = http://proxy.example.com:3128
http_proxy = http://proxy.example.com:3128
ftp_proxy = http://proxy.example.com:3128
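If you'd rather experiment without touching your real config, one option is to write the settings to a throwaway file and point wget at it with the WGETRC environment variable, which wget should read in place of $HOME/.wgetrc. This is a minimal sketch: the proxy address and the no_proxy list are assumptions for illustration.

```shell
# Write a throwaway config file instead of editing ~/.wgetrc directly.
rc_file="$(mktemp)"
cat > "$rc_file" <<'EOF'
# Hypothetical proxy address; replace with your own.
use_proxy = on
http_proxy = http://proxy.example.com:3128
https_proxy = http://proxy.example.com:3128
ftp_proxy = http://proxy.example.com:3128
no_proxy = localhost,127.0.0.1
EOF

# Point wget at this file for the current shell session.
export WGETRC="$rc_file"

# Show the settings wget will read (comment lines filtered out).
grep -v '^#' "$WGETRC"
```

Once the settings behave the way you want, you can copy the same lines into $HOME/.wgetrc or /etc/wgetrc and unset WGETRC.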
If your proxy requires a username and password, you can include them in the URL like this:
https_proxy = http://username:password@proxy.example.com:3128
http_proxy = http://username:password@proxy.example.com:3128
ftp_proxy = http://username:password@proxy.example.com:3128
Alternatively, you can specify the username and password separately using the proxy_user and proxy_password settings:
proxy_user = username
proxy_password = password
Once your config file is set up, wget will use those proxy settings for all future requests.
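Since a config file with proxy_password in it holds a plain-text secret, it's worth restricting who can read it. The sketch below uses a temporary file with made-up credentials to show the idea; for real use you would run the chmod against your own $HOME/.wgetrc.

```shell
# Hypothetical credentials and proxy address, for illustration only.
rc_file="$(mktemp)"
cat > "$rc_file" <<'EOF'
http_proxy = http://proxy.example.com:3128
https_proxy = http://proxy.example.com:3128
proxy_user = username
proxy_password = password
EOF

# Make the file readable and writable by its owner only.
chmod 600 "$rc_file"

# Print the permission bits to confirm (expect -rw-------).
ls -l "$rc_file" | cut -c1-10
```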
Putting it All Together
Okay, let's look at a few examples of using wget with a proxy server.
To download a single file via a proxy, use the -e option to set the proxy URL:
wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:3128 http://example.com/file.zip
To mirror an entire website via FTP through a proxy, combine the -m (mirror) and -e options:
wget -m -e use_proxy=yes -e ftp_proxy=http://proxy.example.com:3128 ftp://example.com/pub/
If your proxy requires authentication, you can either include the credentials in the proxy URL:
wget -e use_proxy=yes -e http_proxy=http://username:password@proxy.example.com:3128 http://example.com/
Or use the --proxy-user and --proxy-password options:
wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:3128 --proxy-user=username --proxy-password=password http://example.com/
Finally, to exclude certain domains from going through the proxy, you can use the no_proxy setting:
wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:3128 -e no_proxy=localhost,127.0.0.1,.example.com http://example.com/
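If you find yourself typing the same -e options over and over, a small wrapper function can bundle them. This is just a sketch: proxied_wget, PROXY_URL, NO_PROXY_LIST, and DRY_RUN are names made up for this example, not part of wget, and the default values are placeholders.

```shell
# Hypothetical defaults; override them in the environment as needed.
PROXY_URL="${PROXY_URL:-http://proxy.example.com:3128}"
NO_PROXY_LIST="${NO_PROXY_LIST:-localhost,127.0.0.1}"

# proxied_wget ARGS...: run wget through PROXY_URL.
# With DRY_RUN=1 the command is printed instead of executed.
proxied_wget() {
    # Prepend the proxy settings to whatever arguments were passed in.
    set -- wget \
        -e use_proxy=yes \
        -e "http_proxy=$PROXY_URL" \
        -e "https_proxy=$PROXY_URL" \
        -e "no_proxy=$NO_PROXY_LIST" \
        "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Preview the command without touching the network.
DRY_RUN=1 proxied_wget http://example.com/file.zip
```

Drop the DRY_RUN=1 prefix to actually run the download. Any extra wget options, such as -m, can be passed through as additional arguments.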
With those examples under your belt, you should be all set to start using wget with a proxy server! Before we wrap things up, let's quickly touch on a few other topics.
Alternatives to wget
While wget is a fantastic tool, it's not the only option out there. Here are a couple of other popular command-line downloading utilities:
- cURL – Supports even more protocols than wget, and also allows sending custom HTTP requests. Can use proxies by setting the http_proxy environment variable or using the -x option.
- aria2 – A lightweight download utility with support for parallel downloads and metalink files. Specify a proxy using the --all-proxy option.
So if you ever run into a situation that wget can't handle, give one of those a try (though wget is usually more than enough).
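For comparison, here's roughly what the proxied equivalents look like with those two tools. The function names and the proxy address are assumptions for this sketch, and the example calls are commented out because they need network access and the tools installed.

```shell
# Hypothetical proxy address; replace with your own.
PROXY_URL="${PROXY_URL:-http://proxy.example.com:3128}"

# Download one URL with curl through the proxy (-x), keeping the remote file name (-O).
fetch_with_curl() {
    curl -x "$PROXY_URL" -O "$1"
}

# Download one URL with aria2, using the same proxy for all protocols.
fetch_with_aria2() {
    aria2c --all-proxy="$PROXY_URL" "$1"
}

# Example calls (commented out because they need network access):
# fetch_with_curl http://example.com/file.zip
# fetch_with_aria2 http://example.com/file.zip
```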
Performance Tips
When using wget with a proxy, there are a few things you can do to optimize performance:
- Leave persistent connections enabled. wget already reuses a connection for successive requests to the same host by default, so no extra header is needed; just avoid --no-http-keep-alive, which turns this off.
- Fetch related URLs in a single wget invocation (for example, with -i and a list file) so connections can be reused. Note that wget does not support HTTP pipelining, and passing Connection or Keep-Alive headers with --header does not enable it.
- Increase the number of retries with --tries to help recover from transient network issues. The default is 20.
- Adjust the --wait option to add a delay between requests. This can avoid overloading the server or getting your IP blocked.
Those tweaks can make a big difference, especially if you're mirroring large websites over a slow or unreliable connection.
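To see how the retry and wait options fit together, here's a sketch that assembles a mirroring command and prints it rather than running it. The function name polite_mirror_cmd and every numeric value are illustrative assumptions. Besides --tries and --wait, it adds three related wget options: --waitretry (back off between retries), --timeout (give up on a stalled connection), and --random-wait (vary the delay between requests).

```shell
# Build a wget command tuned for slow or flaky proxied connections.
# The values below are illustrative assumptions, not recommendations.
polite_mirror_cmd() {
    echo wget --mirror \
        --tries=40 \
        --waitretry=10 \
        --timeout=30 \
        --wait=2 \
        --random-wait \
        "$1"
}

# Print the command for review; paste it into a shell to run it.
polite_mirror_cmd http://example.com/
```

Printing the command first makes it easy to review the settings before kicking off a long mirror job.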
Wrapping Up
And there you have it – the ultimate guide to using wget with a proxy server! In this tutorial, you learned:
- What wget is, and what makes it so useful
- How proxy servers work, and common reasons for using them
- Two ways to configure wget to use a proxy (environment variables and config file)
- How to include proxy credentials for authentication
- Several examples of downloading files through a proxy with wget
- A few wget alternatives like cURL and aria2
- Tips for optimizing performance when using proxied wget
I hope this guide has been helpful! While proxies can seem complex at first, tools like wget make it relatively painless to work with them. So now that you know how, feel free to use proxied wget in your shell scripts, scheduled jobs, and data pipelines to your heart's content.
As always, if this guide was useful, please consider sharing it with others. And if you have any other tips for using wget with proxies, do let me know. Happy downloading!
Additional Resources
Want to learn even more about wget and proxies? Here are a few resources worth checking out: