
How to Scrape Free Public Proxy Lists and Find Working Proxies

Proxies are an essential tool for web scraping and automation. They allow you to mask your real IP address and appear to be accessing the web from different locations. This helps you avoid getting blocked while scraping or sending too many requests from one IP address.

While paid proxy services like BrightData and Smartproxy offer reliable, high-quality proxies, they can get expensive for large projects. That's where free public proxies come in handy. There are tons of free proxy lists online you can leverage at no cost.

The challenge is that most public proxies don't actually work or are too slow to use. You have to sift through a ton of dead proxies to find ones that work.

Luckily, there are techniques you can use to automate the process of extracting free public proxies and verifying which ones work.

In this guide, I'll cover:

  • The best free proxy list sites to scrape
  • How to extract proxies from these sites
  • Tools to test and verify working proxies
  • Tips to integrate free proxies into your web scraping projects

After reading, you'll be able to quickly build lists of hundreds of working free proxies from public sources.

The Best Free Proxy List Sites

There are hundreds of sites that provide free proxy lists you can scrape. Here are some of the best ones I've found:

1. Geonode

Geonode likely has the largest database of free proxies online. They provide a JSON API along with a website you can scrape.

The JSON API returns proxies in this format:

{
   "ip":"111.119.187.178",
   "port":6000,
   "code":"KP",
   "country":"North Korea",
   "anonymity":"High +KA"
}

You can filter proxies by type (HTTP, SOCKS4/5), anonymity level (transparent, anonymous, elite), and country.
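
For example, once you've loaded the JSON into Python, applying those filters only takes a few lines. Here's a minimal sketch that works purely on records shaped like the sample above (the hard-coded records list stands in for whatever the API actually returns):

# Filter Geonode-style records client-side by country code and anonymity label.
records = [
    {"ip": "111.119.187.178", "port": 6000, "code": "KP",
     "country": "North Korea", "anonymity": "High +KA"},
    # ...more records fetched from the API
]

def filter_proxies(records, country_code=None, anonymity=None):
    matches = []
    for record in records:
        if country_code and record.get("code") != country_code:
            continue
        if anonymity and anonymity not in record.get("anonymity", ""):
            continue
        matches.append(f"{record['ip']}:{record['port']}")
    return matches

print(filter_proxies(records, country_code="KP", anonymity="High"))

The same client-side filtering pattern works for the other JSON sources below; only the field names change.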

The website (https://geonode.com/free-proxy-list/) lists proxies in an HTML table, which you can parse with a web scraper.

Overall, Geonode tends to have the largest selection of free proxies, though speed and reliability vary widely from proxy to proxy. It's updated frequently, so it's worth checking daily.

2. Free-Proxy-List.net

This site (https://free-proxy-list.net/) provides a clean HTML table of HTTP, HTTPS, and SOCKS proxies.

It's formatted nicely for web scraping:

<tr><td>111.119.187.178</td><td>6000</td>...</tr> 

The site checks proxies every 10 minutes, so they tend to be more reliable. There are also helpful attributes like response time and when the proxy was last checked.

Free-Proxy-List.net has a smaller selection than Geonode, but the proxies are generally faster with more uptime. It's a great source for highly anonymous elite proxies.

3. OpenProxyList

OpenProxyList (https://openproxylist.xyz/) takes a different approach and lists fully unfiltered proxies. This means there are a lot more dead or unreliable proxies, but you can also find some hidden gems.

The data is presented as a JSON array:

[
  {
    "ip": "111.119.187.178",
    "port": "6000",
    "code": "KP",
    "country": "North Korea",
    "anonymity": "High +KA",
    "google": "Transparent",
    "https": "Transparent",
    "last_checked": "1 minute ago"
  },
  ...
]

While lower quality overall, OpenProxyList updates constantly (as often as every 5 minutes). The high update frequency helps uncover new working proxies faster.

4. ProxyScrape

ProxyScrape (https://api.proxyscrape.com/?request=displayproxies&proxytype=http) has a fast JSON API with bulk HTTP, HTTPS, and SOCKS proxies.

You can get new proxies on demand by specifying the number of results and a custom port range. The proxies aren't vetted for quality or speed, though.

The API limits you to 1,000 proxies per call for the free plan. But the convenience of generating fresh proxies via API makes ProxyScrape worthwhile.

5. PubProxy

PubProxy (https://pubproxy.com/) takes a crowdsourcing approach to building its free proxy list.

Anyone can submit their own proxies, which are then voted up or down by other users. Higher voted proxies tend to be more reliable.

PubProxy lists the proxies in a clean HTML table:

<tr>
<td>111.119.187.178</td> 
<td>6000</td>
</tr>

In addition to IP and port, other metadata like country, anonymity level, and speed are included.

Since anyone can submit proxies, there are a lot of dead ones. But the voting feature surfaces working, high-quality proxies quickly.

6. Proxy-List.download

This site (https://proxy-list.download/api/v1/get?type=http) has a JSON API that serves up bulk proxies. You can specify HTTP, HTTPS, or SOCKS as well as filter by anonymity level and connection speed.

The /api/v1/get endpoint returns proxies in this structure:

{
   "IP": "111.119.187.178",
   "Port": "6000",
   "Code": "KP",
   "Country": "North Korea",
   "Anonymity": "High +KA",
   "Google": "Transparent",
   "HTTPS": "Transparent",
   "Last_Checked": "1 minute ago"
}

The API requires an API key, but a free plan with 1,000 lookups per month is available. The API makes it easy to integrate Proxy-List.download into an automated workflow.
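
As a rough sketch, an automated pull could look like the snippet below. The type parameter is the one documented in the URL above; the anonymity filter and the way the API key is passed are placeholders, so check the site's documentation for the real parameter names:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder from your free plan

response = requests.get(
    "https://proxy-list.download/api/v1/get",
    params={
        "type": "http",    # documented in the URL above
        "anon": "elite",   # placeholder name for the anonymity filter
        "key": API_KEY,    # placeholder name for the API key parameter
    },
)

# The response may be JSON records like the example above or plain ip:port lines;
# handle both, mirroring the approach used later in this guide.
try:
    proxies = [f"{p['IP']}:{p['Port']}" for p in response.json()]
except ValueError:
    proxies = [line.strip() for line in response.text.splitlines() if line.strip()]

print(proxies)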

7. Spys.one

Spys.one (http://spys.one/) publishes a giant list of over 17,000 public proxies, all stored in one massive JSON file.

Here's an example proxy:

{
  "ip": "111.119.187.178",
  "port": 6000,
  "protocols": ["http"],
  "country": "North Korea",
  "anonymity": "High +KA",
  "google": "Transparent",
  "https": "Transparent",
  "last_checked": "1 minute ago"
}

The sheer size of the list means it's a great source for bulk proxies. There are a lot of dead ones, but with over 17k proxies, there are bound to be some working ones too.

This list is best when you need a high volume of proxies and plan to aggressively filter out the dead ones.

Extracting Proxies from Sites

Now that you know where to find public proxies, let's look at techniques for extracting them.

For HTML lists, you'll want to use web scraping. For JSON APIs, you can use regular HTTP requests.

Web Scraping HTML Proxy Lists

To scrape HTML proxy lists, you'll want to inspect the page and identify patterns in the markup.

For example, Free-Proxy-List.net uses this structure:

<tr><td>111.119.187.178</td><td>6000</td>...</tr>

So you could extract the IP and port for each row:

from bs4 import BeautifulSoup
import requests

url = 'https://free-proxy-list.net/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

proxies = []
for row in soup.select('table tbody tr'):
    cells = row.select('td')
    if len(cells) < 2:
        continue  # skip header rows and malformed rows
    ip = cells[0].text.strip()
    port = cells[1].text.strip()
    proxies.append(f"{ip}:{port}")

print(proxies)

This would print out a list like:

["111.119.187.178:6000", "222.186.170.22:9999", ...]  

For sites like Geonode and PubProxy that have more complex tables, you may want to use a tool like Puppeteer, Playwright, or Selenium to scrape the proxies. These tools allow you to scrape dynamically generated content from JavaScript-heavy sites.
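
For instance, here's a minimal Selenium sketch for a proxy table that only renders after JavaScript runs. It assumes the rows end up as <tr><td>ip</td><td>port</td>... like the static examples above, so adjust the URL and selector for the actual page you're scraping:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://geonode.com/free-proxy-list/"  # any JavaScript-rendered proxy table

driver = webdriver.Chrome()  # assumes a local Chrome install
driver.implicitly_wait(10)   # give the table time to render
driver.get(url)

proxies = []
for row in driver.find_elements(By.CSS_SELECTOR, "table tbody tr"):
    cells = row.find_elements(By.TAG_NAME, "td")
    if len(cells) >= 2:
        proxies.append(f"{cells[0].text.strip()}:{cells[1].text.strip()}")

driver.quit()
print(proxies)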

Fetching JSON Proxy APIs

For JSON APIs, you can simply make an HTTP request and parse the response:

import requests

url = 'https://api.proxyscrape.com/?request=displayproxies&proxytype=http'
response = requests.get(url)

proxies = []
# Some proxy APIs return a JSON array of objects, others return plain text
# with one ip:port per line -- handle both shapes (field names vary by API).
try:
    for proxy in response.json():
        proxies.append(f"{proxy['ip']}:{proxy['port']}")
except ValueError:
    proxies = [line.strip() for line in response.text.splitlines() if line.strip()]

print(proxies)

This sends a request to the ProxyScrape API and builds a list of ip:port strings from whatever the endpoint returns.

Most public proxy APIs require no authentication, but some like Proxy-List.download require an API key.

Overall, JSON APIs provide a more direct way to get bulk proxies compared to scraping HTML.

Validating Proxies with Proxy Checkers

Simply extracting proxies is only half the battle. Most public proxies don't actually work.

The next step is separating the working proxies from the dead ones. To do this, you need to test the extracted proxies.

There are a couple of purpose-built tools that make proxy testing easy:

  • Proxy Checker – Browser extension to test proxies from a list against a target URL
  • ProxyTester – Tool to check proxy list against multiple URLs
  • ProxyJudge – Validates proxies using multi-step testing process

These tools take a list of extracted proxies and validate them by sending test requests through each one. They remove any proxies that fail or timeout.

Most of the tools have free plans that let you test up to 100 proxies per request.

You can also run this kind of check yourself in Python, sending a test request through each proxy with requests and keeping only the ones that respond:

import requests

target_url = 'https://httpbin.org/ip'

# Extract proxies from sites
proxies = ["111.119.187.178:6000", ...]

working_proxies = []
for proxy in proxies:
    try:
        response = requests.get(
            target_url,
            proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
            timeout=5,
        )
        if response.ok:
            working_proxies.append(proxy)
    except requests.RequestException:
        continue  # dead, blocked, or timed-out proxy

print(working_proxies)

This would return only the proxies that successfully routed requests to the target URL.
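
When you have thousands of extracted proxies to test (the bulk Spys.one list, for example), checking them one at a time is slow. A thread pool runs the same per-proxy check concurrently; here's a minimal sketch built on the requests-based check above:

from concurrent.futures import ThreadPoolExecutor
import requests

target_url = "https://httpbin.org/ip"
proxies = ["111.119.187.178:6000", "222.186.170.22:9999"]  # your extracted list

def check(proxy):
    # Return the proxy if it can fetch the target URL, otherwise None
    try:
        response = requests.get(
            target_url,
            proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
            timeout=5,
        )
        return proxy if response.ok else None
    except requests.RequestException:
        return None

with ThreadPoolExecutor(max_workers=50) as executor:
    results = executor.map(check, proxies)

working_proxies = [proxy for proxy in results if proxy]
print(working_proxies)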

You now have a filtered list of active, working proxies! 🎉

Integrating Proxies into Web Scraping Projects

Armed with lists of free working proxies, let's look at how to put them to work in web scraping and automation projects.

Option 1: Proxy Rotation

A common proxy technique is rotating through the list to mask your requests. This helps you avoid getting IP banned, since each request uses a different proxy IP.

Here's example Python logic to implement proxy rotation:

import requests

working_proxies = ["111.119.187.178:6000", "222.186.170.22:9999", ...]

# Rotate through proxies
next_proxy_index = 0

for page in range(1, 100):
    proxy = working_proxies[next_proxy_index]
    print(f"Request {page} via {proxy}")

    response = requests.get(
        f"https://www.site.com/page-{page}",
        proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
    )

    # Move to the next proxy, wrapping around at the end of the list
    next_proxy_index = (next_proxy_index + 1) % len(working_proxies)

This loops through the working proxies from the list to send each request from a different IP.

Option 2: Proxy Manager

Manually rotating proxies can get tricky. More robust tools like Proxy Manager handle proxy rotation for you.

With Proxy Manager, you configure a pool of working proxies. It automatically distributes requests across this pool with automatic failover when proxies go down.

Some features:

  • Load balances requests across proxies
  • Removes failed proxies and adds new ones
  • Retries failed requests with new proxies
  • Integrates with Python, JavaScript, Postman, etc.

This takes care of proxy management so you can focus on the rest of your scraper.
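
If you want a lightweight version of this behavior without a separate tool, a small proxy-pool helper is straightforward to sketch. The class below is a minimal illustration of the same ideas (load spreading, failover, retries), not the Proxy Manager API itself:

import random
import requests

class ProxyPool:
    # Minimal sketch of a proxy pool with rotation, failover, and retries

    def __init__(self, proxies, max_failures=3):
        self.proxies = list(proxies)
        self.failures = {proxy: 0 for proxy in self.proxies}
        self.max_failures = max_failures

    def get(self, url, retries=3, timeout=5):
        for _ in range(retries):
            if not self.proxies:
                raise RuntimeError("No working proxies left in the pool")
            proxy = random.choice(self.proxies)  # spread load across the pool
            try:
                return requests.get(
                    url,
                    proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
                    timeout=timeout,
                )
            except requests.RequestException:
                # Count the failure and drop proxies that keep failing
                self.failures[proxy] += 1
                if self.failures[proxy] >= self.max_failures:
                    self.proxies.remove(proxy)
        raise RuntimeError(f"All {retries} attempts through the pool failed for {url}")

pool = ProxyPool(["111.119.187.178:6000", "222.186.170.22:9999"])
print(pool.get("https://httpbin.org/ip").text)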

Option 3: Residential Proxies

Residential proxies are proxies from real desktop and mobile devices. They provide the highest level of anonymity since they use real residential IPs.

Services like Luminati and Oxylabs provide access to millions of residential proxies for premium monthly fees.

The residential proxies fully mimic real human browsing behavior, rotating IP addresses after each request if needed. Using residential proxies is overkill for many use cases, but provides the highest guarantee against getting blocked while scraping.
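
In practice, plugging residential proxies into a scraper looks the same as any other authenticated proxy: the provider gives you a gateway host plus credentials, and rotation happens on their side. The snippet below is a generic sketch with placeholder hostname, port, and credentials rather than any specific provider's real endpoint:

import requests

# Placeholder values -- substitute your provider's gateway and credentials
username = "YOUR_USERNAME"
password = "YOUR_PASSWORD"
gateway = "proxy.example-provider.com:8000"

proxy_url = f"http://{username}:{password}@{gateway}"

response = requests.get(
    "https://httpbin.org/ip",
    proxies={"http": proxy_url, "https": proxy_url},
)
print(response.json())  # shows the IP address the target site sees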

Conclusion

Scraping free public proxy lists is a great way to unlock thousands of free proxies for your web scraping and automation projects.

The key steps are:

  • Extract proxies from high-quality public proxy sites
  • Validate and filter working proxies using a proxy checker
  • Integrate proxies into your scraper via rotation or a proxy manager

With freely available proxies, you can scrape and automate at scale without worrying about blocks or rate limits.

What proxy sources and techniques have you found most useful? I'd love to hear in the comments!
