Hey friend! Are you looking to get started with price scraping? As an experienced web scraping pro, I'm excited to share what I've learned to help you succeed. One of the trickiest parts of any scraping project is getting the user agents configured correctly.
I know user agents can seem confusing at first, and you're probably wondering what a user agent even is. Let me explain what they are and why they're so important for price scraping.
What is a User Agent?
Whenever your browser sends a request to a website, it includes a short piece of text called the user agent, sent in the User-Agent HTTP header. It identifies the browser and operating system you're using. Here's an example user agent string from Chrome on Windows 10:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
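As you can see, it contains the browser name (Chrome), version number (74.0.3729.169), platform info (Windows NT 10.0), and a layout engine token (AppleWebKit). Chrome actually runs on Blink, a fork of WebKit, but it keeps the AppleWebKit and Safari tokens, along with the leading Mozilla/5.0, for compatibility with sites that check for them.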
User agents provide all this info to help websites adapt their content for different browsers and devices. For example, sites might serve lighter, mobile-optimized pages to phones versus full desktop sites on laptops.
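To make the structure of that string concrete, here is a minimal sketch that splits a user agent into its product tokens. The `parse_user_agent` helper and its regular expression are my own illustration, not a standard parser, and real-world strings can be messier than this pattern assumes.

```python
import re

# Matches "Product/Version" tokens, optionally followed by a "(comment)".
# This pattern is an illustrative assumption, not an official grammar.
TOKEN_RE = re.compile(r"([A-Za-z]+)/([\w.]+)(?:\s+\(([^)]*)\))?")


def parse_user_agent(ua: str) -> list[tuple[str, str, str]]:
    """Split a user agent string into (product, version, comment) tuples."""
    return [
        (m.group(1), m.group(2), m.group(3) or "")
        for m in TOKEN_RE.finditer(ua)
    ]


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
)
tokens = parse_user_agent(ua)
for product, version, comment in tokens:
    print(product, version, comment)
```

The platform details live in the comment attached to the first token, while the browser name and version show up as their own token later in the string.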
A Brief History of User Agents
Browsers have included user agent strings right from the early days of the web. Mosaic, the early browser created at the National Center for Supercomputing Applications (NCSA), had a simple user agent:
NCSA_Mosaic/2.0 (Windows 3.1)
When Netscape Navigator arrived in 1994, the browser wars began. Vendors battled to ship the browser with the best features, and more detailed user agent strings helped websites detect which browser they were talking to.
Internet Explorer and Firefox continued the war through the 2000s. Their user agents highlighted proprietary technologies to try to get sites to optimize for them. Today Chrome dominates, but the user agent lives on.
Changing User Agents for Scraping
So how do user agents fit into web scraping? Sites often block scrapers and bots based on suspicious user agents. A common scraping tool might have a user agent like:
ScraperBot/3.0
This is easy for sites to identify and block. That's why we need to spoof real browser user agents when scraping!
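As a rough sketch of what spoofing looks like in code, the snippet below attaches a browser user agent to a request using Python's standard library. The `build_request` helper and the example.com URL are placeholders of mine. Left alone, `urllib` announces itself as `Python-urllib/<version>`, which is just as easy to spot as ScraperBot.

```python
import urllib.request

# Chrome-on-Windows string taken from the example earlier in this post.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL; swap in the page you actually want to fetch.
req = build_request("https://example.com/product/123", CHROME_UA)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     html = resp.read()
```

Note that `urllib` normalizes header names internally, so the stored key is `User-agent`.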
Browser extensions like User-Agent Switcher make it easy to test different user agents. Proxy tools like Oxylabs also let you configure residential proxies with mobile, desktop, and customized user agents.
Rotating between the most common real browser user agents is key for any successful scraper. Let's talk about which user agents you're likely to see.
Most Common Desktop User Agents
The desktop browser landscape today is dominated by browsers using the Chromium engine (Chrome, Edge, Brave, Opera, etc.). Here are representative user agents from their version 108 releases:
Chrome:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36
Edge:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.54
Opera:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 OPR/94.0.0.0
Outside the Chromium family, Firefox (built on the Gecko engine) still sees significant use:
Firefox:
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0
Browser | Market Share | Engine |
---|---|---|
Chrome | 65.4% | Blink (Chromium) |
Safari | 18.7% | WebKit |
Firefox | 7.2% | Gecko |
Edge | 4.2% | Blink (Chromium) |
As you can see, Chromium-based browsers dominate desktop browsing today!
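One way to turn those numbers into a rotation is to pick user agents in proportion to their market share, so your traffic mix looks roughly like real traffic. This is a minimal sketch under my own assumptions: the `pick_user_agent` helper is hypothetical, the weights are simply the percentages from the table, and Safari is left out because no desktop Safari string is listed here.

```python
import random
from collections import Counter

CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
)
EDGE_UA = CHROME_UA + " Edg/108.0.1462.54"
FIREFOX_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) "
    "Gecko/20100101 Firefox/107.0"
)

# User agent -> weight, using the market-share percentages from the table.
DESKTOP_POOL = {CHROME_UA: 65.4, FIREFOX_UA: 7.2, EDGE_UA: 4.2}


def pick_user_agent(rng: random.Random) -> str:
    """Pick one user agent, weighted by its share of the pool."""
    agents = list(DESKTOP_POOL)
    weights = list(DESKTOP_POOL.values())
    return rng.choices(agents, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded only so the demo is repeatable
sample = Counter(pick_user_agent(rng) for _ in range(1000))
most_common_agent = sample.most_common(1)[0][0]
```

In a real scraper you would call `pick_user_agent` once per request (or per session) with an unseeded `random.Random()`.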
Most Common Mobile User Agents
Mobile browsing is dominated by Apple's iOS and Google's Android platforms. Here are examples of their user agents:
iOS:
Mozilla/5.0 (iPhone; CPU iPhone OS 15_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 Mobile/15E148 Safari/604.1
Android:
Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36
Android's user base is more fragmented, with market share split across its version releases:
Version | Codename | Market Share |
---|---|---|
12 | Snow Cone | 26.5% |
11 | Red Velvet Cake | 24.2% |
10 | Quince Tart | 22.9% |
This gives you an idea of the main mobile user agents to mimic for scraping.
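If you keep mobile and desktop strings in one list, a quick heuristic can sort them into separate pools. The sketch below checks for the `Mobile` token, which both phone examples above contain. The helper names are mine, and this check will not classify every device correctly, so treat it as a starting point.

```python
IOS_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_4 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 "
    "Mobile/15E148 Safari/604.1"
)
ANDROID_UA = (
    "Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
)
FIREFOX_DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) "
    "Gecko/20100101 Firefox/107.0"
)


def is_mobile_user_agent(ua: str) -> bool:
    """Heuristic only: treat any string containing 'Mobile' as a phone."""
    return "Mobile" in ua


def split_by_device(agents: list[str]) -> dict[str, list[str]]:
    """Group user agents into 'mobile' and 'desktop' pools."""
    pools: dict[str, list[str]] = {"mobile": [], "desktop": []}
    for ua in agents:
        key = "mobile" if is_mobile_user_agent(ua) else "desktop"
        pools[key].append(ua)
    return pools


pools = split_by_device([IOS_UA, ANDROID_UA, FIREFOX_DESKTOP_UA])
```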
Why User Agents Matter for Price Scraping
Price scraping often requires heavy traffic, which makes blocks more likely. E-commerce sites aggressively try to detect scrapers grabbing price data for competitors.
Using authentic, constantly changing user agents is crucial to avoid blocks when price scraping. It helps your scrapers look like real browser activity rather than bots.
Another cool tip: some sites may show mobile visitors special discounted pricing that desktop visitors never see. So sending mobile user agents as well as desktop ones could get you more complete pricing data.
Continually Test New User Agents
Sites are always updating their bot detection rules, so we have to continually test new user agents in our scrapers. I like to start with small test volumes to see if a new agent gets blocked before ramping up. This prevents wasting traffic.
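Here is a minimal sketch of that small-batch check. Everything in it is an assumption for illustration: the `trial_block_rate` helper, the choice to count HTTP 403 and 429 as blocks, and the `fake_fetch` stand-in that replaces a real HTTP call so the example runs offline.

```python
from typing import Callable, Iterable

# Assumption: these status codes mean "blocked". Real sites may instead
# return a CAPTCHA page with a 200, so adjust this to what you observe.
BLOCKED_STATUSES = {403, 429}


def trial_block_rate(
    fetch: Callable[[str, str], int],
    urls: Iterable[str],
    user_agent: str,
) -> float:
    """Fetch a small batch with one user agent; return the blocked fraction."""
    statuses = [fetch(url, user_agent) for url in urls]
    if not statuses:
        raise ValueError("need at least one trial URL")
    blocked = sum(1 for status in statuses if status in BLOCKED_STATUSES)
    return blocked / len(statuses)


def fake_fetch(url: str, user_agent: str) -> int:
    """Stand-in for a real request: pretends the site blocks 'ScraperBot'."""
    return 403 if "ScraperBot" in user_agent else 200


trial_urls = [f"https://example.com/product/{i}" for i in range(5)]
bot_rate = trial_block_rate(fake_fetch, trial_urls, "ScraperBot/3.0")
browser_rate = trial_block_rate(
    fake_fetch,
    trial_urls,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) "
    "Gecko/20100101 Firefox/107.0",
)
```

You would only ramp up a new user agent once its trial block rate stays below whatever threshold you are comfortable with.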
Proxy tools like Oxylabs make testing easier by providing thousands of residential IPs with associated user agents. I can simply select a new random sample for each scrape.
The best practice is to always have a diverse rotation of updated, authentic user agents ready to use. This cat-and-mouse game is just part of the job for us web scrapers!
I hope these user agent tips help you in your price scraping adventures. Let me know if you have any other questions!