Scraping Our Way to $1 Million ARR: The ScrapingBee Story

"Overnight success" is a myth in the software world. The real story tends to be far less sexy – years of toil to find product-market fit, a slow ramp of revenue growth, and plenty of gut-wrenching pivots along the way.

This is that kind of story. It's the tale of how two software engineers from a small town in France stumbled their way into building ScrapingBee, an API that helps developers easily extract web data at scale. It took us over 5 years, 2 failed products, and countless iterations to reach $1 million in annual recurring revenue (ARR).

We don't have a flashy TechCrunch feature or millions in VC funding to flaunt. But we do have real revenue from real customers and the freedom to chart our own course as a 100% bootstrapped company. Here's how we did it.

The Itch for a Better Web Scraping Tool

The two of us, Kevin and Pierre, first met in high school, where we bonded over a shared love of coding. After graduating from university with computer science degrees, we both took corporate software jobs but quickly grew restless. The siren song of entrepreneurship was calling.

In our spare time, we started building basic web scrapers to collect data for side projects. This exposed us to the core tools and libraries for crawling websites, such as Cheerio, Requests, and Puppeteer. Powerful as they were, they were a pain to work with: development was slow, and extracting data from dynamic sites at any meaningful scale took a lot of trial and error.

As we dug deeper, we realized these challenges were common across nearly all web scraping tools at the time:

Steep learning curve, even for experienced devs
Lack of pre-built browser infrastructure (i.e., having to spin up your own Headless Chrome instances)
Non-existent or lackluster documentation
Inflexible pricing that made costs unpredictable as you scaled

In short, web scraping was still too hard. A huge lightbulb went off – what if we could build an API to make collecting web data as easy as making a few HTTP requests? No complex configs, no servers to manage, just send a URL and get back structured data. Surely other developers would find this valuable!
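To make that idea concrete, here is a minimal sketch of what "send a URL, get back the page" could look like from the caller's side. The endpoint `SCRAPER_ENDPOINT` and the query parameters `api_key`, `url`, and `render_js` are hypothetical names chosen for illustration, not ScrapingBee's documented interface; the point is only that the whole integration collapses into a single HTTP GET.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint for illustration; a real provider documents its own.
SCRAPER_ENDPOINT = "https://scraper.example.com/api/v1/"


def build_scrape_url(api_key: str, target_url: str, render_js: bool = False) -> str:
    """Return a GET URL asking the (hypothetical) scraping API to fetch target_url."""
    params = {
        "api_key": api_key,
        # urlencode percent-encodes the target, including its own query string
        "url": target_url,
        # hypothetical toggle for rendering the page in a headless browser
        "render_js": "true" if render_js else "false",
    }
    return f"{SCRAPER_ENDPOINT}?{urlencode(params)}"


def scrape(api_key: str, target_url: str, render_js: bool = False, timeout: float = 30.0) -> str:
    """Make one HTTP request and return the body the API sends back."""
    with urlopen(build_scrape_url(api_key, target_url, render_js), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `build_scrape_url("MY_KEY", "https://example.com/products?page=2")` produces `https://scraper.example.com/api/v1/?api_key=MY_KEY&url=https%3A%2F%2Fexample.com%2Fproducts%3Fpage%3D2&render_js=false`. The target's own `?page=2` is percent-encoded, so it can't be mistaken for a parameter of the API call itself.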

Failing Our Way to a Good Idea

Filled with inspiration for a "Stripe for web data", we quit our jobs in 2016 to work on startup ideas full-time. First up was a price tracking extension for consumers called ShopToList. Surely online shoppers would love an easy way to get notified about price drops?

We launched on Product Hunt to a warm reception, racking up over 1,200 upvotes and 20,000 installs in the first month. But the excitement quickly faded as we struggled to keep users engaged and monetize the product. Key stats from ShopToList's short life:

35,000 total installs
1.5% conversion rate from free to paid
$1,500 in total revenue
< 1% retention after 3 months
6 months from launch to sale

With ShopToList failing to find a foothold, we sold the IP to an e-commerce agency for a small sum and quickly moved on to our next idea – a price tracking tool for online retailers called PriceBot. We figured companies would be easier to monetize than consumers.

We spent 3 months building an MVP in stealth mode, then started reaching out to potential customers for feedback. At first we heard only crickets, but we kept giving demos, talking to users, and improving the product. Lo and behold, after about 50 conversations, we finally landed our first paying customer at $50/month!

A glimmer of success, but growth flatlined from there. We knew from our customer development that price tracking was a real need, but we weren't well equipped to serve the specific needs of retailers. The e-commerce data we provided wasn't complete or reliable enough, and we didn't have the domain expertise to solve these issues quickly. After a year of struggling to find product-market fit, we sunset PriceBot.

A Pricing Breakdown of PriceBot:

💰 Average contract value: $75/mo
👥 Total paying customers: 8
💸 Peak MRR: $600
📈 MRR growth rate: 10% month-over-month
⏱️ Lifetime: 14 months

The silver lining was that through building PriceBot, we developed a deep competency in web scraping tech. Maybe the real opportunity was to package up this expertise into a more generic web data API, one that we were uniquely qualified to build.

Finding Our Footing With ScrapingBee

In 2019, we began work on a new API product to make collecting web data dead simple for developers. We called it ScrapingBee. Laser-focused on providing a great developer experience, we poured ourselves into building key features like:

🥞 Easy-to-use SDKs in popular languages like Python and JavaScript
📚 Extensive documentation with code snippets
🛠️ A request builder to easily construct API calls
📦 One-click integrations with data destinations like S3 and webhooks

We worked quickly to get an MVP out to market and opened up a private beta to get feedback from real developers. The initial response was promising – people loved how easy it was to extract data from web pages with just a few lines of code.

This early validation gave us the conviction to start charging from day 1. We used a usage-based pricing model starting at $49/mo for 100k API credits. Our first customer signed up within an hour of launching paid plans – a $99/mo plan to scrape weekly retail pricing data.

From there, revenue grew slowly but steadily as we converted free-trial users into paying customers. We talked to every user to deeply understand their needs and use cases – a practice we maintain to this day. Some illuminating insights that came out of these customer interviews:

Web scraping was business-critical but not a core competency for most of our customers. They needed a solution that "just worked" so they could focus on their actual product.
Because many customers relied on web data to drive core functionality like price comparisons, they needed guaranteed uptime and high concurrency to handle burst traffic.
E-commerce and SaaS companies made up a large chunk of our user base and had very specific data extraction needs around gleaning product details, reviews, and pricing.
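Handling burst traffic under a fixed concurrency limit mostly comes down to capping how many requests are in flight at once. Below is a minimal client-side sketch using Python's `asyncio.Semaphore`; `scrape_all`, the injected `fetch` coroutine, and the default limit of 10 are hypothetical stand-ins, and a real client would plug in its own HTTP call and its plan's actual limit.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 10,
) -> List[str]:
    """Fetch every URL while keeping at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(url: str) -> str:
        # Waits here whenever max_concurrency requests are already running.
        async with semaphore:
            return await fetch(url)

    # gather schedules everything up front and returns results in input order.
    return list(await asyncio.gather(*(limited(u) for u in urls)))
```

A burst of 500 URLs then queues politely behind the semaphore instead of overwhelming either the client or the API's per-user limit.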

Based on this feedback, we made significant investments to improve ScrapingBee's reliability and performance:

🤖 Transitioned to a containerized architecture to efficiently isolate scraping jobs
🌐 Expanded to multiple data centers across 3 continents to reduce latency
🔄 Implemented advanced retry logic and error handling to maximize successful data extraction
⚡ Optimized networking and compute to achieve response times under 5 seconds for 95% of requests
📈 Stress tested our API to handle 500+ concurrent connections per user
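Retry logic is one place where a little code says more than a bullet point. The following is a generic sketch of one common pattern, capped exponential backoff with "full jitter", and not ScrapingBee's actual implementation; `TransientError`, `with_retries`, and the default delays are hypothetical choices. Injecting `sleep` and `rng` keeps the function testable without real waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429, 5xx)."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Run operation(), retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    attempt = 1
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The backoff ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random fraction of it keeps many clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng.uniform(0, ceiling))
            attempt += 1
```

In practice you would raise `TransientError` only for failures that are worth retrying, such as timeouts, HTTP 429, or 5xx responses, and let permanent errors like a 404 propagate immediately.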

The Power of Content Marketing

On the marketing front, educating the developer community about web scraping proved to be our secret weapon. By leveraging Kevin's experience writing about web scraping, we began publishing deeply technical tutorials on ScrapingBee's blog to help people solve common scraping challenges.

This "engineering as marketing" approach struck a chord. Our posts regularly hit the front page of Hacker News and got shared widely on social media. The blog quickly became a destination for web scraping expertise. Some of our top performing content:

📕 How to Scrape Websites Without Getting Blocked: 70,000+ pageviews
🐍 Web Scraping with Python: The Beginner's Guide: 50,000+ pageviews
☕ Web Scraping with Java: The Complete Guide: 30,000+ downloads

This traffic generated a flywheel of new signups, many of whom converted to paid plans and also referred colleagues. In the first 6 months after launching our content program, blog traffic grew 20% month-over-month and accounted for 50% of all new user registrations.

By doubling down on content, our site reached 100,000 monthly visitors less than a year after launching. More importantly, tying content to product usage revealed that content-driven users were amongst our best customers – they had 30% higher retention and 25% higher lifetime value compared to other acquisition channels. Content quite literally became our growth engine.

The Road to $1 Million ARR

With strong organic acquisition in place, we kept shipping product improvements to boost retention and reduce churn. Some of the key releases:

🎨 Headless Chrome rendering to parse JS-heavy pages
📊 AI-powered data extraction to pull structured data from any page
🔒 Dedicated proxy pools to route sensitive scraping jobs
🌎 Geotargeting to get localized data from any country

As ScrapingBee matured, revenue growth accelerated. It took us 18 months from launch to reach $10k MRR, but only 3 more months to double that to $20k MRR. By mid-2021, growth was picking up serious steam:

📈 Monthly revenue growth rate: 15%
💸 Average revenue per user: $230/mo
📅 Average customer lifetime: 11 months
🔁 Gross revenue retention: 95%+
🏆 Net Promoter Score: 65
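The numbers above hang together through a few standard formulas: ARR is 12 × MRR, a steady month-over-month rate compounds, and a rough lifetime value is ARPU times average lifetime. A small sketch for checking that arithmetic (the function names are our own, chosen for illustration):

```python
import math


def mrr_from_arr(arr: float) -> float:
    """ARR = 12 x MRR, so MRR = ARR / 12."""
    return arr / 12


def implied_monthly_growth(start: float, end: float, months: float) -> float:
    """Constant month-over-month rate g such that start * (1 + g) ** months == end."""
    return (end / start) ** (1 / months) - 1


def months_to_reach(start: float, target: float, monthly_growth: float) -> float:
    """Solve start * (1 + g) ** n == target for n."""
    return math.log(target / start) / math.log(1 + monthly_growth)


def simple_ltv(arpu_per_month: float, lifetime_months: float) -> float:
    """Rough lifetime value: ARPU x average lifetime (ignores margins and discounting)."""
    return arpu_per_month * lifetime_months
```

Plugging in figures from this post: $1 million ARR is about $83.3k MRR; doubling from $10k to $20k MRR in three months implies roughly 26% monthly growth; if growth held at 15% a month, getting from $20k to $83.3k MRR would take a little over 10 months; and $230/mo over an 11-month average lifetime gives a rough LTV of $2,530.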

To keep up with the increased scale, we brought on our first full-time engineer, Etienne, and reworked our usage-based pricing to support our heaviest users. Adding Etienne was a game-changer, allowing us to ship product and scale infrastructure faster than ever before.

Finally, in November 2021 – more than 5 years after embarking on our entrepreneurial journey – ScrapingBee crossed $1 million in ARR. The breakdown of that revenue reflects the diversified and global customer base we've built:

🌍 110+ countries represented
🪙 600+ paying customers
🦄 20% of revenue coming from enterprise contracts
🛒 E-commerce data powering 30% of revenue
📊 SaaS companies making up 20% of revenue
🕵️‍♀️ Market research accounting for 15% of revenue

Achieving the $1 million milestone was gratifying, but we know it's just the beginning. Every year, thousands of new companies are started that could benefit from web data. Our mission is to help them extract that data as quickly and painlessly as possible so they can focus on building great products.

Lessons Learned from Bootstrapping to $1 Million

Our path to $1 million ARR was a winding one. We navigated multiple product failures, scraped by on savings to keep the lights on, and fought tooth-and-nail for every dollar of revenue. It was the hardest thing we've ever done – but also the most rewarding.

For those considering bootstrapping a SaaS business, a few key pieces of advice:

🔍 Validate your idea by talking to potential customers before writing a line of code. No amount of planning can replace real market feedback.
🧪 Embrace an experimental mindset. Most of your assumptions will be proven wrong – the key is to iterate quickly based on data.
🗓️ Be patient. Bootstrapping is a long game. Don't expect exponential growth curves out of the gate.
🎯 Nail your positioning. Trying to be everything to everyone is a recipe for mediocrity. Figure out what you're uniquely good at and double down on that.
💌 Treat your customers like royalty. They are the lifeblood of your business. Always be responsive to their needs.
⚖️ Prioritize sustainability over growth-at-all-costs. The beauty of bootstrapping is you can build a company on your own terms.

The journey is far from over for ScrapingBee. We're still a small, tight-knit team with ambitious goals for the future of web data extraction. But we're proud to have built a healthy business our own way, one scraped page at a time.

If there's one thing we've learned, it's that with enough grit and a strong product, you don't need fancy VC dollars to build a great software company. The bootstrapper's path may be less glamorous, but it's open to anyone crazy enough to give it a shot.
