Use a Custom Scraper to Unlock Dun & Bradstreet's Data Goldmine

Dun & Bradstreet (D&B) maintains the world's largest commercial database, with over 330 million business records globally. Unfortunately, tapping into this data goldmine through D&B's official channels can be prohibitively expensive for many companies.

That's where building your own custom web scraper comes in. With the right approach, you can extract large amounts of market intelligence from D&B to power your business decisions, without the steep price tag.

As an experienced web scraping specialist who frequently uses proxies to extract data at scale, I've helped multiple enterprises build scrapers for D&B's data.

In this guide, I'll share what I've learned about scraping D&B, including why the data costs so much, how the site is structured, which scraper approaches work, and how to avoid blocks while staying within legal guidelines.

The High Costs of D&B's Closed Data Ecosystem

First, let's examine why D&B's data is so valuable, yet so costly through official means.

D&B's crown jewel offering is its D&B Hoovers database. This contains in-depth profiles on over 330 million public and private companies globally, including:

  • Firm financials
  • Corporate family connections
  • Executives and principals
  • News, events, and lawsuits
  • Hierarchical industry codes
  • Competitive landscape insights
  • Risk scores and ratings

This data can provide incredible market intelligence for sales prospecting, KYC, due diligence, and other applications.

However, D&B employs a closed, walled garden approach, forcing customers to pay exorbitant rates for database access.

Some example D&B costs:

  • D&B Hoovers API – $60,000/year minimum for 5 million records

  • D&B Credit Reports – $199-$600+ per individual report

  • D&B Company Update Report – $169 for a single report

At these prices, tapping into D&B's data is out of reach for many companies. This is where building your own custom scraper comes in.

Benefits of Creating Your Own D&B Web Scraper

While D&B actively discourages scraping, creating your own custom D&B scraper can provide tremendous advantages, such as:

1. Cost Savings

Scraping tools and the cloud servers that run them typically cost pennies per extracted record, far less than D&B's paid data access.

2. Data Control

You control the scraped data directly, with no dependence on restrictive licenses or proprietary APIs.

3. Customization

Extract only the precise data points you need from D&B's vast database.

4. Scalability

Scale data collection through parallel scraping to meet your business needs.

5. Evasion of Restrictions

Access D&B data without limits, throttling or sampling enforced on official APIs.

Let's now dive into how you can build your own D&B scraper to realize these benefits.

A Primer on Dun & Bradstreet's Website Structure

To create an effective web scraper, you need to understand how D&B structures its online data assets.

At a high level, D&B splits its data into three primary sections:

1. Company Profiles

In-depth profiles on 330M+ companies worldwide, with descriptors like:

  • Key contacts
  • Firmographics
  • Corporate family tree
  • Financials
  • Competitors
  • Certifications
  • Bankruptcies, liens, lawsuits
  • Hierarchical industry codes

2. Business Directory

Listings of 330M+ companies searchable by keywords, location, industry, size, and other filters. Contains basic data like:

  • Contact info
  • Industry codes
  • Employee size
  • Estimated revenue

3. News & Research

Proprietary market research reports, risk insights, and business news coverage.

Understanding this structure helps inform where to target your scraper. Next, let's explore proven scraper architectures.

Choose Your Weapon: Evaluating D&B Scraper Approaches

When creating a web scraper, you first need to choose a platform. Here are the most common options for D&B scraping, with pros and cons:

Browser Automation Scrapers

Platforms like Puppeteer, Playwright, and Selenium drive real browsers to click buttons and fill forms programmatically.

Pros:

  • Can closely mimic human actions to appear non-botlike

  • Support headless mode and proxy configuration out of the box, with stealth behavior available through plugins

Cons:

  • Resource intensive, doesn't scale well

  • Prone to CAPTCHAs and blocks without careful tuning

HTTP Request Scrapers

Tools like Scrapy and web-scraper.js make direct HTTP requests to fetch and parse data.

Pros:

  • Lightweight, great for large scale scraping

  • Fast extraction speeds

Cons:

  • Can be easier to detect as bot activity without precautions

  • No built-in browser, so JavaScript-rendered content needs extra tooling

Managed Scraping Services

Platforms like ScrapingBee, ScraperAPI, and Octoparse provide scraper hosting, proxies, and CAPTCHA solving.

Pros:

  • Quickly get started scraping without coding

  • Handle proxies, browsers, and CAPTCHAs for you

Cons:

  • Fewer customization options

  • Ongoing subscription fees at scale

So which approach is best for D&B? Here are my recommendations.

For most D&B scrapers, browser automation is the most robust option, since it handles JavaScript-rendered pages, though it trades away some throughput. For maximum control, Puppeteer (browser automation) and Scrapy (HTTP requests) are both great choices.

Now let's explore must-have features for an effective D&B scraper.

Critical Capabilities for an Optimized D&B Scraper

Based on my experience, here are some key features any custom D&B scraper should provide:

Company Profile Extractor

The crown jewels are D&B's in-depth company profiles. Configure searches by criteria like company name, location, and industry to extract full profiles.

Business Directory Crawler

Extract abbreviated listings from D&B's directory of 330M+ companies globally. Useful for lead generation.

Search by Keyword

Flexibly search for companies by keyword and extract matching profiles or listings.

Pagination Handling

Auto-detect and follow "next page" links to crawl full result sets across pages.
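
As a rough illustration of that loop, here is a minimal sketch that is independent of any particular scraping library. The `fetchPage` callback and its `{ items, nextUrl }` return shape are my own assumptions for the example, not something D&B or Puppeteer defines. The visited-URL set and the page cap guard against pagination that loops back on itself.

```javascript
// Follows "next page" links until none remain, a URL repeats, or maxPages is hit.
// `fetchPage(url)` is a hypothetical async callback that returns
// { items: [...], nextUrl: string | null } for one page of results.
async function crawlAllPages(startUrl, fetchPage, maxPages = 50) {
  const items = [];
  const seen = new Set();
  let url = startUrl;
  let pagesVisited = 0;

  while (url && !seen.has(url) && pagesVisited < maxPages) {
    seen.add(url);
    const { items: pageItems, nextUrl } = await fetchPage(url);
    items.push(...pageItems);
    url = nextUrl;
    pagesVisited += 1;
  }
  return items;
}
```

In a Puppeteer scraper, `fetchPage` would navigate to the URL, extract the listings, and read the next-page link, returning `null` when there is none.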

Proxy Rotation

Rotate proxy IP addresses to distribute requests and avoid blocks.
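
One simple way to picture rotation is a round-robin picker over a list of proxy URLs. The sketch below is only an illustration: `createProxyRotator` is a name I made up, and the proxy addresses in the usage note are placeholders.

```javascript
// Creates a round-robin rotator over a non-empty list of proxy URLs.
// Each call to next() returns the following proxy, wrapping at the end.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('createProxyRotator needs at least one proxy URL');
  }
  let index = 0;
  return {
    next() {
      const proxy = proxies[index];
      index = (index + 1) % proxies.length;
      return proxy;
    },
  };
}
```

With Puppeteer, the chosen proxy is usually passed to Chromium at launch, for example `puppeteer.launch({ args: ['--proxy-server=' + rotator.next()] })`, where `rotator` was built from placeholder addresses such as `http://proxy-a.example:8000`.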

Export Options

Customizable output formats like JSON, XML, CSV, etc. to integrate scraped data with other systems.
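
For JSON and CSV, a small serializer is often enough. The following is a minimal sketch, assuming each scraped record is a flat object; the field names in the usage note are examples rather than a fixed schema. The CSV quoting follows the usual convention of wrapping fields that contain commas, quotes, or line breaks and doubling any embedded quotes.

```javascript
// Escapes one CSV field: null/undefined become empty, and values containing
// a comma, double quote, or line break are quoted with inner quotes doubled.
function csvField(value) {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Serializes flat records to CSV with a header row, using `columns` for order.
function toCsv(records, columns) {
  const header = columns.map(csvField).join(',');
  const rows = records.map(r => columns.map(c => csvField(r[c])).join(','));
  return [header, ...rows].join('\r\n');
}

// Serializes records to pretty-printed JSON.
function toJson(records) {
  return JSON.stringify(records, null, 2);
}
```

For example, `toCsv(results, ['name', 'url', 'location'])` produces one line per listing, ready to write to disk with `fs.writeFileSync`.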

I'll next provide code snippets and examples for key capabilities using Puppeteer, one of my favorite D&B scraping tools.

Extracting Company Profiles

Here is sample Puppeteer code to search D&B by company name and extract full profiles:

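// Assumes `page` is a Puppeteer Page already open on the search page and that
// this runs inside an async function. Selectors are samples and may need
// adjusting to the live markup.

// Search for company and wait for results to render
await page.type('#searchbox', 'Walmart');
await page.click('#search-button');
await page.waitForSelector('.company-title');

// Extract name, description, etc. from the first result
const name = await page.$eval('.company-title', el => el.innerText.trim());

// Navigate to full profile
const url = await page.$eval('.company-title a', el => el.href);
await page.goto(url);

// On profile page, extract further data
const description = await page.$eval('.company-description', el => el.innerText.trim());
const financials = await page.$$eval('table tr', rows =>
  rows
    .map(row => row.querySelectorAll('td'))
    // Skip header or malformed rows that lack two data cells
    .filter(cells => cells.length >= 2)
    .map(cells => ({
      metric: cells[0].innerText.trim(),
      value: cells[1].innerText.trim(),
    }))
);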

This allows scraping in-depth data from D&B's profiles.

Crawling the Business Directory

Here is sample code to extract abbreviated listings from D&B's directory search:

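// Assumes `page` is a Puppeteer Page already open on the directory search page
// and that this runs inside an async function. Selectors are samples.

// Search for "software companies in Texas" and wait for results to render
await page.type('#searchinput', 'software companies in texas');
await page.click('#search-button');
await page.waitForSelector('.search-results li');

// Extract data from each result on the current page
const extractListings = () =>
  page.$$eval('.search-results li', listings =>
    listings.map(listing => ({
      name: listing.querySelector('.company-name')?.innerText ?? null,
      url: listing.querySelector('.company-name a')?.href ?? null,
      location: listing.querySelector('.location')?.innerText ?? null,
      // etc...
    }))
  );

const results = await extractListings();

// Follow pagination until there is no "next page" link
let nextLink;
while ((nextLink = await page.$('.pagination .next-page'))) {
  const href = await nextLink.evaluate(el => el.href);
  await page.goto(href);

  // Extract next page results
  results.push(...(await extractListings()));
}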

This iterates through directory search results across pages to extract business listings.
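
Results gathered across many pages can contain the same company more than once, so I usually deduplicate before exporting. Below is a minimal sketch of one way to do that. It assumes each listing has the `name` and `url` fields used above, and the choice to treat URLs that differ only by a trailing slash or fragment as the same listing is my own assumption.

```javascript
// Builds a comparison key for a listing: the URL without its fragment or
// trailing slash, or the lowercased name when the URL is missing or invalid.
function listingKey(listing) {
  try {
    const u = new URL(listing.url);
    return u.origin + u.pathname.replace(/\/+$/, '') + u.search;
  } catch {
    return 'name:' + String(listing.name).trim().toLowerCase();
  }
}

// Keeps the first listing seen for each key, preserving original order.
function dedupeListings(listings) {
  const byKey = new Map();
  for (const listing of listings) {
    const key = listingKey(listing);
    if (!byKey.has(key)) byKey.set(key, listing);
  }
  return [...byKey.values()];
}
```

Calling `dedupeListings(results)` after the crawl returns a new array and leaves `results` untouched.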

Handling CAPTCHAs and Blocks

Here are some techniques I use to avoid scrapes getting blocked:

  • Rotate proxies – Use libraries like proxy-chain to automatically rotate IP addresses.

  • Throttle requests – Insert delays between page loads to mimic human browsing patterns:

// Scrape page
await scrapePage(page);

// Wait 5-10 seconds (page.waitForTimeout was removed in Puppeteer v22)
await new Promise(resolve => setTimeout(resolve, 5000 + Math.random() * 5000));

  • Solve CAPTCHAs – Integrate services like AntiCaptcha to solve CAPTCHAs when encountered.

With these precautions, you can scrape responsibly while minimizing disruptions.

It's also important to keep these legal guidelines in mind when web scraping:

  • Comply with sites' Terms of Service and any cease and desist requests.
  • Don't overload sites with too many requests per second.
  • Only scrape data you plan to use, not entire sites.
  • Keep scraped data for internal use and don't redistribute it.

Following these guidelines helps keep your efforts above board.
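
To make the request-rate guideline concrete, here is a minimal sketch of a sequential runner that pauses for a randomized interval between tasks. The helper names and the 5 to 10 second defaults are my own choices for illustration, and `task` stands in for whatever per-URL scraping function you use.

```javascript
// Returns a delay in milliseconds drawn uniformly from [minMs, maxMs).
// `random` is injectable so the function can be checked deterministically.
function jitteredDelayMs(minMs, maxMs, random = Math.random) {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError('expected 0 <= minMs <= maxMs');
  }
  return Math.floor(minMs + random() * (maxMs - minMs));
}

// Promise-based sleep that does not depend on any Puppeteer API.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Runs `task` on each item one at a time, sleeping between items so the
// target site never sees more than one request per delay window.
// `task` is a hypothetical async callback, e.g. url => scrapePage(page, url).
async function runThrottled(items, task, { minMs = 5000, maxMs = 10000 } = {}) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await task(items[i]));
    if (i < items.length - 1) {
      await sleep(jitteredDelayMs(minMs, maxMs));
    }
  }
  return results;
}
```

Running tasks strictly in sequence is the conservative choice here; it caps throughput, but it also keeps the load on the site predictable.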

Alternative Data Sources

In closing, here are a few additional sources beyond D&B to enrich your business intel data:

  • Data brokers – Providers like Acxiom sell proprietary business data sets.
  • Enriched databases – Tools like Clearbit append firmographic attributes to business contacts.
  • Data marketplaces – Exchanges like Snowflake Data Marketplace offer third-party data.
  • Business registries – State registration databases contain useful public business info.

Combining D&B scraping with these other sources can really amplify your market intelligence capabilities.

Conclusion

Scraping Dun & Bradstreet using a tailored web scraper unlocks access to their unmatched global business database at a fraction of the official costs. With the right approach and precautions, you can leverage D&B data to take your competitive intelligence and prospecting to the next level.

In this guide, I've shared actionable insights from my experience using proxies and scrapers to extract huge value from D&B cost-effectively and legally. I hope these tips empower you to tap into this data goldmine to enhance your business decisions and strategy.

Let me know if you have any other questions! I'm always happy to chat more about advanced web scraping techniques.
