The Complete Guide to Scraping Data from Mobile App APIs

In today's mobile-first world, apps are replacing traditional websites for many services. There are now over 5 million apps available across the major app stores, and the average smartphone user has over 80 of them installed. Apps also account for roughly 90% of the time people spend online on mobile devices, with the mobile web making up the rest.

What does this mean? Mobile apps are becoming one of the most valuable and abundant sources of data in our increasingly digital world. Location data, usage stats, profiles, transactions, and more can be extracted from mobile app traffic.

This presents a huge opportunity for businesses, researchers, and developers. But harvesting this data means intercepting the communication between apps and their backend APIs, and mobile apps transmit that data over encrypted HTTPS connections.

Simply sniffing the network traffic won't reveal the content. You need a way to decrypt the traffic.

Enter man-in-the-middle (MITM) proxy servers.

In this comprehensive, 4,000-word guide, you'll learn:

  • What exactly a MITM proxy is and how it works
  • Step-by-step how to set up your own MITM proxy for mobile app traffic analysis
  • How to use the proxy to observe and reverse engineer a mobile app API
  • Tools and techniques for scraping valuable data from mobile app APIs
  • MITM proxy best practices for responsible data collection

If you want hands-on experience extracting data from apps like Tinder, Airbnb, Yelp, and more, you’ve come to the right place. Let’s get started!

What is a Man-in-the-Middle (MITM) Proxy?

To understand how MITM proxies allow scraping mobile app data, you first need to understand what HTTPS encryption does.

HTTPS uses TLS (the successor to SSL) to encrypt communication between a client (e.g. mobile app) and server (e.g. API backend), with certificates proving the server's identity. This prevents anyone spying on the network from viewing or tampering with the traffic.

But what if you could intercept that traffic by situating yourself in the middle—between the client and destination server?

This is exactly what a MITM proxy does. The proxy acts as an intermediary that all traffic passes through:

Client <--> MITM Proxy <--> Destination Server

The proxy establishes separate TLS connections with the client and the server, which lets it decrypt the traffic from each side. It can then inspect, analyze, and even modify the plaintext requests and responses before re-encrypting them and sending them along.

Once you install the proxy's root certificate on the device as a trusted authority, the proxy can effectively impersonate any domain. This allows seamless interception without triggering certificate warnings.

Some popular MITM proxies include:

Proxy          | Platform             | Notes
mitmproxy      | Mac, Windows, Linux  | Powerful console-based tool
Charles Proxy  | Mac, Windows, Linux  | GUI, device configuration support
Fiddler        | Windows              | Can debug traffic from Windows apps

In this guide, we'll demonstrate using mitmproxy since it's free, open source, and fast to set up.

Now let's look at how to configure a mobile device to route its traffic through your MITM proxy.

Step 1: Install mitmproxy on Your Computer

The first step is installing and starting the mitmproxy proxy server on your desktop or laptop computer. It will intercept requests from devices configured to route through it.

Install on macOS

If you're on a Mac, the easiest way to install mitmproxy is via Homebrew:

$ brew install mitmproxy

Install on Linux

On Linux, use your distro's package manager, e.g.:

$ sudo apt install mitmproxy # Debian/Ubuntu
$ sudo dnf install mitmproxy # Fedora

Install on Windows

Windows users can download the official release from mitmproxy.org. Be sure to pick the latest version. The release includes the interactive mitmproxy console used in this guide alongside the mitmdump and mitmweb utilities.

Start the Proxy

Once installed, start mitmproxy on the default port 8080:

$ mitmproxy

The mitmproxy console opens with an empty flow list. Intercepted requests will appear here once a device is routed through the proxy:

[Screenshot: mitmproxy console]

Leave this running in the background as you configure your mobile device to use the proxy.
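Before touching the phone, it can help to confirm the proxy works from the computer itself. The sketch below builds the keyword arguments that the requests library accepts for routing a call through a local proxy. The helper name proxy_settings is made up for this example, and it assumes mitmproxy is listening on port 8080 and has written its CA files to the default ~/.mitmproxy directory.

```python
from pathlib import Path


def proxy_settings(host="127.0.0.1", port=8080, ca_path=None):
    """Build keyword arguments for requests.get() that route a call through the proxy.

    Hypothetical helper for illustration; host, port and ca_path are assumptions
    about a default local mitmproxy setup.
    """
    if ca_path is None:
        # mitmproxy writes its CA files to ~/.mitmproxy the first time it starts
        ca_path = Path.home() / ".mitmproxy" / "mitmproxy-ca-cert.pem"
    proxy_url = f"http://{host}:{port}"
    return {
        # the same proxy handles both plain HTTP and HTTPS traffic
        "proxies": {"http": proxy_url, "https": proxy_url},
        # trust the proxy's CA so the intercepted TLS connection verifies
        "verify": str(ca_path),
    }


# Usage, with mitmproxy running (requires the requests package):
#   import requests
#   r = requests.get("https://example.com", timeout=10, **proxy_settings())
#   print(r.status_code)  # the request should also show up in the mitmproxy console
```

If the request appears in the console, the proxy side is working and any later problems are likely on the device side.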

Step 2: Configure Mobile Device to Route Through mitmproxy

Now we need to configure the mobile device to route its traffic through the mitmproxy proxy for interception.

Here are the steps for common mobile operating systems:

On iPhone/iOS

  1. Connect your iPhone to the same Wi-Fi network as your computer running mitmproxy.

  2. Go to Settings > Wi-Fi and select your current network.

  3. Scroll down and tap Configure Proxy.

  4. Select Manual to set your own proxy details:

    • Server: The IP address of your computer on the network (e.g. 192.168.1.10)

    • Port: 8080

This routes the device's HTTP and HTTPS traffic through your computer and mitmproxy! Apps that ignore the system proxy setting will bypass it.

On Android

  1. Connect your Android device to the same Wi-Fi network as the proxy.

  2. Go to Settings > Wi-Fi > Advanced options > Proxy and select Manual.

  3. Enter your computer's IP address in the Hostname field and 8080 for the Port.

  4. Tap Save to apply the proxy configuration.

On Windows Phone

  1. From the start screen, swipe left to the App List and tap Settings.

  2. Tap Wi-Fi and long-press your connected network. Select Edit.

  3. Tap Show advanced options then Set proxy to Manual.

  4. Enter your computer's IP address and 8080 for the port.

  5. Tap Save to connect through the proxy.

And that's it! Your mobile device should now route its HTTP and HTTPS traffic through mitmproxy for interception.
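Each set of steps above asks for your computer's IP address on the local network. If you are not sure what it is, a short script can make a best-effort guess. This is only a sketch: the function name lan_ip is made up here, and on machines with several network interfaces the result may not be the one on your Wi-Fi network.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess at this machine's address on the local network."""
    # Connecting a UDP socket sends no packets. It only asks the OS which
    # local interface it would use to reach the given address.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # 192.0.2.1 is a reserved documentation address; it is never contacted
            s.connect(("192.0.2.1", 80))
            return s.getsockname()[0]
        except OSError:
            # no usable route (for example, the machine is offline)
            return "127.0.0.1"


print(lan_ip())
```

Enter the printed address as the proxy server or hostname on the device, together with port 8080.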

Step 3: Install the mitmproxy Certificate on Mobile Device

At this point, you'll see traffic in the mitmproxy console from your mobile device. However, most apps use HTTPS, so you won't be able to view the content.

To decrypt HTTPS traffic, you need to install the mitmproxy certificate as a trusted root certificate authority on your mobile device.

Mitmproxy provides a handy site at http://mitm.it that serves the certificate in the right format for your device platform. The page only works while the device is routed through the proxy.

Simply go to http://mitm.it in your mobile device's browser and tap the link for your OS:

[Screenshot: mitm.it site]

Then install the downloaded certificate on your device:

On iOS

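  1. Go to Settings > General > VPN & Device Management (named Profiles on older iOS versions), select the downloaded mitmproxy profile, and tap Install

  2. Go to Settings > General > About > Certificate Trust Settings

  3. Enable full trust for the mitmproxy certificate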

On Android

  1. Save the mitmproxy-ca-cert.pem file somewhere on device storage

  2. Go to Settings > Security > Install from storage (the exact menu path varies by Android version and vendor)

  3. Select the mitmproxy-ca-cert.pem file

On Windows Phone

  1. Go to System > Encryption > Import certificate

  2. Pick the downloaded mitmproxy-ca-cert.crt

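On Android, if you are prompted for how the certificate will be used, choose VPN and apps. Note that on Android 7 and later, apps do not trust user-installed certificates unless they explicitly opt in, and apps that pin their certificates will reject the proxy as well. Where the certificate is accepted, mitmproxy can now intercept even HTTPS-encrypted traffic from the device.

Keep the certificate trusted only while you are debugging traffic, and disable or remove it when you are done, so other apps' data is not exposed unnecessarily.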

Okay, time for the fun part: let's look at how to observe the traffic to reverse engineer and scrape mobile app APIs!

Step 4: Observing and Reversing a Mobile App API

Open the mobile app you want to study on the device configured to use the proxy. For example, I'll use the Swiggy food delivery app.

In the mitmproxy console on your computer, you should see requests coming from the IP address of your mobile device.

Filter the view by the domain of the API you want to analyze. In the mitmproxy console, press f and enter a filter expression such as ~d prod-api.swiggy.com. For Swiggy, the API domain is prod-api.swiggy.com:

[Screenshot: Swiggy API traffic in mitmproxy]

As you interact with the mobile app, look for patterns in the API requests. You can expand a request to view full details:

[Screenshot: Expanded API request]

Testing different app flows reveals what endpoints exist on the backend API and what data they return. For scraping, we're interested in GET requests that return JSON data.

Based on observing the traffic from Swiggy, we can see:

  • /restaurants/list/v5 returns a list of restaurants for a location
  • /menu/v4 gets the menu for a specific restaurant
  • /geocode/v1 converts addresses to lat/lng coordinates

And so on. This allows us to understand and map out the API endpoints.
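Mapping endpoints by eye gets tedious once an app makes dozens of calls. mitmproxy can load a Python script whose module-level response function it calls for every completed response, so a small addon can keep the tally for you. The sketch below is one possible version: the file name endpoints.py and the endpoint_counts variable are made up for this example, and the target host is the Swiggy domain used above.

```python
# endpoints.py - load with: mitmdump -s endpoints.py
from collections import Counter
from urllib.parse import urlsplit

TARGET_HOST = "prod-api.swiggy.com"  # the API domain observed in this guide
endpoint_counts = Counter()  # (method, path) -> number of responses seen


def response(flow):
    """Event hook that mitmproxy calls once per completed response."""
    request = flow.request
    if request.pretty_host != TARGET_HOST:
        return  # ignore traffic to other domains
    # request.path includes the query string; keep only the path part
    path = urlsplit(request.path).path
    key = (request.method, path)
    endpoint_counts[key] += 1
    if endpoint_counts[key] == 1:
        # announce each endpoint the first time it appears
        print(f"new endpoint: {request.method} {path} -> {flow.response.status_code}")
```

Running mitmdump with this script while tapping through the app prints each new method and path combination once, which gives a quick first draft of the endpoint list.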

Now we can replicate API requests to extract data. For example, calling /restaurants/list/v5 with the lat and lng parameters returns a JSON list of restaurants:

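import requests

# Endpoint and parameters as observed in the proxy
api_url = 'https://prod-api.swiggy.com/restaurants/list/v5'

params = {
    'lat': 12.972442,
    'lng': 77.580643,
}

response = requests.get(api_url, params=params, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error page
data = response.json()

# Assumes the JSON body is a flat list of restaurant objects with
# 'name' and 'area' keys; adjust to the structure seen in the proxy.
for restaurant in data:
    print(restaurant['name'], restaurant['area'])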

This prints out names and areas of Swiggy restaurants near a given location. The proxy lets us see how the app functions and reverse engineer the API for scraping.

Scraping Strategies for Mobile App APIs

Once you've inspected an app's API traffic to understand endpoints and parameters, you can start harvesting data programmatically.

Here are some best practices:

  • Use proxies – Rotate different residential IPs to avoid blocks from rate limiting.

  • Randomize inputs – Vary geocoordinates, user IDs, etc. to appear more human.

  • Throttle requests – Add delays between requests to limit volume.

  • Cache data – Store responses to avoid duplicate requests.

  • Handle errors – Retry failed requests and gracefully handle HTTP errors.

  • Paginate data – Follow pagination links in responses to extract all data.

  • Use POST when needed – Some actions like placing an order require POST requests.

  • Parse quickly – Extract just the data you need instead of parsing everything.

  • Scrub metadata – Remove unique IDs, timestamps, etc. that could identify records.

  • Stay up to date – Check for API changes after app updates.
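Several of these practices (throttling, caching, retrying, and paginating) fit into one small wrapper. The sketch below is an illustration rather than a finished client: the class name PoliteClient, the paginate helper, and the "items" and "next" response keys are all assumptions, and the actual HTTP call is passed in as a fetch function so the logic stays independent of any one library.

```python
import time


class PoliteClient:
    """Wraps a fetch(url) -> dict callable with throttling, caching and retries."""

    def __init__(self, fetch, min_interval=1.0, max_retries=3, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval
        self._max_retries = max_retries
        self._sleep = sleep
        self._cache = {}

    def get(self, url):
        if url in self._cache:
            return self._cache[url]  # cache: never request the same URL twice
        for attempt in range(1, self._max_retries + 1):
            self._sleep(self._min_interval)  # throttle: pause before every request
            try:
                data = self._fetch(url)
            except OSError:
                # requests and urllib network errors both derive from OSError
                if attempt == self._max_retries:
                    raise
                self._sleep(self._min_interval * 2 ** attempt)  # back off, then retry
                continue
            self._cache[url] = data
            return data


def paginate(client, first_url, items_key="items", next_key="next"):
    """Yield items page by page, following a 'next' URL until it runs out.

    The key names are hypothetical; use whatever the observed API returns.
    """
    url = first_url
    while url:
        page = client.get(url)
        yield from page.get(items_key, [])
        url = page.get(next_key)
```

With the requests library, the fetch argument could be something like lambda url: requests.get(url, timeout=10).json(), wrapped in whatever headers and parameters you observed in the proxy.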

With some creativity, you can build scrapers to extract all kinds of valuable data from mobile app APIs. Just make sure to carefully follow Terms of Service and use data responsibly!

Responsible Mobile API Scraping

Like any form of web scraping, collecting data from mobile APIs comes with some ethical considerations:

  • Don't overload servers – Limit request volume to minimize impact.

  • Restrict usage – Only collect data you can justify needing.

  • Respect ToS – Avoid violating terms of service or NDAs.

  • Protect data – Store data securely and minimize retention periods.

  • De-identify data – Remove personal information not required for your purpose.

  • Check laws – Some locations regulate types of data collection.

  • Use wisely – Data should provide value, not just be collected because you can.

Transparency and ethics are critical. With great data comes great responsibility!
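De-identification in particular is easy to automate at the point where you store responses. The sketch below strips a set of fields from a parsed JSON structure before it is saved. The function name deidentify and the field names in PERSONAL_KEYS are made up for this example; the real list depends on what the API you are studying returns.

```python
# Hypothetical field names; replace with the personal fields you actually see
PERSONAL_KEYS = frozenset({"email", "phone", "user_id", "device_id"})


def deidentify(value, drop_keys=PERSONAL_KEYS):
    """Return a copy of parsed JSON with the given keys removed at every level."""
    if isinstance(value, dict):
        return {
            key: deidentify(item, drop_keys)
            for key, item in value.items()
            if key not in drop_keys
        }
    if isinstance(value, list):
        return [deidentify(item, drop_keys) for item in value]
    return value  # strings, numbers, booleans and None pass through unchanged
```

Calling deidentify on each response before writing it to disk means the personal fields never reach your storage in the first place.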

Conclusion

I hope this guide gave you a comprehensive look at intercepting mobile app data with man-in-the-middle proxies. The key takeaways:

  • MITM proxies allow you to intercept HTTPS traffic from mobile apps by installing the proxy certificate.

  • Tools like mitmproxy make it easy to inspect this traffic and understand how an app communicates with its API backend.

  • Reverse engineering the API endpoints enables replicating requests to scrape mobile app data.

  • Proxies, throttling, and other techniques can be used to efficiently collect mobile app data at scale.

  • Ensure you scrape mobile APIs ethically and legally.

Mobile applications provide a wealth of data just waiting to be tapped. Now that you know how to use MITM proxies to access it, the possibilities are endless!

What cool apps will you start scraping data from? Let me know if you have any other mobile proxy scraping questions!
