
Unleash the Power of Asynchronous HTTP Requests in Python with Aiohttp

Are you looking to make your Python programs more efficient and concurrent? Do you need to fetch data from multiple web services or APIs concurrently? Then it's time to embrace asynchronous programming with aiohttp!

In this comprehensive guide, we'll cover:

  • Limitations of synchronous code and benefits of asynchronous programming
  • How asyncio and aiohttp enable asynchronously fetching data from multiple sources
  • Writing asynchronous API clients with aiohttp
  • Techniques for blazing fast web scraping
  • Best practices for smooth asynchronous workflows
  • How to squeeze every ounce of performance from aiohttp

So buckle up and get ready to enter the asynchronous world!

Why Asynchronous Programming?

Before we dive into aiohttp specifically, let's understand what asynchronous programming is and why it matters.

Traditionally, Python code runs synchronously: statements execute one after another, and each blocks the thread until it finishes.

So for I/O bound work like network requests, a huge amount of time is spent just waiting idly:

[Image: Synchronous request flow (source: realpython.com)]

If each response takes several seconds, just fetching data from 5 URLs could take 30 seconds! This wastes resources and limits concurrency when handling thousands of connections.

Asynchronous programming fixes this by allowing other tasks to run while waiting for I/O. Instead of blocking, tasks yield control cooperatively via an event loop:

[Image: Asynchronous request flow (source: nickmccullum.com)]

This allows optimal use of resources: a single thread can handle thousands of concurrent connections without blocking on any one of them!

In Python, asyncio provides the foundation for asynchronous programming using coroutines and an event loop. And aiohttp builds on top of asyncio specifically for asynchronous HTTP.

Asyncio and Aiohttp Basics

The asyncio module was introduced in Python 3.4 (with the async/await syntax arriving in Python 3.5) to provide infrastructure for writing asynchronous programs. It enables asynchronous programming through coroutines – functions that voluntarily yield control back to the event loop whenever they are idle or waiting for I/O.

Let's dissect a simple asyncio example:

import asyncio

async def fetch_data():
  print('Starting fetch')
  await asyncio.sleep(3)
  print('Finished fetch')
  return 'Data'

async def main():
  print('Starting program')

  data = await fetch_data()
  print('Received:', data)

  print('Finished program')

asyncio.run(main())

The key points:

  • async def defines a coroutine function that can be awaited.
  • await suspends the coroutine and yields control to the event loop until the awaited result is ready.
  • asyncio.run() creates an event loop and runs the top-level coroutine to completion.

The real benefit shows up while fetch_data is suspended on asyncio.sleep(3): the event loop is free to run other coroutines during that wait. In this example main simply awaits the single result, but as soon as several coroutines are in flight, asyncio interleaves them.
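
To see that interleaving, here is a minimal sketch that runs two simulated fetches at once with asyncio.gather; the task names and delays are made up for illustration:

import asyncio

async def fetch_data(name, delay):
  print(f'{name}: starting fetch')
  await asyncio.sleep(delay)  # stand-in for waiting on real I/O
  print(f'{name}: finished fetch')
  return f'Data from {name}'

async def main():
  # Both coroutines wait concurrently, so this takes ~3 seconds, not ~6
  results = await asyncio.gather(
    fetch_data('task1', 3),
    fetch_data('task2', 3),
  )
  print(results)

asyncio.run(main())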

The aiohttp library builds on asyncio, providing an asynchronous HTTP client/server optimized for asyncio event loops.

Let's compare making synchronous requests using requests vs asynchronous requests with aiohttp:

# Synchronous request
import requests

response = requests.get('https://api.example.com/data')
print(response.text)

# Asynchronous request
import aiohttp
import asyncio

async def fetch_data():
  async with aiohttp.ClientSession() as session:
    async with session.get('https://api.example.com/data') as response:
      return await response.text()

print(asyncio.run(fetch_data()))

While requests blocks the thread for the entire duration of the call, the asynchronous version lets the event loop run other tasks while it awaits the response.

This allows us to achieve much higher concurrency when we need to fetch data from multiple sources.

Unleash Concurrency with Asyncio

To demonstrate the power of asynchronous concurrency, let's compare synchronous and asynchronous techniques for fetching data from multiple APIs.

Serial Synchronous Requests

Here is a standard way to fetch data from multiple sources synchronously using requests:

import requests
import time

URLS = [
  'https://api.site1.com/data',
  'https://api.site2.com/data',
  'https://api.site3.com/data'
]

def sync_requests(urls):
  start = time.time()

  for url in urls:
    data = requests.get(url).json()

  end = time.time()

  print(f'Took {end - start:.2f} seconds')

sync_requests(URLS)

This takes over 6 seconds to fetch all three URLs: each request blocks the thread until it completes before the next one starts, so the total time is the sum of every response time.

Concurrent Asyncio Requests

Now let's use aiohttp to fetch the URLs concurrently:

import asyncio
import aiohttp
import time

async def fetch_data(session, url):
  async with session.get(url) as response:
    return await response.json()

async def async_requests(urls):
  start = time.time()

  async with aiohttp.ClientSession() as session:
    tasks = []
    for url in urls:
      tasks.append(fetch_data(session, url))

    results = await asyncio.gather(*tasks)

  end = time.time()
  print(f'Took {end - start:.2f} seconds')

asyncio.run(async_requests(URLS))

By using asyncio.gather we can await multiple coroutines concurrently. This fetches all URLs in just 1.2 seconds – a 5x speedup!

The more URLs you add, the bigger the performance gain. Asyncio lets us make full use of the network by keeping many requests in flight at once.
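
As the URL list grows, it also pays to bound how long any one request may run. Here is a hedged sketch, reusing the fetch_data coroutine above, that applies a shared ClientTimeout (the 10-second value is arbitrary) and collects failures instead of letting a single bad URL abort the whole batch:

async def fetch_many(urls):
  # Give up on any single request after 10 seconds (illustrative value)
  timeout = aiohttp.ClientTimeout(total=10)

  async with aiohttp.ClientSession(timeout=timeout) as session:
    tasks = [fetch_data(session, url) for url in urls]
    # return_exceptions=True keeps one failure from cancelling the rest
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(fetch_many(URLS))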

Building Async API Clients with Aiohttp

A common use case for aiohttp is building API client wrappers that abstract away the async complexity and provide a clean interface.

Let's build an async API client for the JSONPlaceholder fake API using aiohttp:

import aiohttp
import asyncio

class JSONPlaceholder:

  def __init__(self, session):
    self.session = session

  async def get_posts(self):
    url = 'https://jsonplaceholder.typicode.com/posts'
    async with self.session.get(url) as response:
      return await response.json()

  async def get_post(self, id):
    url = f'https://jsonplaceholder.typicode.com/posts/{id}'
    async with self.session.get(url) as response:
      return await response.json()

  # other API methods  

async def main():
  async with aiohttp.ClientSession() as session:
    client = JSONPlaceholder(session)  
    posts = await client.get_posts()
    post = await client.get_post(1)

  print(posts[0])
  print(post)

asyncio.run(main())

This wraps a session in a client class that exposes clean async methods. Callers can now use simple async/await calls rather than dealing with sessions and URLs directly.

The same approach can be used to build feature-rich API clients for diverse services.
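
Because the client's methods are coroutines, callers can also fan several calls out concurrently. Here is a small sketch reusing the JSONPlaceholder class above (the post IDs are arbitrary):

async def fetch_first_posts(n):
  async with aiohttp.ClientSession() as session:
    client = JSONPlaceholder(session)
    # Fire off n requests at once instead of awaiting them one by one
    return await asyncio.gather(*(client.get_post(i) for i in range(1, n + 1)))

posts = asyncio.run(fetch_first_posts(5))
print([p['title'] for p in posts])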

Turbocharge Web Scraping with Asyncio

Another area where asynchronous concurrency really shines is web scraping. Fetching data from multiple pages concurrently can dramatically speed up crawlers.

Here is an example scraper implemented synchronously using requests:

import requests
from bs4 import BeautifulSoup

URLS = [
  'https://page1.com',
  'https://page2.com',
  'https://page3.com'
]

def sync_scrape(urls):
  for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Scrape data from soup

    print(f'Scraped {url}')

sync_scrape(URLS)

While simple, it fetches the URLs one after another, which keeps it slow. Let's make it asynchronous with aiohttp:

import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def async_scrape(session, url):
  async with session.get(url) as response:
    html = await response.text()
    soup = BeautifulSoup(html, 'html.parser')

    # Scrape data from soup

    print(f'Scraped {url}')

async def main():
  async with aiohttp.ClientSession() as session:
    tasks = []
    for url in URLS:
      tasks.append(async_scrape(session, url))
    await asyncio.gather(*tasks)

asyncio.run(main()) 

Now we can scrape pages concurrently, significantly improving speed! The more pages there are, the bigger the gains.

For large crawlers, asynchronous scraping is a must. Libraries like Scrapy provide async crawling out of the box.

Best Practices for Smooth Async Workflows

While aiohttp makes async programming accessible, here are some best practices to follow:

  • Limit concurrent connections with a Semaphore to avoid overwhelming servers (see the sketch after this list)
  • Handle exceptions properly using try/except blocks
  • Use async context managers with async with for resource cleanup
  • Avoid blocking calls like time.sleep() in async functions
  • Leverage connection pooling by reusing session objects
  • Use asyncio.create_task() over low-level ensure_future
  • Employ asyncio.gather for simpler concurrent waiting
  • Throttle requests by delaying tasks with asyncio.sleep()
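
To make the first two items concrete, here is a minimal sketch that combines a Semaphore (the limit of 10 is arbitrary), per-request exception handling, and a single reused session:

import asyncio
import aiohttp

async def fetch(session, semaphore, url):
  # The semaphore caps how many requests are in flight at once
  async with semaphore:
    try:
      async with session.get(url) as response:
        response.raise_for_status()
        return await response.json()
    except aiohttp.ClientError as exc:
      print(f'Failed to fetch {url}: {exc}')
      return None

async def fetch_all(urls, limit=10):
  semaphore = asyncio.Semaphore(limit)
  # One session is reused for every request, so connections are pooled
  async with aiohttp.ClientSession() as session:
    tasks = [fetch(session, semaphore, url) for url in urls]
    return await asyncio.gather(*tasks)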

Following these patterns will keep your asynchronous code running efficiently and smoothly.

Squeezing Maximum Performance from Aiohttp

To really optimize aiohttp performance, here are some advanced techniques:

  • Connection limits – Cap concurrency with an asyncio.Semaphore or the connector itself, e.g. aiohttp.TCPConnector(limit=...). Ideal values depend on the target servers.
  • Connection reuse – Share a single ClientSession so keep-alive connections are pooled instead of reopened for every request.
  • Caching – Use a cache such as aiocache to avoid repeating identical requests.
  • Smaller payloads – Request only the fields you need when the API supports it, so less data crosses the network.
  • Compression – aiohttp accepts gzip/deflate responses and decompresses them automatically; make sure the server actually compresses what it sends.
  • Client-side retries – Implement retry logic to handle transient failures (see the sketch below).
  • Load balancing – Distribute requests across multiple endpoints or sessions when a single upstream becomes the bottleneck.
  • Asynchronous DB access – Use an async ORM or driver such as Gino or asyncpg so database calls don't block the event loop.
There are also ways to find bottlenecks in asyncio programs, such as asyncio's built-in debug mode and coroutine-aware profilers like yappi. Addressing these pain points can yield substantial performance improvements from aiohttp.
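
As an example of the retry item above, here is a hedged sketch of a simple helper with exponential backoff; the attempt count and delays are arbitrary, and production code may prefer a dedicated retry library:

import asyncio
import aiohttp

async def get_json_with_retries(session, url, attempts=3, backoff=1.0):
  # Retry transient failures, doubling the delay after each failed attempt
  for attempt in range(1, attempts + 1):
    try:
      async with session.get(url) as response:
        response.raise_for_status()
        return await response.json()
    except (aiohttp.ClientError, asyncio.TimeoutError):
      if attempt == attempts:
        raise
      await asyncio.sleep(backoff * 2 ** (attempt - 1))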

Alternatives to Aiohttp

While powerful, aiohttp isn't the only option in Python's async HTTP ecosystem:

  • HTTPX – A sync and async HTTP client with HTTP/2 support; works with asyncio or trio (see the sketch below).
  • Trio – An alternative async framework to asyncio with its own ecosystem of libraries.
  • Sanic – An async web framework akin to Flask.
  • FastAPI – An async framework for building APIs, built on Starlette.
  • Quart – An async web microframework with a Flask-compatible API.
  • websockets – A library focused on WebSocket clients and servers.

Each has its own strengths for specific use cases, but aiohttp remains a great general-purpose choice.
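
For comparison, here is roughly what the earlier single-request example looks like with HTTPX (the URL is the same placeholder used above):

import asyncio
import httpx

async def fetch_data():
  async with httpx.AsyncClient() as client:
    response = await client.get('https://api.example.com/data')
    return response.text

print(asyncio.run(fetch_data()))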

Closing Thoughts on Aiohttp

Aiohttp provides an easy way to unlock the power of asynchronous concurrency in Python by building on asyncio. For I/O-bound work like fetching data from external services, aiohttp offers massive performance benefits over synchronous code because no time is wasted idly waiting on the network.

By learning asyncio workflows and leveraging aiohttp for blazing-fast I/O, you can speed up Python programs that scrape the web, handle many concurrent connections, talk to APIs, and more.

Asyncio does require rethinking code in an asynchronous way, but with practice you can master it!

I hope this guide serves as a comprehensive introduction to aiohttp and how you can use it to write highly efficient asynchronous Python programs. Let me know if you have any other questions!
