
The Complete Guide to Caching in Python

Caching is a critical optimization technique that every Python developer should have in their toolbox. Adding caching to your Python application can provide tremendous improvements in performance, scalability, and user experience.

In this comprehensive guide, you'll learn all about caching – what it is, why it matters, different caching strategies, and how to implement caching in Python. We'll cover both basic and advanced caching techniques.

Let's get started!

What is Caching and Why Does it Matter?

Caching involves temporarily storing data in a fast lookup table so that future requests can retrieve this data quickly without having to recalculate or re-fetch it. The cached data serves as a proxy for the original data source.

Some key benefits caching provides include:

  • Faster access to frequently used data – By keeping frequently accessed data in cache, you can avoid slower lookups from the original data source like a database, API, filesystem etc. This is the primary performance boost caching aims to provide.
  • Reduced load on backend services – Caching reduces requests to backend services since data is served from cache instead. This allows those services to scale better under high load.
  • Improved user experience – Users get lower latency and faster response times since data fetches are faster. This leads to snappier UI interactions.

As per a Cloudflare study, enabling caching provided:

  • 65% decrease in latency
  • 28% improvement in page load times

So intelligent caching strategies are critical for building high-performance Python applications. Let's look at various approaches.

Caching Strategies

There are several different caching algorithms or eviction policies that determine what gets removed from cache when it becomes full:

Least Recently Used (LRU)

The least recently used items are evicted first. This takes advantage of temporal locality – the tendency for items that were accessed recently to be accessed again soon.

LRU caching is commonly used for caching database queries, API requests etc. Python's functools.lru_cache implements LRU-based eviction.
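
To make the eviction rule concrete, here is a minimal sketch of an LRU cache built on collections.OrderedDict. The class name LRUCache and its get/put methods are chosen for illustration and are not part of any library.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used key


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used key
cache.put("c", 3)   # the cache is full, so "b" is evicted
```

Reading "a" before inserting "c" is what saves it from eviction; without that read, "a" would have been the oldest entry.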

Least Frequently Used (LFU)

Items used least frequently are evicted first. LFU works well when some queries are consistently more popular than others over the long run.

Implementing LFU caching requires keeping track of frequency counts of each item. This adds overhead compared to LRU.
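
The bookkeeping can be sketched as follows. This is a simplified illustration that assumes a capacity of at least 1 and finds the eviction victim with a linear scan; the LFUCache name is made up for the example, and production implementations usually use a more efficient structure.

```python
from collections import Counter


class LFUCache:
    """Fixed-size cache that evicts the least frequently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._uses = Counter()  # key -> number of times the key was set or read

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._uses[key] += 1
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # O(n) scan; ties go to the key that was inserted first
            victim = min(self._data, key=lambda k: self._uses[k])
            del self._data[victim]
            del self._uses[victim]
        self._data[key] = value
        self._uses[key] += 1


cache = LFUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.get("a")      # "a" now has a higher use count than "b"
cache.put("c", 3)   # evicts "b", the least frequently used key
```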

First In First Out (FIFO)

The first items added to cache are evicted first, regardless of how often or how recently they were used. This is useful when data ages out in the order it arrived, like in message queues.

FIFO is simple to implement, but unlike LRU it does not consider how recently an item was accessed.

Most Recently Used (MRU)

The most recently accessed items are evicted first. This is based on the assumption that recently accessed items are less likely to be accessed again soon compared to older items.

MRU is useful for workloads where older data has a higher cache hit rate than newer data.

[Figure: comparison of cache hit rates for different policies on an example access sequence]

For typical workloads, LRU tends to achieve the highest cache hit rate because of its ability to exploit temporal locality.
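
To compare policies on an access pattern of your own, a small simulation can replay a sequence of keys and count hits. The sketch below covers FIFO, LRU and MRU; the function name count_hits and both access sequences are made up for illustration.

```python
from collections import OrderedDict


def count_hits(policy, accesses, capacity):
    """Replay accesses against a cache of the given size and count the hits."""
    if policy not in ("fifo", "lru", "mru"):
        raise ValueError(f"unknown policy: {policy}")
    cache = OrderedDict()  # keys only; order tracks insertion (FIFO) or recency (LRU, MRU)
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            if policy in ("lru", "mru"):
                cache.move_to_end(key)  # most recently used key goes last
            continue
        if len(cache) >= capacity:
            # MRU evicts from the newest end, FIFO and LRU from the oldest end
            cache.popitem(last=(policy == "mru"))
        cache[key] = True
    return hits


hot_keys = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2]  # keys 1 and 2 are reused often
cyclic_scan = [1, 2, 3, 4] * 3                # working set larger than the cache

for name, accesses in [("hot keys", hot_keys), ("cyclic scan", cyclic_scan)]:
    for policy in ("fifo", "lru", "mru"):
        print(name, policy, count_hits(policy, accesses, capacity=3))
```

With a capacity of 3, the hot-key sequence gives 6 hits for LRU, 4 for FIFO and 3 for MRU. The cyclic scan reverses the picture: LRU and FIFO get 0 hits while MRU gets 6, which is the kind of workload MRU is meant for.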

Implementing Caching in Python

There are several ways to add caching to your Python application. Let's go through them with examples.

Manual Caching with Decorators

You can create a custom cache and decorator to wrap functions with caching logic.

Here is a simple memoize decorator for caching function results:

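import functools

def memoize(func):
  cache = {}  # one cache dict per decorated function, keyed by the positional arguments

  @functools.wraps(func)  # keep the wrapped function's name and docstring
  def wrapper(*args):
    if args not in cache:
      cache[args] = func(*args)
    return cache[args]
  return wrapper

@memoize
def sum_squares(a, b):
  return a**2 + b**2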

The memoize decorator checks whether the arguments passed to sum_squares exist in the cache dict. If so, it returns the cached value directly. Otherwise, it runs the function, stores the result in the cache, and returns it.

This manual caching approach provides more customization of cache policies compared to out-of-the-box solutions. But the logic needs to be implemented manually.

You can also use functools.lru_cache which provides a decorator implementing LRU caching for you.
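
A short sketch of the standard-library decorator is shown here. The fib function and the maxsize value are only examples.

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # keep at most 128 results, evicting the least recently used
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))           # 832040
print(fib.cache_info())  # hit and miss counters plus the current cache size
```

Arguments must be hashable, since they are used as the cache key. fib.cache_clear() empties the cache and resets its counters.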

Database Query Caching

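For caching database queries, many databases and drivers provide some built-in support. For example, psycopg 3 can automatically prepare statements that are executed repeatedly, so that PostgreSQL can reuse their query plans.

These mechanisms integrate closely with the DB, but they usually cache query plans and data pages rather than full query results. Full results are typically cached in the application or in an external cache such as Redis.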

Web Caching

For web apps, caches can be added at multiple levels:

  • Browser caching – Expiry times can be set on static resources to cache them client-side.
  • CDN caching – CDNs act as a cache layer and reduce traffic to the origin server.
  • Reverse proxy caching – Tools like Varnish cache entire page outputs at the edge.
  • App server caching – Web frameworks like Django support per-view caching for whole pages and template fragment caching for parts of a page.
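
Browser and CDN caching are both driven by HTTP response headers. As a rough sketch, the helper below builds the two headers most commonly used to set an expiry time on a static resource; the function name static_cache_headers is hypothetical, and how the headers are attached to a response depends on your web framework.

```python
import time
from email.utils import formatdate


def static_cache_headers(max_age_seconds, now=None):
    """Build response headers that let browsers and CDNs cache a static file."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires is the older equivalent; max-age takes precedence when both are sent
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


print(static_cache_headers(86400))  # allow caching for one day
```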

Distributed Caching

For large scale caching across servers, distributed caches like Redis and Memcached offer very high performance and throughput. But they require more ops expertise to manage.

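Caching database lookups in Redis:

import json
import redis

r = redis.Redis(host='redis_server', port=6379, db=0)

def get_user(user_id):
  key = f'user:{user_id}'
  cached = r.get(key)  # Check cache first
  if cached is not None:
    return json.loads(cached)  # Redis returns bytes, so decode the stored JSON
  # db is the application's own data-access layer (not shown);
  # the user it returns is assumed to be JSON-serializable
  user = db.query_user(user_id)
  r.set(key, json.dumps(user), ex=300)  # Cache db result; the 5-minute expiry is an example value
  return user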
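
When one cache server is not enough, keys can be spread across several servers. The following is a minimal sketch of hash-based sharding; the server names and the server_for helper are hypothetical, and real clients often provide this routing for you.

```python
import hashlib

# Hypothetical server addresses, for illustration only
CACHE_SERVERS = ["cache-a:6379", "cache-b:6379", "cache-c:6379"]


def server_for(key, servers=CACHE_SERVERS):
    """Map a key to one server using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


print(server_for("user:42"))  # the same key always maps to the same server
```

A cryptographic hash is used instead of the built-in hash() because string hashes in Python vary between processes. Note that plain modulo sharding remaps most keys when the server list changes; consistent hashing is the usual way to limit that.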

Async Caching

Asynchronous caching helps minimize blocking when populating the cache, since cache lookups and data fetches can run concurrently.

Here is async cache population using aiocache:

import asyncio
from aiocache import Cache

cache = Cache(Cache.REDIS)

async def fetch_user(user_id):
  # db is the application's own async database client (not shown); pass the id as a
  # query parameter rather than formatting it into the SQL string
  return await db.execute('SELECT * FROM users WHERE id = %s', (user_id,))

async def main():
  user_ids = [1, 2, 3]

  # Concurrently look up all users in the cache
  cached_users = await asyncio.gather(*[cache.get(f'user:{uid}') for uid in user_ids])

  # Concurrently fetch the users that were missing from the cache
  missing_ids = [uid for uid, user in zip(user_ids, cached_users) if user is None]
  fresh_users = await asyncio.gather(*[fetch_user(uid) for uid in missing_ids])

  # Populate cache with the freshly fetched users
  await asyncio.gather(*[cache.set(f'user:{uid}', user) for uid, user in zip(missing_ids, fresh_users)])

So async caching helps minimize blocking for cache misses.
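
A related concern is several coroutines missing on the same key at once and each triggering its own fetch. The sketch below uses only the standard library to show one way of sharing a single in-flight load per key; the AsyncCache class and the slow_load function are made up for illustration.

```python
import asyncio


class AsyncCache:
    """In-process cache where concurrent misses for one key share a single load."""

    def __init__(self, loader):
        self._loader = loader  # async function: key -> value
        self._tasks = {}       # key -> task that produces (or already produced) the value

    async def get(self, key):
        task = self._tasks.get(key)
        if task is None:
            task = asyncio.create_task(self._loader(key))
            self._tasks[key] = task
        try:
            return await task
        except Exception:
            self._tasks.pop(key, None)  # do not keep failed loads in the cache
            raise


calls = []


async def slow_load(key):
    # Hypothetical loader standing in for a database or API call
    calls.append(key)
    await asyncio.sleep(0.01)
    return key.upper()


async def main():
    cache = AsyncCache(slow_load)
    return await asyncio.gather(cache.get("a"), cache.get("a"), cache.get("b"))


results = asyncio.run(main())
print(results, calls)
```

Both requests for "a" wait on the same task, so the loader runs once per distinct key.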

Memoization in Machine Learning

Caching can also speed up machine learning workloads. Models can be expensive to train, so caching a trained model avoids repeating that work.

Here is an example using sklearn-contrib-py-earth:

import pickle
from pyearth import Earth

# Train model (X_train and y_train are the training data, not shown)
model = Earth()
model.fit(X_train, y_train)

# Serialize model to cache
cached_model = pickle.dumps(model)

# Restore for prediction
model = pickle.loads(cached_model)

So caching avoids re-training models when the same model is re-used repeatedly.
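
To reuse a model across program runs, the serialized bytes can be written to disk. This is a minimal sketch of a load-or-train helper; the load_or_train name, the file path and the dummy train function are assumptions for illustration, and any picklable model object could take the place of the dict.

```python
import os
import pickle
import tempfile


def load_or_train(cache_path, train_fn):
    """Return the pickled model at cache_path, training and saving it on a miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(model, f)
    return model


train_calls = []


def train():
    # Hypothetical stand-in for an expensive training run
    train_calls.append(1)
    return {"weights": [0.1, 0.2, 0.3]}


path = os.path.join(tempfile.mkdtemp(), "model.pkl")
first = load_or_train(path, train)   # trains and writes the file
second = load_or_train(path, train)  # loads the file without training
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code. The cached file also needs to be deleted or versioned when the training data or hyperparameters change.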

Caching Best Practices

Here are some tips for applying caching most effectively:

  • Profile to find the most expensive bottlenecks before caching. Don't prematurely optimize functions that aren't that slow.
  • Choose an optimal eviction policy based on access patterns – LRU, LFU etc. LRU fits most use cases.
  • Set appropriate cache expiry times based on how often data changes. Strike a balance between freshness and performance.
  • Use cache priming during startup to populate cache proactively.
  • Enable compression to store more data in memory and reduce bandwidth.
  • Monitor your cache hit rate as a metric for efficiency. A low hit rate means inefficient caching.
  • Distribute load across multiple cache servers to scale better under high loads.
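
Two of these tips, expiry times and hit-rate monitoring, can be sketched together in a few lines. The TTLCache class below is illustrative only; the injectable clock is an assumption added so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Cache whose entries expire after ttl_seconds, with hit and miss counters."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self._data.pop(key, None)  # drop an expired entry, if there is one
        self.misses += 1
        return default

    def set(self, key, value):
        self._data[key] = (self._clock() + self.ttl, value)

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


fake_now = [0.0]  # fake clock so the example does not have to sleep
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.set("k", "v")
fresh = cache.get("k")  # within the TTL: a hit
fake_now[0] = 61.0
stale = cache.get("k")  # past the TTL: a miss
print(fresh, stale, cache.hit_rate)
```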


Adding intelligent caching makes your Python application faster and more efficient by avoiding repetitive computations and IO. Both basic and advanced caching techniques like distributed caches and async caching can help.

Follow best practices like profiling bottlenecks first, choosing a suitable eviction policy, setting sensible expiry times, and monitoring cache efficiency. Used judiciously, caching can massively improve your Python application's speed and scalability.
