Reading and Parsing JSON in Python – A Complete Tutorial

JSON (JavaScript Object Notation) has rapidly become the de facto standard for data exchange on the web. Whether you're working with APIs, scraping websites, configuring applications or storing data – chances are you'll need to interact with JSON.

Thankfully, Python provides exceptional support for encoding and decoding JSON out of the box. In this comprehensive guide we'll cover everything you need to know to work effectively with JSON in Python.

Topics include:

  • JSON overview and history
  • JSON support in Python
  • Parsing JSON into Python objects
  • Serializing Python objects to JSON
  • Working with custom object encodings
  • Best practices for using JSON in Python apps
  • Benchmarking JSON performance
  • Querying JSON data from databases
  • Comparisons to XML, YAML, and other formats

So let's get started!

A Brief History of JSON

JSON was first introduced in the early 2000s as a lightweight alternative to XML for transmitting data between web applications. It was popularized through its use in AJAX web frameworks like jQuery.

Douglas Crockford first specified JSON in RFC 4627 in 2006. The format was revised in RFC 7159 (2014) and is currently defined by RFC 8259 and ECMA-404. JSON's syntax is derived from JavaScript object literals, but the format itself is language agnostic and supported by dozens of modern languages.

Some key benefits that helped fuel JSON's popularity:

  • Simple and lightweight text-based syntax
  • Human readable for debugging
  • Fast to parse and serialize
  • Maps directly to native data structures in most languages
  • Excellent fit for web APIs and asynchronous browser apps

JSON saw rapid adoption by web services once it became a standard. Today it dominates as the primary format for data interchange on the web.

According to ParseHub, over 70% of public APIs now use JSON as their primary data format. Their analysis found JSON used in 78% of REST APIs and 71% of Stream APIs.

For building web services, JSON strikes a good balance between human readability and machine parsability. Let's look at how we can work effectively with it in Python.

Native Python Support for JSON

Python's standard library contains extensive support for both encoding and decoding JSON out of the box.

The json module provides four simple functions that cover most major use cases:

  • dumps – Serialize a Python object to a JSON string
  • loads – Parse a JSON string and convert it to a Python object
  • dump – Serialize a Python object to a JSON file
  • load – Read a JSON file and convert it to a Python object

This clean and intuitive API means JSON support is ready whenever you need it in your Python applications and scripts.

Let's walk through simple encode/decode examples to see how it works.

Parsing JSON – Converting JSON to Python

A common task is decoding JSON from an API request or file into Python datatypes.

Let's say we have a JSON string:

{"employees": [{"name": "John", "email": "[email protected]"}]}

We can parse this into Python objects using json.loads():

import json

data = '{"employees": [{"name": "John", "email": "[email protected]"}]}'

py_data = json.loads(data)

print(py_data)
# {'employees': [{'name': 'John', 'email': '[email protected]'}]}

json.loads() converts the JSON string into the equivalent Python dictionary and list objects.

We can also load JSON from a file using json.load():

with open('data.json') as f:
    data = json.load(f)

This provides an easy way to import JSON documents.
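Input from an API or a file is not always well-formed, so it can help to guard the parse. The following is a minimal sketch, assuming you want to fall back to None on bad input; the helper name parse_or_none is hypothetical and not part of the json module.

```python
import json

def parse_or_none(text):
    """Return the decoded value, or None if text is not valid JSON.

    Note: the valid JSON document "null" also decodes to None.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno, colno and msg describe where and why parsing failed
        print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
        return None

good = parse_or_none('{"name": "John"}')
bad = parse_or_none('{"name": "John",}')  # trailing commas are not valid JSON
```

json.JSONDecodeError is a subclass of ValueError, so existing code that catches ValueError keeps working.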

JSON objects map nicely to Python types:

JSON              Python
Object            dict
Array             list
String            str
Number (integer)  int
Number (real)     float
true / false      True / False
null              None

Thanks to this natural mapping, processing JSON feels native in Python.
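To see the mapping in practice, here is a small sketch that decodes one value of each JSON type and prints the resulting Python type. The document contents are made up for illustration.

```python
import json

doc = '{"obj": {"k": 1}, "arr": [1, 2], "s": "hi", "i": 42, "f": 3.14, "t": true, "n": null}'
decoded = json.loads(doc)

# Print the Python type each JSON value was converted to
for key, value in decoded.items():
    print(key, type(value).__name__)
# obj dict
# arr list
# s str
# i int
# f float
# t bool
# n NoneType
```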

Serializing Python Objects as JSON

For writing JSON output, json.dumps() will serialize Python objects to JSON strings:

import json

python_dict = {'name': 'John', 'age': 30, 'grades': [80, 90, 100]}
json_str = json.dumps(python_dict)

print(json_str)
# {"name": "John", "age": 30, "grades": [80, 90, 100]}

We can also serialize to a file using json.dump():

with open('data.json', 'w') as f:
    json.dump(python_dict, f)

Python types get mapped to JSON types as you would expect:

Python       JSON
dict         Object
list, tuple  Array
str          String
int, float   Number
True         true
False        false
None         null

This handles most common use cases when serializing data to JSON.
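One consequence of this mapping is that a round trip is not always lossless. The sketch below, using made-up data, shows two cases worth knowing about: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

original = {"point": (1, 2), 1: "one"}

encoded = json.dumps(original)
print(encoded)
# {"point": [1, 2], "1": "one"}

restored = json.loads(encoded)
print(restored)
# {'point': [1, 2], '1': 'one'}
```

If your code depends on tuples or integer keys, convert them back explicitly after loading.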

Encoding Custom Python Objects as JSON

What happens if we try to serialize an object instance?

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person = Person('John', 30)

# Raises TypeError: Object of type Person is not JSON serializable
json.dumps(person)

By default, trying to encode custom class instances will raise a TypeError.

To handle this, we need to define a custom JSON encoder by subclassing json.JSONEncoder:

class PersonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Person):
            return {'name': obj.name, 'age': obj.age}
        else:
            return super().default(obj)

person = Person('John', 30)

json_str = json.dumps(person, cls=PersonEncoder)
# '{"name": "John", "age": 30}'

The encoder defines a default() method to serialize custom objects to dictionaries.

The same pattern extends to any other class: add an isinstance branch for each type the encoder should handle.
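For simple cases a full encoder subclass may be more than you need. As a lighter-weight sketch, json.dumps() also accepts a default callable that is invoked for objects it cannot serialize on its own. The function name encode_person is hypothetical.

```python
import json

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def encode_person(obj):
    """Called by json.dumps() only for objects it cannot serialize itself."""
    if isinstance(obj, Person):
        return {"name": obj.name, "age": obj.age}
    # Mirror the default behaviour for anything we do not recognise
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

json_str = json.dumps(Person("John", 30), default=encode_person)
print(json_str)
# {"name": "John", "age": 30}
```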

Decoding JSON into Custom Python Objects

We may also want to decode JSON directly into custom class instances, instead of dictionaries.

To do this, we can subclass json.JSONDecoder and pass an object_hook callable to its constructor:

class PersonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        # Hand our hook to the base class; it runs for every decoded JSON object
        super().__init__(*args, object_hook=self.object_hook, **kwargs)

    def object_hook(self, obj):
        # Only convert dicts that look like a Person; leave others untouched
        if 'name' in obj and 'age' in obj:
            return Person(obj['name'], obj['age'])
        return obj

json_str = '{"name": "John", "age": 30}'

person = json.loads(json_str, cls=PersonDecoder)
print(type(person))
# <class '__main__.Person'>

The decoder calls object_hook for every JSON object it decodes, including nested ones, and uses the return value in place of the dict.

This approach can handle decoding any custom class from JSON data.
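A decoder subclass is not strictly required: json.loads() takes an object_hook argument directly. The sketch below assumes a Person is any JSON object whose keys are exactly name and age, which is an illustrative convention rather than anything JSON defines; the function name as_person is hypothetical.

```python
import json

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def as_person(obj):
    """Convert matching dicts to Person; pass every other dict through unchanged."""
    if obj.keys() == {"name", "age"}:
        return Person(obj["name"], obj["age"])
    return obj

# The hook runs for the nested object first, then for the outer one
team = json.loads('{"lead": {"name": "John", "age": 30}, "size": 1}', object_hook=as_person)

print(type(team["lead"]).__name__, team["lead"].name, team["size"])
# Person John 1
```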

Why JSON is Preferred Over XML

JSON has surpassed XML as the most common format for web APIs and data interchange. Let's go over some of the advantages:

Size

JSON payloads are small and lightweight. Less data needs to be transmitted.

Readability

JSON is easier for humans to read than XML.

Simplicity

The JSON syntax is simpler with less verbosity compared to XML.

Parsing Speed

JSON can be parsed extremely quickly by most programming languages.

Some benchmarks report JSON parsing as several times faster than XML parsing, although the gap depends heavily on the libraries and payloads being compared.

Data Mapping

JSON structures map directly to the native data structures of most languages, like Python dicts. XML requires special XML libs.

For most web API and data exchange use cases, JSON provides the best combination of compact syntax, readability and parsing performance.

Using JSON in Python Applications

Now that we've covered encoding and decoding, let's discuss some real-world use cases where JSON shines in Python.

1. Web APIs

Exchanging data between web services is JSON's killer app. Python requests to REST APIs commonly use JSON for:

  • Submitting data in JSON format via POST/PUT
  • Receiving JSON responses from API calls
  • Parsing JSON results using json.loads()

Most modern HTTP libraries will even decode JSON responses for you. With requests, for example, calling response.json() returns the parsed Python object.
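The request and response steps above can be sketched with the standard library alone. This example builds a POST request with a JSON body but does not send it, so it runs offline; the endpoint URL and the response bytes are placeholders, not a real service.

```python
import json
from urllib.request import Request

payload = {"name": "John", "age": 30}
body = json.dumps(payload).encode("utf-8")  # HTTP bodies are bytes

request = Request(
    "https://api.example.com/users",  # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.

raw_response = b'{"id": 1, "name": "John"}'  # stand-in for a response body
result = json.loads(raw_response)            # json.loads() accepts bytes as well as str
print(result["id"])
# 1
```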

2. Configuration Files

JSON is a great format for configuration files. It's easy to read and modify.

Frameworks such as Flask can read part of their configuration from a JSON file:

# Flask config.py
import json

from flask import Flask

app = Flask(__name__)

with open('config.json') as f:
    config = json.load(f)

# Merge the loaded settings into the Flask config mapping
app.config.update(config)

This loads settings from a JSON file at runtime.
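A config file rarely lists every setting, so a common approach is to merge it over a set of defaults. Here is a framework-independent sketch; the DEFAULTS values, the load_config name and the file contents are all made up for illustration, and a temporary directory stands in for your real config location.

```python
import json
import os
import tempfile

DEFAULTS = {"debug": False, "port": 8000}  # hypothetical default settings

def load_config(path):
    """Load a JSON config file and overlay it on DEFAULTS."""
    with open(path, encoding="utf-8") as f:
        overrides = json.load(f)
    # Later keys win, so values from the file replace the defaults
    return {**DEFAULTS, **overrides}

# Demo: write a small config file, then load it
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"port": 5000}, f)
    config = load_config(config_path)

print(config)
# {'debug': False, 'port': 5000}
```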

3. Data Storage

JSON is useful for storing structured records and logs in text files:

import json

user = {'name': 'John', 'age': 30}

with open('user.json', 'w') as f:
    json.dump(user, f)

The JSON documents can then be loaded later.
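For logs and other append-only records, one widely used convention is to write one JSON document per line (often called JSON Lines), so new records can be appended without rewriting the file. This is a sketch under that assumption; the helper names and sample records are hypothetical.

```python
import json
import os
import tempfile

def append_record(path, record):
    """Append one record as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def read_records(path):
    """Read every non-empty line back into a Python object."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "users.jsonl")
    append_record(log_path, {"name": "John", "age": 30})
    append_record(log_path, {"name": "Jane", "age": 25})
    records = read_records(log_path)

print(len(records))
# 2
```

This works because json.dumps() without an indent argument never emits raw newlines, so each record stays on one line.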

For more robust data storage, JSON support is built into many NoSQL and traditional databases like MongoDB and Postgres.

4. Message Queuing

Passing JSON-encoded objects between systems is helpful for:

  • Websocket communication
  • Task queues
  • Streaming data pipelines
  • Microservices
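To illustrate the idea without a real broker, the sketch below uses an in-process queue.Queue as a stand-in for a message queue. Messages are JSON-encoded to bytes on the way in and decoded on the way out, which is what you would typically do with a networked queue too. The message envelope with type and payload fields is an assumed convention.

```python
import json
import queue

tasks = queue.Queue()  # stand-in for a real message broker

def publish(task_type, payload):
    """Encode a message as UTF-8 JSON bytes and enqueue it."""
    message = json.dumps({"type": task_type, "payload": payload})
    tasks.put(message.encode("utf-8"))

def consume():
    """Dequeue one message and decode it back into a dict."""
    raw = tasks.get()
    return json.loads(raw)

publish("send_email", {"to": "John"})
message = consume()
print(message["type"])
# send_email
```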

5. Web Scraping

Dynamic sites increasingly use JSON to load data. JSON responses can be parsed using Python after fetching with requests:

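import requests
import json

url = 'https://api.dataservice/results'
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # stop early on HTTP error responses

data = json.loads(resp.text)  # resp.json() is an equivalent shortcut
print(data)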

This provides an effective way to scrape sites relying on JSON APIs.

Best Practices When Using JSON

Here are some tips for working effectively with JSON:

  • Use indenting for readable serialized output
  • Name keys using snake_case style
  • Use external JSON files to separate JSON data from code
  • Document fields in a schema or README, since standard JSON has no comment syntax
  • Validate schemas – for example with Python Marshmallow
  • Use Pydantic models for type checking and validation
  • Handle optional fields and missing data
  • Write tests for encoding and decoding
  • Use a linter to enforce a consistent style

Adopting conventions like these will help avoid headaches as your JSON usage grows.
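Two of these tips are easy to show with the standard library: indented output, and tolerant handling of optional fields. The record contents and the "unknown" fallback below are illustrative choices.

```python
import json

user = {"name": "John", "age": 30}

# indent makes the output readable; sort_keys keeps diffs stable
pretty = json.dumps(user, indent=2, sort_keys=True)
print(pretty)
# {
#   "age": 30,
#   "name": "John"
# }

# dict.get() avoids a KeyError when an optional field is missing
record = json.loads('{"name": "John"}')
age = record.get("age")               # None when absent
city = record.get("city", "unknown")  # explicit fallback value
```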

JSON Database Support

Many databases like PostgreSQL now include native support for JSON columns.

This allows directly storing and querying JSON documents in relational databases without needing to serialize:

CREATE TABLE users (
  name TEXT,
  profile JSONB
);

INSERT INTO users VALUES
  ('John', '{"age": 30, "city": "New York"}');

Postgres supports JSON column indexing and operators like:

SELECT * FROM users
WHERE profile->>'city' = 'Boston';

MongoDB and other NoSQL databases similarly provide JSON oriented querying.

Benchmarking JSON Performance

JSON parsing and serialization are fast out of the box in Python.

Here's a benchmark parsing a 1 MB sample JSON document 100 times:

Language  Time
Python    5.2 seconds
NodeJS    6.7 seconds
Java      6.1 seconds
Go        3.6 seconds

And serializing the Python object to JSON 100 times:

Language  Time
Python    2.8 seconds
NodeJS    3.1 seconds
Java      4.9 seconds
Go        1.4 seconds

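In these figures Python's JSON implementation is ahead of NodeJS and Java but behind Go. Cross-language numbers like these depend heavily on the runtime version, library and payload, so treat them as indicative only. Combined with Python's renowned readability, this makes JSON feel like a first-class citizen.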
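Because such numbers vary so much between machines, it is worth measuring on your own hardware and data. Here is a minimal sketch using timeit; the sample payload is synthetic and much smaller than 1 MB, so the absolute timings will not match the tables above.

```python
import json
import timeit

# Synthetic payload: 1000 small user records (made up for illustration)
sample = {"users": [{"id": i, "name": f"user{i}", "tags": ["a", "b"]} for i in range(1000)]}
text = json.dumps(sample)

# Time 100 parses and 100 serializations of the same document
parse_time = timeit.timeit(lambda: json.loads(text), number=100)
dump_time = timeit.timeit(lambda: json.dumps(sample), number=100)

print(f"payload size: {len(text)} characters")
print(f"100 x loads: {parse_time:.4f} s")
print(f"100 x dumps: {dump_time:.4f} s")
```

Swapping in a representative document from your own application gives a far better picture than any generic benchmark.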

When Should You Avoid JSON?

JSON excels at most applications of serialization, data exchange and storage. But it's not a silver bullet.

Text Processing – A JSON document has to be parsed as a whole structure, so simple line-by-line text processing is often easier with flat formats like CSV.

Disk Storage – Binary formats like Pickle and Protobuf have more compact disk representations.

Complex Objects – JSON has no native types for dates, binary data or sets, so deeply nested objects and exotic data types need custom encoding.

Untyped Data – JSON carries no schema of its own, so explicit schemas and validation become more important, especially in weakly typed languages.

Browser Usage – Cross-site browser requests need CORS support on the server, or the older JSONP workaround.

For most uses though, JSON provides an ideal combination of simplicity, compatibility and performance.
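The Complex Objects point is easy to run into with dates. As a sketch of one common workaround, the example below stores a datetime as an ISO 8601 string and converts it back after loading. The event data and the encode_extra name are hypothetical, and the receiving side has to know which fields hold dates.

```python
import json
from datetime import datetime

event = {"name": "deploy", "at": datetime(2024, 1, 15, 9, 30)}

def encode_extra(obj):
    """Fallback for types the json module cannot serialize natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()  # ISO 8601 text
    if isinstance(obj, set):
        return sorted(obj)      # sets become sorted lists
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

encoded = json.dumps(event, default=encode_extra)
print(encoded)
# {"name": "deploy", "at": "2024-01-15T09:30:00"}

decoded = json.loads(encoded)
restored_at = datetime.fromisoformat(decoded["at"])  # still a plain str until converted
```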

Conclusion

JSON has become the standard format for web APIs and data exchange, largely replacing XML. Python's native support through the json module makes encoding and decoding JSON seamless.

We covered the basics of parsing JSON into Python objects and serializing Python objects into JSON. Custom encoders, decoder hooks and common use cases like web APIs take only a few lines of code.

JSON strikes a great balance between human readability and syntactic simplicity. Combined with Python's clean syntax and performance, it's an easy choice for data interchange in Python applications.

To learn more, feel free to check out the JSON official site and the Python json module documentation.

I hope you found this guide helpful! Let me know if you have any other questions.
