
How to Parse JSON with Python like a Pro

Hey friend! JSON is everywhere these days. Whether you're building web services, scraping data, or interacting with APIs, odds are you'll need to parse JSON in Python at some point.

From my 5+ years of experience, I've found Python has amazing tools for handling JSON. So I wanted to share some pro tips and code snippets that will help you parse, manipulate, and analyze JSON data like a boss!

JSON Format Refresher

Let's kick things off with a quick refresher on JSON syntax and structure.

JSON stands for JavaScript Object Notation. It's emerged as the universal standard for serializing and transmitting data over the web.

According to Statista, over 70% of developers work with JSON APIs regularly.

JSON represents data as key/value pairs:

{
  "name": "John",
  "age": 30,
  "address": {
    "street": "123 Main St",
    "city": "San Francisco" 
  }
}

It's composed of:

  • Objects – Unordered collections of key/value pairs denoted by { }
  • Arrays – Ordered collections denoted by [ ]
  • Key – Always a string
  • Value – Can be a string, number, boolean, null, object, or array

This simple syntax makes JSON easy to read and parse, both for humans and machines.

Now let's dive into techniques for parsing JSON in Python!

Parsing JSON Strings

The most common JSON parsing task is taking a raw JSON string and converting it into a Python dict.

Python's built-in json module provides simple methods for handling this:

json.loads() – Parses a JSON string, returning a Python object

import json

json_str = '{"name": "John", "age": 30}'
data = json.loads(json_str)

print(data['name']) # John

json.dumps() – Serializes a Python object into a JSON string

data = {
  'name': 'John',
  'age': 30
}

json_str = json.dumps(data) 

print(json_str) 
# {"name": "John", "age": 30}

The json.loads() method parses JSON and converts it into native Python data structures: dict, list, str, int, float, bool, and None.

However, it raises a json.JSONDecodeError if the JSON is invalid, so wrap it in try/except whenever the input comes from an untrusted source.

The json.dumps() method does the reverse, encoding Python objects as JSON strings. This is useful for outputting data or transmitting it to other systems.
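Since invalid input raises an exception, a defensive wrapper is a common pattern. Here's a minimal sketch (safe_loads is just an illustrative helper name):

```python
import json

def safe_loads(json_str):
    """Parse a JSON string, returning None instead of raising on invalid input."""
    try:
        return json.loads(json_str)
    except json.JSONDecodeError:
        return None

print(safe_loads('{"name": "John", "age": 30}'))  # {'name': 'John', 'age': 30}
print(safe_loads('{not valid json}'))             # None
```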

Reading and Parsing JSON Files

In addition to strings, we can use the json module to load and parse JSON files.

This JSON file data.json contains employee data:

{
  "employees": [
    { "name": "John", "email": "[email protected]"},
    { "name": "Jane", "email": "[email protected]"}
  ]
}

To load and parse it in Python:

import json

with open('data.json') as f:
  data = json.load(f)

for employee in data['employees']:
  print(employee['name'])
# John 
# Jane

We open the file, pass it to json.load(), and get back a Python object containing the parsed JSON.

In this case it's a dictionary with an employees key containing a list of employee objects.

The key thing is json.load() handled reading the file, parsing the JSON, and giving us native Python objects we can immediately work with!
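The mirror image for writing is json.dump(), which serializes straight to an open file handle. A quick sketch (out.json is just an example path):

```python
import json

data = {"employees": [{"name": "John"}, {"name": "Jane"}]}

# json.dump() serializes directly to the file object
with open('out.json', 'w') as f:
    json.dump(data, f)

# Reading it back with json.load() round-trips to an equal object
with open('out.json') as f:
    roundtrip = json.load(f)

print(roundtrip == data)  # True
```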

Pretty Printing JSON

Here's a handy trick for debugging: formatting JSON in a readable way using Python's json.dumps() method.

Pass the indent parameter to format it with newlines and spaces:

data = {
  'name': 'John',
  'age': 30,
  'address': {
    'street': '123 Main St',
    'city': 'San Francisco'
  }
}

print(json.dumps(data, indent=4))

Output:

{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "San Francisco"
    }
}

The indent parameter controls the number of spaces per nesting level, so the output is much easier to visually parse.

I highly recommend json.dumps(data, indent=4) whenever printing JSON in the terminal for better visibility.
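json.dumps() also accepts sort_keys=True, which alphabetizes keys at every level, handy when diffing two JSON blobs:

```python
import json

data = {'name': 'John', 'age': 30, 'address': {'city': 'San Francisco'}}

# Keys come out alphabetized: address, age, name
formatted = json.dumps(data, indent=4, sort_keys=True)
print(formatted)
```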

Parsing Nested JSON Objects

Real-world JSON often has nested objects and arrays. This can be tricky to wrangle in Python.

Luckily, pandas provides a handy utility called json_normalize() that flattens nested JSON into a flat table. Pairing it with DataFrame.explode() gives one row per nested list element:

import pandas as pd

data = {
  "cars": [
    {"make": "Ford", "models": ["Fiesta", "Focus", "Mustang"]},
    {"make": "BMW", "models": ["320", "X3", "X5"]},
    {"make": "Fiat", "models": ["500", "Panda"]}
  ]
}

df = pd.json_normalize(data, record_path='cars').explode('models').reset_index(drop=True)

print(df)

Output:

   make   models
0  Ford   Fiesta
1  Ford    Focus
2  Ford  Mustang
3   BMW      320
4   BMW       X5
5   BMW       X5
6  Fiat      500
7  Fiat    Panda

This flattens the nested "cars" array into one row per model, carrying the make value along for each.

This is hugely valuable when you need to analyze nested JSON data in pandas!
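When the nested records are objects rather than plain strings, json_normalize()'s record_path and meta parameters pull parent fields down into each flattened row. A sketch with made-up data:

```python
import pandas as pd

data = [
    {"make": "Ford", "country": "USA",
     "models": [{"name": "Fiesta"}, {"name": "Focus"}]},
    {"make": "BMW", "country": "Germany",
     "models": [{"name": "X3"}]},
]

# One row per model; make/country are repeated from the parent record
df = pd.json_normalize(data, record_path='models', meta=['make', 'country'])
print(df)
```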

Converting JSON to CSV

For analysis in Excel, Tableau, or other tools you may want to export JSON to CSV format.

No problem, the pandas DataFrame.to_csv() method makes this a breeze:

import pandas as pd

df = pd.json_normalize(data)

df.to_csv('data.csv', index=False)

By passing index=False it will omit the pandas index column.

You can also append JSON data to an existing CSV:

df.to_csv('data.csv', mode='a', header=False, index=False)

This provides a clean way to extract JSON -> CSV for downstream consumption.

Handling Duplicates

Unlike Python dictionaries, JSON technically allows duplicate keys like:

{
  "name": "John",
  "age": 30,
  "name": "Jane"
}

json.loads() won't raise an error here. By default it silently keeps the last value for each repeated key, so "name" would come back as "Jane".

To collect every value instead, pass the object_pairs_hook parameter:

def handle_dups(pairs):
  d = {}
  for k, v in pairs:
    if k in d:
      # Promote the existing value to a list on the first duplicate
      if not isinstance(d[k], list):
        d[k] = [d[k]]
      d[k].append(v)
    else:
      d[k] = v
  return d

json_str = '{"name": "John", "age": 30, "name": "Jane"}'
data = json.loads(json_str, object_pairs_hook=handle_dups)

This collects values from duplicate keys into lists:

{
  "name": ["John","Jane"],
  "age": 30
} 

If you want later duplicates to override earlier ones, no hook is needed, that is the default behavior. To keep the first occurrence instead:

def keep_first(pairs):
  d = {}
  for k, v in pairs:
    d.setdefault(k, v)
  return d

data = json.loads(json_str, object_pairs_hook=keep_first)

So object_pairs_hook gives you full control over how JSON objects with duplicate keys are handled.

Parsing JSON Dates

Dealing with dates is messy. JSON represents dates as strings:

{
  "date": "2019-01-01" 
}

By default json.loads() will parse this into a str.

We can use object_hook to auto-convert dates:

from datetime import datetime

def parse_dates(data):
  for k, v in data.items():
    if isinstance(v, str):
      try:
        data[k] = datetime.strptime(v, "%Y-%m-%d")
      except ValueError:
        pass
  return data

json_str = '{"date": "2019-01-01"}'
data = json.loads(json_str, object_hook=parse_dates)
print(data['date'].year) # 2019

Now date strings are parsed into datetime objects automatically!
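Going the other direction, json.dumps() raises a TypeError on datetime objects. The default parameter lets you plug in a fallback encoder (a sketch; encode_dates is a hypothetical helper):

```python
import json
from datetime import datetime

def encode_dates(obj):
    """Fallback encoder: turn datetimes into ISO-style date strings."""
    if isinstance(obj, datetime):
        return obj.strftime("%Y-%m-%d")
    raise TypeError(f"Not JSON serializable: {obj!r}")

data = {"date": datetime(2019, 1, 1)}
print(json.dumps(data, default=encode_dates))  # {"date": "2019-01-01"}
```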

Incremental JSON Parsing

Parsing huge JSON files can exhaust memory. The standard json module has no chunked mode, but JSONDecoder.raw_decode() can walk a string of concatenated JSON values one at a time:

import json

decoder = json.JSONDecoder()

def iter_json(text):
  pos = 0
  while pos < len(text):
    # Skip whitespace between values
    while pos < len(text) and text[pos].isspace():
      pos += 1
    if pos == len(text):
      break
    obj, pos = decoder.raw_decode(text, pos)
    yield obj

with open('big.json') as f:
  for obj in iter_json(f.read()):
    print(obj)

raw_decode() parses one JSON value starting at the given index and returns the object along with the index where parsing stopped, so you can advance value by value.

For files too large to read into memory at all, the third-party ijson library streams elements of a JSON array lazily:

import ijson

with open('big.json', 'rb') as f:
  for obj in ijson.items(f, 'item'):  # 'item' matches each element of a top-level array
    print(obj)

Great for huge datasets!

Validating JSON Schemas

When accepting JSON data, validation helps catch issues early.

The jsonschema module lets you define a schema and validate data against it:

from jsonschema import validate, ValidationError

schema = {
  "type": "object",
  "required": ["name", "email"],
  "properties": {
    "name": {"type": "string"},
    "email": {"type": "string"}
  }
}

data = {"name": "John"}

try:
  validate(data, schema)
except ValidationError as e:
  print(e.message) # 'email' is a required property

This checks for required fields, expected types, etc. Critical for robust JSON APIs.

Which Parser Should I Use?

Python has a few options for parsing JSON:

  • json – The standard library JSON module. Great balance of speed, compatibility, and options.
  • simplejson – External module with the same API; its optional C extensions can beat the standard library on some workloads.
  • pandas – pd.read_json() and json_normalize() excel at turning JSON into DataFrames.

My recommendation:

  • json – Best for general parsing tasks like reading files or simple string parsing.
  • pandas – If JSON → DataFrame is your end goal, pandas can't be beat.
  • simplejson – For performance-critical applications dealing with lots of JSON.

So in summary, Python makes parsing JSON a joy whether it's simple strings or huge datasets. The json and pandas modules provide all the tools you'll need.

I hope these tips help you become a JSON parsing ninja! Let me know if you have any other questions.
