
How to Use Python's json Module to Parse, Encode, and Manipulate JSON Data

JSON, short for JavaScript Object Notation, has become the de facto format for transmitting data between web servers and browsers and in APIs. Its popularity stems from the fact that it is lightweight, human-readable, and can be parsed by pretty much any modern programming language.

As a Python developer, you will undoubtedly need to work with JSON data at some point. Luckily, Python provides excellent built-in support for encoding and decoding JSON through its json module. In this tutorial, you'll learn how to effectively use this module to read, write, and manipulate JSON data.

What is JSON?

Before diving into the technical details, let's define what JSON actually is. JSON is a text-based data format that is used to represent structured data. It is derived from the JavaScript programming language, but is language-independent.

JSON is built on two universal data structures:

  • A collection of name/value pairs (similar to a Python dictionary)
  • An ordered list of values (similar to a Python list)

These structures can be nested within each other, allowing complex hierarchical data to be represented.

Here's an example of JSON data describing a person:


{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}

As you can see, JSON syntax uses curly braces {} to define objects and square brackets [] to define arrays. Within an object, each key is separated from its value by a colon, and the key-value pairs are separated by commas. Strings are always wrapped in double quotes. This simple yet flexible structure allows JSON to represent complex nested data.

Now that you understand what JSON looks like, let's explore how to work with it in Python.

Parsing JSON Strings

The most common way you'll encounter JSON data in Python is as a string. For example, when you make a request to a web API, the response body is typically a JSON-formatted string.

Python's json module provides a handy function called loads() (short for “load string”) that parses a JSON string into a Python object. Here's how you use it:


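import json

json_string = '{"name": "John", "age": 30, "city": "New York"}'

# Parse the JSON string into a Python object
data = json.loads(json_string)

print(data)        # {'name': 'John', 'age': 30, 'city': 'New York'}
print(type(data))  # <class 'dict'>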

As you can see, json.loads() parsed the JSON string into a Python dictionary. The JSON object's keys became the dictionary's keys, and the values became the dictionary's values.
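
To make that concrete, here is a minimal sketch of reading values out of a parsed result, along with what happens when the input is not valid JSON. The sample strings and variable names are made up for illustration.

```python
import json

# Hypothetical sample payload with a nested object and an array
json_string = '{"name": "John", "address": {"city": "New York"}, "tags": ["a", "b"]}'

data = json.loads(json_string)

name = data["name"]              # plain dictionary lookup
city = data["address"]["city"]   # nested objects become nested dicts
first_tag = data["tags"][0]      # arrays become lists

# JSON requires double-quoted strings, so this single-quoted input is rejected
try:
    json.loads("{'name': 'John'}")
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True

print(name, city, first_tag, parse_failed)
```

Catching json.JSONDecodeError is usually worthwhile whenever the string comes from a source you don't control.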

One thing to keep in mind is that JSON only supports a subset of Python's built-in types. When you decode JSON data, each value is converted to the equivalent Python type according to this table:

JSON             Python
object           dict
array            list
string           str
number (int)     int
number (real)    float
true             True
false            False
null             None
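
One way to check the table above for yourself is to decode a document that uses every JSON value type and inspect the results. This is only a small sketch, and the key names in the sample string are arbitrary.

```python
import json

# Hypothetical document that uses every JSON value type
text = '{"obj": {}, "arr": [1, 2], "s": "hi", "i": 7, "f": 2.5, "t": true, "n": null}'

decoded = json.loads(text)

# Map each key to the name of the Python type it was decoded into
type_names = {key: type(value).__name__ for key, value in decoded.items()}
print(type_names)
```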

Loading JSON from a File

In addition to parsing JSON from strings, you'll often need to read JSON data from files. The json module provides the load() function for this.

Suppose you have a file named data.json with this content:

{
    "name": "John Smith",
    "age": 30,
    "city": "New York"
}

You can load the JSON data from the file like this:


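import json

# Open the file and let json.load() parse its contents
with open('data.json', encoding='utf-8') as file:
    data = json.load(file)

print(data)  # {'name': 'John Smith', 'age': 30, 'city': 'New York'}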

The load() function reads the file, parses the JSON data, and returns the resulting Python object. Just like with loads(), the JSON data is converted to the equivalent built-in Python types.
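
If you want to try this without creating data.json by hand, the following sketch writes the same content to a temporary directory first and then loads it. The use of tempfile and pathlib here is just a convenience for the example, not something the json module requires.

```python
import json
import tempfile
from pathlib import Path

# Create a throwaway data.json so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "data.json"
    path.write_text(
        '{"name": "John Smith", "age": 30, "city": "New York"}',
        encoding="utf-8",
    )

    # json.load() takes an open file object, not a filename
    with open(path, encoding="utf-8") as file:
        data = json.load(file)

print(data["name"], data["age"])
```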

Encoding Python Objects as JSON

In addition to decoding JSON data, the json module also allows you to encode Python objects into the JSON format. This is useful when you need to send data from your Python program to another system that expects JSON.

The primary function for encoding data is dumps() (for “dump string”):


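import json

data = {
    "name": "John Smith",
    "age": 30,
    "city": "New York"
}

# Serialize the dictionary to a JSON-formatted string
json_string = json.dumps(data)
print(json_string)        # {"name": "John Smith", "age": 30, "city": "New York"}
print(type(json_string))  # <class 'str'>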

The dumps() function takes a Python object and returns a JSON-formatted string representation of it. The encoding process converts Python types to their JSON equivalents according to this table:

Python                                    JSON
dict                                      object
list, tuple                               array
str                                       string
int, float, int- & float-derived Enums    number
True                                      true
False                                     false
None                                      null

An important gotcha to watch out for is that when you encode and then decode an object, the resulting object may have different types than the original. For example, JSON does not distinguish between lists and tuples – they both become arrays after encoding. When decoded back to Python, JSON arrays always become lists.
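
The round-trip behavior described above can be seen in a short sketch. It also shows a related case the text does not mention: JSON object keys are always strings, so a non-string key such as an int comes back as a str. The sample dictionary is made up for illustration.

```python
import json

# A tuple value and an integer key, neither of which survives a round trip
original = {"point": (1, 2), 1: "one"}

encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)              # {"point": [1, 2], "1": "one"}
print(decoded)              # {'point': [1, 2], '1': 'one'}
print(decoded == original)  # False
```

If your code depends on tuples or non-string keys, you'll need to convert them back yourself after decoding.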

Writing JSON to a File

Just as you can load JSON data from a file, you can also write Python objects as JSON to a file using the dump() function:


import json

data = {
    "name": "John Smith",
    "age": 30,
    "city": "New York"
}

# Open the file for writing and serialize the dictionary into it
with open('output.json', 'w', encoding='utf-8') as file:
    json.dump(data, file)

This code writes the data dictionary as JSON to a file named output.json. The resulting file will contain:


{"name": "John Smith", "age": 30, "city": "New York"}
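
As a variation, the sketch below writes the same dictionary to a temporary file and reads it back to confirm nothing was lost. It also passes the optional indent argument, which spreads the output over several lines instead of one. The temporary directory is only there to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

data = {"name": "John Smith", "age": 30, "city": "New York"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "output.json"

    # indent=2 is optional; it makes the file easier for humans to read
    with open(path, "w", encoding="utf-8") as file:
        json.dump(data, file, indent=2)

    written_text = path.read_text(encoding="utf-8")

    # Read the file back to confirm the round trip
    with open(path, encoding="utf-8") as file:
        restored = json.load(file)

print(written_text)
print(restored == data)
```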

Encoding Custom Python Objects

One limitation of the json module is that it only knows how to encode Python's built-in types by default. If you try to encode a custom object, you'll get a TypeError:


import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

user = User("John", 30)
json.dumps(user)  # TypeError: Object of type User is not JSON serializable

To fix this, you need to provide a custom encoding function that converts your object to a JSON-serializable form. One way to do this is to define a default() method in a custom subclass of JSONEncoder:


import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class UserEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert User instances to a plain, serializable dict
        if isinstance(obj, User):
            return {'name': obj.name, 'age': obj.age}
        # Let the base class raise TypeError for anything else
        return super().default(obj)

user = User("John", 30)
json_string = json.dumps(user, cls=UserEncoder)
print(json_string)  # {"name": "John", "age": 30}

The default() method is called for any object that the json module doesn't know how to serialize. In this case, it checks if the object is an instance of the User class. If so, it returns a dictionary representation of the object that can be serialized. For any other types, it falls back to the default behavior by calling the superclass method.

When calling dumps(), you pass the UserEncoder class as the cls argument. This tells the json module to use your custom encoder class.
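
Subclassing is not the only option. For simple cases, dumps() also accepts a default argument: a plain function that is called for objects the encoder can't handle. Here is a minimal sketch of that approach, where encode_user is a hypothetical helper name chosen for this example.

```python
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def encode_user(obj):
    """Hypothetical helper: convert a User to a JSON-serializable dict."""
    if isinstance(obj, User):
        return {"name": obj.name, "age": obj.age}
    # Mirror JSONEncoder.default() by rejecting anything else
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

user = User("John", 30)
json_string = json.dumps(user, default=encode_user)
print(json_string)  # {"name": "John", "age": 30}
```

A function like this tends to be enough for a one-off call, while an encoder subclass is easier to reuse across a codebase.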

Conclusion

In this tutorial, you learned how to use Python's json module to parse, encode, and manipulate JSON data. You saw how to:

  • Parse JSON strings into Python objects with loads()
  • Load JSON data from files with load()
  • Encode Python objects into JSON strings with dumps()
  • Write JSON data to files with dump()
  • Handle custom object encoding by subclassing JSONEncoder

These skills form a strong foundation for working with JSON data in your Python programs. Whether you're interacting with web APIs, saving configuration files, or exchanging data between systems, the json module has you covered.

It's worth noting that the json module isn't the only way to serialize data in Python. The pickle and marshal modules offer similar functionality for Python-specific data serialization. However, JSON has the advantage of being a universal standard that can be read by most programming languages.

As you work more with JSON and Python, you may encounter more advanced use cases like customizing the decoding process, pretty-printing JSON output, or working with JSON Web Tokens (JWTs). The skills you learned in this tutorial will serve as a solid basis for tackling those more advanced topics.

Remember, practice is key to mastering any new skill. Try incorporating JSON into your own projects, whether it's saving user preferences, caching API responses, or communicating between microservices. The more you work with JSON, the more natural it will feel.

Happy coding!
