JSON, short for JavaScript Object Notation, has become the de facto format for transmitting data between web servers and browsers and in APIs. Its popularity stems from the fact that it is lightweight, human-readable, and can be parsed by virtually any modern programming language.
As a Python developer, you will almost certainly need to work with JSON data at some point. Fortunately, Python provides built-in support for encoding and decoding JSON through its json module. In this tutorial, you’ll learn how to use this module to read, write, and manipulate JSON data.
What is JSON?
Before diving into the technical details, let’s define what JSON actually is. JSON is a text-based format for representing structured data. It is derived from the JavaScript programming language, but it is language-independent.
JSON is built on two universal data structures:
- A collection of name/value pairs (similar to a Python dictionary)
- An ordered list of values (similar to a Python list)
These structures can be nested within each other, allowing complex hierarchical data to be represented.
Here’s an example of JSON data describing a person:
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}
As you can see, JSON syntax uses curly braces {} to define objects and square brackets [] to define arrays. Within an object, each key is separated from its value by a colon, and the key-value pairs are separated by commas. Strings are always wrapped in double quotes. This simple yet flexible structure allows JSON to represent complex nested data.
Now that you understand what JSON looks like, let’s explore how to work with it in Python.
Parsing JSON Strings
The most common way you’ll encounter JSON data in Python is as a string. For example, when you make a request to a web API, the response body is typically a JSON-formatted string.
Python’s json module provides a function called loads() (short for “load string”) that parses a JSON string into a Python object. Here’s how you use it:
import json

json_string = '{"name": "John", "age": 30, "city": "New York"}'
data = json.loads(json_string)

print(data)        # {'name': 'John', 'age': 30, 'city': 'New York'}
print(type(data))  # <class 'dict'>
As you can see, json.loads() parsed the JSON string into a Python dictionary. The JSON object’s keys became the dictionary’s keys, and the values became the dictionary’s values.
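Because the result is an ordinary dictionary, nested JSON can be navigated with normal indexing. The following is a minimal sketch; the JSON document in it is a hypothetical, shortened version of the person example shown earlier, and the variable names are illustrative only.

```python
import json

# Hypothetical JSON document, modelled on the person example earlier in this tutorial.
json_string = '''
{
    "firstName": "John",
    "address": {"city": "New York", "state": "NY"},
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"}
    ]
}
'''

person = json.loads(json_string)

# Nested JSON objects become nested dictionaries.
city = person["address"]["city"]

# JSON arrays become lists, so they support indexing and iteration.
first_number = person["phoneNumber"][0]["number"]
phone_types = [entry["type"] for entry in person["phoneNumber"]]

print(city)          # New York
print(first_number)  # 212 555-1234
print(phone_types)   # ['home', 'fax']
```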
One thing to keep in mind is that JSON only supports a subset of Python’s built-in types. When you decode JSON data, it will be converted to the equivalent Python type according to this table:
JSON | Python |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number (real) | float |
true | True |
false | False |
null | None |
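The table can be checked directly by decoding a document that contains one value of each JSON type and inspecting the resulting Python types. This is a small sketch; the key names are made up for illustration.

```python
import json

# Hypothetical document containing one value of each JSON type.
json_string = '''
{
    "object": {"key": "value"},
    "array": [1, 2, 3],
    "string": "hello",
    "integer": 42,
    "real": 3.14,
    "yes": true,
    "no": false,
    "nothing": null
}
'''

decoded = json.loads(json_string)

# Map each key to the name of the Python type it was decoded into.
type_names = {key: type(value).__name__ for key, value in decoded.items()}

for key, name in type_names.items():
    print(f"{key}: {name}")
```

Note that true and false both decode to Python's bool type, and null decodes to None, whose type is named NoneType.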
Loading JSON from a File
In addition to parsing JSON from strings, you’ll often need to read JSON data from files. The json module provides the load() function for this.
Suppose you have a file named data.json with this content:
{
    "name": "John Smith",
    "age": 30,
    "city": "New York"
}
You can load the JSON data from the file like this:
import json

# Open the file and parse its contents as JSON.
with open('data.json') as file:
    data = json.load(file)

print(data)
The load() function reads the file, parses the JSON data, and returns the resulting Python object. Just like with loads(), the JSON data is converted to the equivalent built-in Python types.
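To try this without creating data.json by hand, the sketch below first writes the file into a temporary directory and then loads it. The use of tempfile and an explicit encoding="utf-8" are choices made for this illustration, not something the json module requires.

```python
import json
import tempfile
from pathlib import Path

# Create a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "data.json"
    path.write_text(
        '{"name": "John Smith", "age": 30, "city": "New York"}',
        encoding="utf-8",
    )

    # An explicit encoding avoids depending on the platform default.
    with open(path, encoding="utf-8") as file:
        data = json.load(file)

# The loaded values are regular Python objects.
print(data["name"])     # John Smith
print(data["age"] + 1)  # 31
```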
Encoding Python Objects as JSON
In addition to decoding JSON data, the json module also allows you to encode Python objects into the JSON format. This is useful when you need to send data from your Python program to another system that expects JSON.
The primary function for encoding data is dumps() (for “dump string”):
import json

data = {
    "name": "John Smith",
    "age": 30,
    "city": "New York"
}

json_string = json.dumps(data)
print(json_string)        # {"name": "John Smith", "age": 30, "city": "New York"}
print(type(json_string))  # <class 'str'>
The dumps() function takes a Python object and returns a JSON-formatted string representation of it. The encoding process converts Python types to their JSON equivalents according to this table:
Python | JSON |
---|---|
dict | object |
list, tuple | array |
str | string |
int, float, int- & float-derived Enums | number |
True | true |
False | false |
None | null |
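As a quick check of the table, the following sketch encodes a dictionary holding one value of each supported type. The key names are hypothetical and chosen only to label each value.

```python
import json

# One value of each Python type that the json module encodes by default.
data = {
    "a_dict": {"key": "value"},
    "a_list": [1, 2],
    "a_tuple": (3, 4),
    "a_str": "hello",
    "an_int": 42,
    "a_float": 3.14,
    "yes": True,
    "no": False,
    "nothing": None,
}

json_string = json.dumps(data)
print(json_string)
```

In the output, the tuple is written with square brackets like the list, True and False appear in lowercase, and None appears as null.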
An important gotcha to watch out for is that when you encode and then decode an object, the resulting object may have different types than the original. For example, JSON does not distinguish between lists and tuples – they both become arrays after encoding. When decoded back to Python, JSON arrays always become lists.
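The round trip can be demonstrated in a few lines. The sketch below shows the tuple case described above, plus one related case that is not covered in the text: JSON object keys are always strings, so an integer dictionary key comes back as a string.

```python
import json

# A tuple value and an integer key, neither of which survives a round trip unchanged.
original = {"point": (1, 2), 1: "one"}

round_tripped = json.loads(json.dumps(original))

print(round_tripped)              # {'point': [1, 2], '1': 'one'}
print(original == round_tripped)  # False
```

If your program relies on tuples or non-string keys, it needs to convert the decoded data back explicitly.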
Writing JSON to a File
Just as you can load JSON data from a file, you can also write Python objects as JSON to a file using the dump() function:
import json

data = {
    "name": "John Smith",
    "age": 30,
    "city": "New York"
}

# Open the file for writing and serialize the dictionary into it.
with open('output.json', 'w') as file:
    json.dump(data, file)
This code writes the data dictionary as JSON to a file named output.json. The resulting file will contain:
{"name": "John Smith", "age": 30, "city": "New York"}
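By default the output is written on a single line. If a more readable file is preferred, dump() (like dumps()) accepts an indent argument. The sketch below writes to a temporary directory, which is an assumption made to keep the example self-contained, and then reads the file back to confirm that the data is unchanged.

```python
import json
import tempfile
from pathlib import Path

data = {"name": "John Smith", "age": 30, "city": "New York"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "output.json"

    # indent=4 puts each key on its own line, indented by four spaces.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(data, file, indent=4)

    # Read the raw text to see the formatting that was written.
    written_text = path.read_text(encoding="utf-8")

    # Load the file again to confirm the content round-trips.
    with open(path, encoding="utf-8") as file:
        reloaded = json.load(file)

print(written_text)
print(reloaded == data)  # True
```

Indentation changes only the whitespace in the file, not the data it represents.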
Encoding Custom Python Objects
One limitation of the json module is that it only knows how to encode Python’s built-in types by default. If you try to encode a custom object, you’ll get a TypeError:
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

user = User("John", 30)
json.dumps(user)  # Raises TypeError because User is not JSON serializable
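To observe the failure without stopping the program, the call can be wrapped in a try/except block. This is a minimal sketch; the exact wording of the error message may vary between Python versions, so the code only prints it rather than relying on it.

```python
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

user = User("John", 30)

message = None
try:
    json.dumps(user)
except TypeError as error:
    # The encoder reports which type it could not serialize.
    message = str(error)
    print(f"Encoding failed: {message}")
```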
To fix this, you need to provide a custom encoding function that converts your object to a JSON-serializable form. One way to do this is to define a default() method in a custom subclass of JSONEncoder:
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class UserEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert User instances into a plain, serializable dictionary.
        if isinstance(obj, User):
            return {'name': obj.name, 'age': obj.age}
        # Let the base class raise TypeError for unsupported types.
        return super().default(obj)

user = User("John", 30)
json_string = json.dumps(user, cls=UserEncoder)
print(json_string)  # {"name": "John", "age": 30}
The default() method is called for any object that the json module doesn’t know how to serialize. In this case, it checks if the object is an instance of the User class. If so, it returns a dictionary representation of the object that can be serialized. For any other types, it falls back to the default behavior by calling the superclass method.
When calling dumps(), you pass the UserEncoder class as the cls argument. This tells the json module to use your custom encoder class.
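Subclassing JSONEncoder is one option. Another is to pass a plain function through the default argument of dumps(), which is called for the same unsupported objects. The sketch below assumes the same User class; encode_user is a hypothetical helper name introduced for this example.

```python
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def encode_user(obj):
    """Hypothetical helper: return a serializable form of obj, or raise TypeError."""
    if isinstance(obj, User):
        return {"name": obj.name, "age": obj.age}
    # Mirror the encoder's own behavior for types this function does not handle.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

users = [User("John", 30), User("Jane", 28)]

# The function is applied to every object the encoder cannot handle itself.
json_string = json.dumps(users, default=encode_user)
print(json_string)  # [{"name": "John", "age": 30}, {"name": "Jane", "age": 28}]
```

A function is often enough for one-off conversions, while an encoder subclass may be easier to reuse across a larger codebase.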
Conclusion
In this tutorial, you learned how to use Python’s json module to parse, encode, and manipulate JSON data. You saw how to:
- Parse JSON strings into Python objects with loads()
- Load JSON data from files with load()
- Encode Python objects into JSON strings with dumps()
- Write JSON data to files with dump()
- Handle custom object encoding by subclassing JSONEncoder
These skills form a strong foundation for working with JSON data in your Python programs. Whether you’re interacting with web APIs, saving configuration files, or exchanging data between systems, the json module has you covered.
It’s worth noting that the json module isn’t the only way to serialize data in Python. The pickle and marshal modules offer similar functionality for Python-specific data serialization. However, JSON has the advantage of being a universal standard that can be read by most programming languages.
As you work more with JSON and Python, you may encounter more advanced use cases like customizing the decoding process, pretty-printing JSON output, or working with JSON Web Tokens (JWTs). The skills you learned in this tutorial will serve as a solid basis for tackling those more advanced topics.
Remember, practice is key to mastering any new skill. Try incorporating JSON into your own projects, whether it’s saving user preferences, caching API responses, or communicating between microservices. The more you work with JSON, the more natural it will feel.
Happy coding!