[PYTHON] JSON encoding / decoding of custom objects

Overview

This article describes an example of JSON-encoding / decoding a complex structure that contains instances (objects) of custom classes.

The language is Python 3.8.1.

Within Python, it's easier to use pickle to serialize complex data. But if you want to read or write the data outside of Python, or want it to stay readable after serializing, JSON is often the better choice.

There are probably other ways to do this, but I like the approach described here for the following reasons:

  1. Use the Python standard json module.
  2. The encoding / decoding logic can be split up per class.

Custom and complex data examples

Since this is only for explanation, the data isn't very complicated, but it meets the following conditions.

  1. Multiple custom classes are included.
  2. The attributes of a custom object can themselves hold custom objects.

class Person:
    def __init__(self, name):
        self.name = name

class Robot:
    def __init__(self, name, creator=None):
        self.name = name
        self.creator = creator

alan = Person('Alan')
beetle = Robot('Beetle', creator=alan)
chappy = Robot('Chappy', creator=beetle)

alan is a human, and beetle and chappy are robots. Below, we put the robots in a list and encode / decode that list.

robots = [beetle, chappy]

Encode

Serializing an object into a JSON string is called **encoding**. This list contains objects of the Person and Robot classes, so we need a way to encode both.

Simple encoding

First, let's encode a simple Person class.

Determine the encoding specifications

To encode a custom object, you first have to decide how it should be represented in JSON (the spec).

Here, the **class name** and the **attribute values** are output as name-value pairs. For alan above, the JSON string should look like this:

{"class": "Person", "name": "Alan"}

Make a custom encoder

You can use a custom encoder by passing it as the cls parameter of the standard json.dumps function. A custom encoder is created by subclassing json.JSONEncoder and overriding its default method. The default method receives each object that json.JSONEncoder cannot serialize by itself, so it only needs to return that object in a form json.JSONEncoder can handle (here, a dict containing only str values). For any other object, it calls super().default, which raises TypeError.

import json

class PersonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Person):
            return {'class': 'Person', 'name': obj.name}
        else:
            return super().default(obj)

print(json.dumps(alan, cls=PersonEncoder))

#result:
{"class": "Person", "name": "Alan"}
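For comparison, here is a minimal sketch of what happens without the cls parameter: the built-in json.JSONEncoder.default does not know Person, so json.dumps raises TypeError. This is the same error that super().default(obj) in PersonEncoder raises for objects it does not handle.

```python
import json

class Person:
    def __init__(self, name):
        self.name = name

alan = Person('Alan')

# No custom encoder: json.JSONEncoder.default raises TypeError for a Person.
try:
    json.dumps(alan)
    error_message = None
except TypeError as e:
    error_message = str(e)
    print(error_message)
```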

Complex encoding

Next, we create an encoder for the Robot class, which isn't complicated either. As mentioned in the Overview, **the encoding logic is split up by class**.

class RobotEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Robot):
            return {'class': 'Robot', 'name': obj.name, 'creator': obj.creator}
        else:
            return super().default(obj)

It is almost the same as PersonEncoder. However, unlike PersonEncoder, it does not work on its own: the creator in the returned dict may be a Person object, which neither json.JSONEncoder nor RobotEncoder knows how to encode. The logic is deliberately split this way, and the two encoders are combined when actually encoding.
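A small sketch to check this: RobotEncoder alone can encode a Robot whose creator is None (the robot name 'Solo' is made up for this example), but fails as soon as the creator is a Person.

```python
import json

class Person:
    def __init__(self, name):
        self.name = name

class Robot:
    def __init__(self, name, creator=None):
        self.name = name
        self.creator = creator

class RobotEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Robot):
            return {'class': 'Robot', 'name': obj.name, 'creator': obj.creator}
        return super().default(obj)

# creator is None, which JSON can represent as null, so this works.
solo_json = json.dumps(Robot('Solo'), cls=RobotEncoder)
print(solo_json)  # {"class": "Robot", "name": "Solo", "creator": null}

# creator is a Person: RobotEncoder delegates it to json.JSONEncoder.default,
# which raises TypeError.
try:
    json.dumps(Robot('Beetle', creator=Person('Alan')), cls=RobotEncoder)
    failed = False
except TypeError as e:
    failed = True
    print('TypeError:', e)
```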

Combine encoders

To combine the encoders, create a new class that inherits from both of them.

class XEncoder(PersonEncoder, RobotEncoder):
    pass

print(json.dumps(robots, cls=XEncoder))

#result:
[{"class": "Robot", "name": "Beetle", "creator": {"class": "Person", "name": "Alan"}}, {"class": "Robot", "name": "Chappy", "creator": {"class": "Robot", "name": "Beetle", "creator": {"class": "Person", "name": "Alan"}}}]

The same result, formatted with indent=4:
print(json.dumps(robots, cls=XEncoder, indent=4))

#result:
[
    {
        "class": "Robot",
        "name": "Beetle",
        "creator": {
            "class": "Person",
            "name": "Alan"
        }
    },
    {
        "class": "Robot",
        "name": "Chappy",
        "creator": {
            "class": "Robot",
            "name": "Beetle",
            "creator": {
                "class": "Person",
                "name": "Alan"
            }
        }
    }
]

Combining them is necessary because json.dumps accepts only one encoder class. The approach also scales: when more object types appear, just add their encoders to XEncoder's base classes.

(Supplement) How the multiple inheritance works

Here is a brief explanation of why the XEncoder above works.

With multiple inheritance in Python, attributes are looked up in the method resolution order (MRO), which follows the order in which the base classes are listed. So when XEncoder's default method is called, PersonEncoder.default, from the first base class, is found first.

PersonEncoder.default returns a dict itself if obj is a Person object; otherwise it calls super().default.

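In this case, super().default resolves to RobotEncoder.default, **not** json.JSONEncoder.default. This is how multiple inheritance works in Python: super() follows the MRO of the actual object's class (XEncoder), not simply the parent of the class in which the method is defined.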

When RobotEncoder.default in turn calls super().default, the next class in the MRO is json.JSONEncoder, so the call is delegated to json.JSONEncoder.default, which raises TypeError.

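json.JSONEncoder calls default again for any value in the returned dict that it still cannot serialize, which is how the nested creator objects get encoded. Since each default method checks the class with isinstance and handles only its own class, the inheritance order doesn't matter: reversing it gives the same result.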
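The lookup order can be inspected through the class's __mro__ attribute. The sketch below prints XEncoder's MRO and compares its output with a hypothetical ReversedXEncoder that lists the same base classes in the opposite order.

```python
import json

class Person:
    def __init__(self, name):
        self.name = name

class Robot:
    def __init__(self, name, creator=None):
        self.name = name
        self.creator = creator

class PersonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Person):
            return {'class': 'Person', 'name': obj.name}
        return super().default(obj)

class RobotEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Robot):
            return {'class': 'Robot', 'name': obj.name, 'creator': obj.creator}
        return super().default(obj)

class XEncoder(PersonEncoder, RobotEncoder):
    pass

# Hypothetical variant with the base classes in reverse order.
class ReversedXEncoder(RobotEncoder, PersonEncoder):
    pass

# The order super() walks through inside XEncoder.default
mro_names = [c.__name__ for c in XEncoder.__mro__]
print(mro_names)  # ['XEncoder', 'PersonEncoder', 'RobotEncoder', 'JSONEncoder', 'object']

alan = Person('Alan')
beetle = Robot('Beetle', creator=alan)
chappy = Robot('Chappy', creator=beetle)
robots = [beetle, chappy]

# Each default handles only its own class, so both orders give the same JSON.
same = json.dumps(robots, cls=XEncoder) == json.dumps(robots, cls=ReversedXEncoder)
print(same)  # True
```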

Decode

The reverse of encoding, deserializing a JSON string into an object, is called **decoding**. By passing a function as the object_hook parameter of json.loads, you can add custom processing for decoded objects: the function is called with every decoded JSON object (a dict), and its return value is used in place of that dict.

Simple object_hook example

First, let's encode a single Person object and decode it again. The function passed as object_hook receives each decoded JSON object as a dict, so we write what to do when the dict's 'class' value is 'Person'; any other object is returned unchanged.

def person_hook(obj):
    if type(obj) == dict and obj.get('class') == 'Person':
        return Person(obj['name'])
    else:
        return obj

#Encode to JSON string
alan_encoded = json.dumps(alan, cls=PersonEncoder)
#Decode from JSON string
alan_decoded = json.loads(alan_encoded, object_hook=person_hook)

print(alan_decoded.__class__.__name__, vars(alan_decoded))

#result:
Person {'name': 'Alan'}

Combine object_hooks

Next, create an object_hook for the Robot class and a new function that combines the two.

def robot_hook(obj):
    if type(obj) == dict and obj.get('class') == 'Robot':
        return Robot(obj['name'], creator=obj['creator'])
    else:
        return obj

def x_hook(obj):
    return person_hook(robot_hook(obj))

The combined function x_hook can also be written as follows. It is a little longer, but it makes adding more hooks easier. (The hooks are applied in a different order than in the version above, but that causes no problem.)

def x_hook(obj):
    hooks = [person_hook, robot_hook]
    for hook in hooks:
        obj = hook(obj)
    return obj

Let's use this to encode / decode the list of robots created above.

#Encode to JSON string
robots_encoded = json.dumps(robots, cls=XEncoder)
#Decode from JSON string
robots_decoded = json.loads(robots_encoded, object_hook=x_hook)

for robot in robots_decoded:
    print(robot.__class__.__name__, vars(robot))

#result:
Robot {'name': 'Beetle', 'creator': <__main__.Person object at 0x0000029A48D34CA0>}
Robot {'name': 'Chappy', 'creator': <__main__.Robot object at 0x0000029A48D38100>}

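As with encoding, changing the order in which the hooks are applied did not change the result: json.loads calls object_hook on the innermost objects first, and each hook only converts dicts whose 'class' value is its own class, passing everything else through.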
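The inside-out order can be checked with a small tracing hook. In this sketch, tracing_hook is a hypothetical helper that records the 'name' of every dict it receives and returns the dict unchanged, so nothing is converted.

```python
import json

seen = []  # names in the order object_hook receives the objects

def tracing_hook(obj):
    # Record which object is being decoded, then pass it through unchanged.
    seen.append(obj.get('name'))
    return obj

encoded = ('{"class": "Robot", "name": "Chappy", "creator": '
           '{"class": "Robot", "name": "Beetle", "creator": '
           '{"class": "Person", "name": "Alan"}}}')
json.loads(encoded, object_hook=tracing_hook)
print(seen)  # ['Alan', 'Beetle', 'Chappy']
```

Because the innermost object is hooked first, robot_hook always receives a creator that has already been converted.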

(Supplement) Encoding can be customized in the same way

In fact, the encoding side can also be customized by passing a function, via the default parameter of json.dumps. Conversely, implementing the decoding side as a subclass of json.JSONDecoder would be more complicated.

So when combining custom encoding logic, choose subclasses if you want to combine them purely through multiple inheritance, and functions if you want to match the style of the decoding side.

Example: customizing the encoding side by passing a function

def person_default(obj):
    if isinstance(obj, Person):
        return {'class': 'Person', 'name': obj.name}
    else:
        return obj

def robot_default(obj):
    if isinstance(obj, Robot):
        return {'class': 'Robot', 'name': obj.name, 'creator': obj.creator}
    else:
        return obj

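def x_default(obj):
    defaults = [person_default, robot_default]
    converted = obj
    for default in defaults:
        converted = default(converted)
    if converted is obj:
        # No function handled obj. Raise TypeError like json.JSONEncoder.default does;
        # returning obj unchanged would make json.dumps fail with a confusing error.
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')
    return converted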

print(json.dumps(robots, default=x_default))

#result:
[{"class": "Robot", "name": "Beetle", "creator": {"class": "Person", "name": "Alan"}}, {"class": "Robot", "name": "Chappy", "creator": {"class": "Robot", "name": "Beetle", "creator": {"class": "Person", "name": "Alan"}}}]

Remaining issue

There is one remaining issue with decoding. In the example above, the first robot, 'Beetle', and the creator of 'Chappy' (also 'Beetle') were originally the same object. Likewise, the creator of both of those 'Beetle's, 'Alan', was a single object. After decoding, however, they are separate objects.

The decoding method above does not fully reproduce the state before encoding, because it never reuses an object that has already been created with the same name. To do that, you could give the Person and Robot classes a mechanism that returns the appropriate existing object when object_hook asks for it by name.
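A minimal sketch of one way to do this, assuming that 'class' plus 'name' uniquely identifies an object within one JSON document. Unlike the idea above, it keeps the cache in the hook rather than in the Person and Robot classes; make_caching_hook is a hypothetical name, and a fresh hook (with an empty cache) is created for each json.loads call.

```python
import json

class Person:
    def __init__(self, name):
        self.name = name

class Robot:
    def __init__(self, name, creator=None):
        self.name = name
        self.creator = creator

def make_caching_hook():
    """Build an object_hook that reuses objects with the same class and name."""
    cache = {}  # (class name, object name) -> object already created

    def hook(obj):
        cls = obj.get('class')
        if cls not in ('Person', 'Robot'):
            return obj  # not one of our objects: leave the dict as it is
        key = (cls, obj['name'])
        if key not in cache:
            if cls == 'Person':
                cache[key] = Person(obj['name'])
            else:
                # 'creator' is already decoded, since inner objects are hooked first
                cache[key] = Robot(obj['name'], creator=obj.get('creator'))
        return cache[key]

    return hook

# The JSON produced by XEncoder in this article
robots_encoded = ('[{"class": "Robot", "name": "Beetle", "creator": {"class": "Person", "name": "Alan"}}, '
                  '{"class": "Robot", "name": "Chappy", "creator": {"class": "Robot", "name": "Beetle", '
                  '"creator": {"class": "Person", "name": "Alan"}}}]')

robots_decoded = json.loads(robots_encoded, object_hook=make_caching_hook())

beetle, chappy = robots_decoded
print(chappy.creator is beetle)                  # True
print(chappy.creator.creator is beetle.creator)  # True
```

The cache lives in a closure so that separate decodes do not share objects; whether objects should also be shared across decodes is a design decision this sketch does not make.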
