[PYTHON] Introducing pyserde, a serialization framework that uses dataclass

I am developing pyserde, a serialization framework built on dataclasses. The name is pronounced "pai-serde".

TL;DR

Add the @serialize and @deserialize decorators to an ordinary dataclass definition.

@deserialize
@serialize
@dataclass
class Foo:
    i: int
    s: str
    f: float
    b: bool

Then, serialize to JSON with to_json.

>>> h = Foo(i=10, s='foo', f=100.0, b=True)
>>> print(f"Into Json: {to_json(h)}")
Into Json: {"i": 10, "s": "foo", "f": 100.0, "b": true}

You can deserialize from JSON to an object with from_json.

>>> s = '{"i": 10, "s": "foo", "f": 100.0, "b": true}'
>>> print(f"From Json: {from_json(Foo, s)}")
From Json: Foo(i=10, s='foo', f=100.0, b=True)

In addition to JSON, it supports MsgPack, Yaml, and Toml. It also has various other features.

Motivation

I thought the dataclass added in Python 3.7 looked useful, so I read its implementation. I found that when a class with the @dataclass decorator is loaded into a module, the decorator uses the exec function to [generate the methods](https://github.com/python/cpython/blob/550f30c8f33a2ba844db2ce3da8a897b3e882c9a/Lib/dataclasses.py#L377-L401). I found this interesting, and when I measured the performance of the generated \_\_init\_\_, \_\_repr\_\_, and \_\_eq\_\_, the results were almost the same as for a handwritten class. That made me think it should be possible to generate more methods in the same way.

Example of defining a function at runtime with exec:

# Define a function as a string
s = '''
def func():
    print("Hello world!")
'''

# Pass the string to exec
exec(s)

# The function is defined at runtime!
func()

For details on how dataclass works and how it performs, refer to the explanation here.

Origin of the name

Rust has a serialization framework called serde. serde is simply superb, and I personally think it accounts for about 20% of the reason Rust is great.

I wanted to create a convenient, high performance and flexible framework like serde in Python, so I named it pyserde.

Getting started

Installation

Install with pip

pip install pyserde

dataclasses was added in Python 3.7; on Python 3.6, the dataclasses backport from PyPI is used instead.

Class definition

Let's make a class like this. It's just a dataclass, but with the @serialize and @deserialize decorators provided by pyserde.

from serde import serialize, deserialize
from dataclasses import dataclass

@deserialize
@serialize
@dataclass
class Foo:
    i: int
    s: str
    f: float
    b: bool

With @serialize, pyserde generates a serialization method, and with @deserialize, it generates a deserialization method. Method generation runs only once, when the class is loaded into the Python interpreter (this is how pyserde's decorators behave), so there is no overhead when you actually use the class.
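To make the "generated once, when the class is loaded" point concrete, here is a minimal sketch of a decorator that builds a method's source as a string and compiles it with exec, in the same spirit as dataclass. It is not pyserde's actual implementation; gen_to_dict, to_dict, and GENERATED are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields

# Records each code generation run (for illustration only).
GENERATED = []


def gen_to_dict(cls):
    """Hypothetical decorator: generate a to_dict method with exec."""
    # Build the method source once, from the dataclass fields.
    items = ", ".join(f"'{f.name}': self.{f.name}" for f in fields(cls))
    src = f"def to_dict(self):\n    return {{{items}}}\n"
    namespace = {}
    exec(src, {}, namespace)  # compile the generated source
    cls.to_dict = namespace["to_dict"]
    GENERATED.append(cls.__name__)
    return cls


@gen_to_dict
@dataclass
class Foo:
    i: int
    s: str


# Calling the method repeatedly does not regenerate anything.
for _ in range(3):
    result = Foo(i=10, s="foo").to_dict()

print(result)     # {'i': 10, 's': 'foo'}
print(GENERATED)  # ['Foo']
```

The generated function is a plain attribute lookup per field, which is why a method produced this way can run about as fast as one written by hand.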

Try serializing and deserializing

Now, let's actually serialize and deserialize. As of 0.1.1, pyserde supports JSON, Yaml, Toml, and MsgPack. Helper functions for each format live in the serde.<format name> module and follow a common naming convention: to_<format> and from_<format>.

For example, in the case of JSON, it becomes like this.

from serde.json import from_json, to_json

Call to_json to serialize the Foo object to JSON

f = Foo(i=10, s='foo', f=100.0, b=True)
print(to_json(f))

To deserialize, pass the class Foo as the first argument of from_json and the JSON string as the second.

s = '{"i": 10, "s": "foo", "f": 100.0, "b": true}'
print(from_json(Foo, s))

For Yaml, Toml, and MsgPack, it looks like this. Each example deserializes the data it has just serialized, because the JSON string s is not valid input for every format.

Yaml

from serde.yaml import from_yaml, to_yaml
y = to_yaml(f)
print(y)
print(from_yaml(Foo, y))

Toml

from serde.toml import from_toml, to_toml
t = to_toml(f)
print(t)
print(from_toml(Foo, t))

MsgPack

from serde.msgpack import from_msgpack, to_msgpack
m = to_msgpack(f)
print(m)
print(from_msgpack(Foo, m))

Performance

Execution time was measured under the following conditions and compared with other serialization libraries. If you want to see the measurement code, see here.

Serialize / Deserialize to JSON 10,000 times each

[Charts: Serialize, Deserialize]

Conversion to Tuple / Dict 10,000 times each

[Charts: astuple, asdict]

In each chart, the horizontal axis is the comparison target and the vertical axis is latency, so a lower bar means better performance. As the charts show, pyserde's performance is second only to the handwritten implementation (raw). The gap appears to come from the raw code making fewer function calls than pyserde.
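As a rough illustration of this kind of measurement, the sketch below times 10,000 conversions to a dict with the standard library's dataclasses.asdict and with a handwritten function. It is not the benchmark code used for the charts; raw_asdict is a hypothetical stand-in for the "raw" target, and pyserde itself is not measured here.

```python
import timeit
from dataclasses import asdict, dataclass


@dataclass
class Foo:
    i: int
    s: str
    f: float
    b: bool


def raw_asdict(obj):
    """Handwritten conversion: a single function call, no reflection."""
    return {"i": obj.i, "s": obj.s, "f": obj.f, "b": obj.b}


foo = Foo(i=10, s="foo", f=100.0, b=True)

# Both approaches must produce the same result before timing them.
assert raw_asdict(foo) == asdict(foo)

n = 10_000
t_raw = timeit.timeit(lambda: raw_asdict(foo), number=n)
t_std = timeit.timeit(lambda: asdict(foo), number=n)
print(f"raw:    {t_raw:.4f}s for {n} calls")
print(f"asdict: {t_std:.4f}s for {n} calls")
```

The absolute numbers depend on the machine and Python version, so only the relative difference between the two lines is meaningful.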

The comparison targets are listed below.

I benchmarked several other libraries, but I didn't put them on the chart because they were incomparably slow.

Other useful features

These are copied directly from the original serde, but pyserde implements the following useful features.

Case Conversion

Convert snake_case to camelCase, kebab-case, etc.

@serialize(rename_all='camelcase')
@dataclass
class Foo:
    int_field: int
    str_field: str

f = Foo(int_field=10, str_field='foo')
print(to_json(f))

snake_case is now camelCase.

{"intField": 10, "strField": "foo"}
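The conversion itself is simple string manipulation. The following is a minimal sketch of how snake_case names could be turned into camelCase and kebab-case; to_camel_case and to_kebab_case are hypothetical helpers written for this illustration, not functions taken from pyserde.

```python
def to_camel_case(name: str) -> str:
    """Convert snake_case to camelCase."""
    head, *tail = name.split("_")
    return head + "".join(word.capitalize() for word in tail)


def to_kebab_case(name: str) -> str:
    """Convert snake_case to kebab-case."""
    return name.replace("_", "-")


print(to_camel_case("int_field"))  # intField
print(to_kebab_case("str_field"))  # str-field
```

Because field names are known when the class is defined, a conversion like this only needs to run once per field rather than on every serialization.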

Rename Field

This is useful when you want to use a keyword such as class in the field name.

@serialize
@dataclass
class Foo:
    class_name: str = field(metadata={'serde_rename': 'class'})

print(to_json(Foo(class_name='Foo')))

The field is named class_name in the class, but it appears as class in the JSON.

{"class": "Foo"}

Skip

You can exclude a field from serialization / deserialization by adding serde_skip to its metadata.

@serialize
@dataclass
class Resource:
    name: str
    hash: str
    metadata: Dict[str, str] = field(default_factory=dict, metadata={'serde_skip': True})

resources = [
    Resource("Stack Overflow", "hash1"),
    Resource("GitHub", "hash2", metadata={"headquarters": "San Francisco"}),
]
print(to_json(resources))

The metadata field has been excluded.

[{"name": "Stack Overflow", "hash": "hash1"}, {"name": "GitHub", "hash": "hash2"}]

Conditional Skip

If you want to exclude by the specified condition, you can pass a conditional expression to serde_skip_if.

@serialize
@dataclass
class World:
    player: str
    buddy: str = field(default='', metadata={'serde_skip_if': lambda v: v == 'Pikachu'})

world = World('satoshi', 'Pikachu')
print(to_json(world))

world = World('green', 'Charmander')
print(to_json(world))

The buddy field is now excluded only when it is "Pikachu".

{"player": "satoshi"}
{"player": "green", "buddy": "Charmander"}
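All three of these options are carried in the dataclass field metadata, which the standard library exposes through dataclasses.fields. The sketch below shows one way a serializer could honor the serde_rename, serde_skip, and serde_skip_if keys. It is a hypothetical to_dict written for illustration, assuming only the metadata key names used in this article, and is not pyserde's implementation.

```python
from dataclasses import dataclass, field, fields


def to_dict(obj):
    """Hypothetical serializer that honors the three metadata keys."""
    out = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.metadata.get("serde_skip"):
            continue  # always excluded
        skip_if = f.metadata.get("serde_skip_if")
        if skip_if is not None and skip_if(value):
            continue  # excluded only when the predicate holds
        key = f.metadata.get("serde_rename", f.name)
        out[key] = value
    return out


@dataclass
class World:
    player: str
    class_name: str = field(default="", metadata={"serde_rename": "class"})
    buddy: str = field(default="", metadata={"serde_skip_if": lambda v: v == "Pikachu"})
    cache: dict = field(default_factory=dict, metadata={"serde_skip": True})


print(to_dict(World("satoshi", "Trainer", "Pikachu")))
# {'player': 'satoshi', 'class': 'Trainer'}
print(to_dict(World("green", "Trainer", "Charmander")))
# {'player': 'green', 'class': 'Trainer', 'buddy': 'Charmander'}
```

This version inspects the metadata on every call. A code-generating framework can instead read the metadata once when the class is decorated and emit a method with these decisions already baked in.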

Application example

Read configuration file

Yaml and Toml are common choices for an application's configuration files. With pyserde you can easily map a config file to your own class.

from dataclasses import dataclass
from serde import deserialize
from serde.yaml import from_yaml


@deserialize
@dataclass
class App:
    addr: str
    port: int
    secret: str
    workers: int


def main():
    with open('app.yml') as f:
        yml = f.read()
    cfg = from_yaml(App, yml)
    print(cfg)

JSON WebAPI

A JSON web API is fairly easy to implement with Flask, and pyserde makes it easy to map requests and responses to your own types.

Pipfile

[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
pyserde = "~=0.1"
flask = "~=1.1"

app.py

from dataclasses import dataclass
from flask import Flask, request, Response
from serde import serialize, deserialize
from serde.json import to_json, from_json

@deserialize
@serialize
@dataclass
class ToDo:
    id: int
    title: str
    description: str
    done: bool

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/todos', methods=['GET', 'POST'])
def todos():
    print(request.method)
    if request.method == 'GET':
        body = to_json([ToDo(1, 'Play games', 'Play Holy Sword Legend 3', False)])
        return Response(body, mimetype='application/json')
    else:
        todo = from_json(ToDo, request.get_data())
        return f'A new ToDo {todo} successfully created.'

if __name__ == '__main__':
    app.run(debug=True)

$ pipenv install
$ pipenv run python app.py

$ curl http://localhost:5000/todos
[{"id": 1, "title": "Play games", "description": "Play Holy Sword Legend 3", "done": false}]
$ curl -X POST http://localhost:5000/todos -d '{"id": 1, "title": "Play games", "description": "Play Holy Sword Legend 3", "done": false}'
A new ToDo ToDo(id=1, title='Play games', description='Play Holy Sword Legend 3', done=False) successfully created.

RPC

Unfortunately I can't show you the code, but my company has its own RPC framework, which uses pyserde to serialize its messages to MsgPack.
