Use dataclasses for data storage in Python 3.7 or later

Introduction

Do you use a dictionary or an ordinary class to store data in Python? Since Python 3.7, the standard library has provided the dataclass decorator, which is well suited to storing data.

In this article, I explain how to use it and touch on when it is convenient and why you should use it, points that are hard to grasp from the official documentation and PEP 557 alone.

Among earlier versions, only Python 3.6 can use it, by running pip install dataclasses. At the time of writing, the Google Colaboratory environment is Python 3.6.9, but dataclasses is installed by default.

Intended readers

- People who know that dataclass exists but don't know what it is
- People who want to handle data in a readable way
- People who think "We got along without this feature before, so there's no particular need to use it ..."

The minimal explanation you often see

↓ This

class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name

person1 = Person(0, 'Alice')
print(person1.number) # 0
print(person1.name) # Alice

↓ can be written like this. (The class name is deliberately changed to tell the two apart.)

import dataclasses
@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
        
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.number) # 0
print(dataclass_person1.name) # Alice

You can use it by adding the @dataclasses.dataclass decorator and, instead of writing `__init__()`, listing the variable names you want to define with type annotations.

`__init__()` is created automatically, and type annotations are required.

What has changed is that **you no longer have to assign the arguments to instance variables in `__init__()`**, because `__init__()` is created automatically. **This saves effort when there are many variables, and the result is pleasantly clean.** Other special methods such as `__eq__()` and `__repr__()` are also created automatically, as described below.

And since type annotations are mandatory, it is nice that the types are always visible. (That said, this is something you would want in a normal class too, by writing def __init__(self, number: int, name: str = 'XXX').)

**It makes clear that this class exists to store data**, which is also an important factor for readability.
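
One point worth checking for yourself: the annotations are required for a name to become a field, but they are not enforced when the program runs. The following is a minimal sketch reusing the DataclassPerson class from above; the variable names are only for illustration.

```python
import dataclasses


@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'


# Keyword arguments and defaults work as in a hand-written __init__().
person = DataclassPerson(number=0)
print(person.name)  # XXX

# Annotations are not checked at runtime, so this does not raise.
odd_person = DataclassPerson(number='zero')
print(type(odd_person.number).__name__)  # str

# The generated __init__() still insists on the required argument.
try:
    DataclassPerson()
    missing_argument_rejected = False
except TypeError:
    missing_argument_rejected = True
print(missing_argument_rejected)  # True
```

If you need the types to be checked, that has to come from a static type checker or from your own validation code.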

I want to avoid dictionaries

If the example above is all you need, a dictionary would do. Why bother with a class, let alone the dataclass decorator? Many people seem to reach for a dictionary for input and output as a first step.

dict_person1 = {'number': 0, 'name': 'Alice'}
print(dict_person1['number']) # 0
print(dict_person1['name']) # Alice

What are the obvious disadvantages of dictionaries?

1. Dot access is not possible. (Though you may be fine without it.)
2. Methods, such as save processing, cannot be attached.
3. The values cannot be given type annotations.
4. It is hard to tell from the code that the data has a fixed shape.

Items 3 and 4 matter when you aim for code that is easy to read and maintain later, and they are a reason to avoid dictionaries even if you don't need methods. However, a regular class can cover these points too.
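
As a small illustration of the fixed-shape point, here is a sketch that compares a dictionary with the DataclassPerson class from earlier. The misspelled key nmae is deliberate; a regular class with an explicit `__init__()` would reject it in the same way.

```python
import dataclasses


@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'


# A dictionary accepts any keys, so a misspelled key goes unnoticed.
dict_person = {'number': 0, 'nmae': 'Alice'}
print('name' in dict_person)  # False

# The dataclass rejects an argument that is not a declared field.
try:
    DataclassPerson(number=0, nmae='Alice')
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True

# The declared shape can also be read back from the class.
field_names = [field.name for field in dataclasses.fields(DataclassPerson)]
print(field_names)  # ['number', 'name']
```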

Benefits of dataclass

Let's take a closer look at how a class with the dataclass decorator is better than a regular class.

Advantage: `__eq__()` is created automatically, which makes unit testing easy.

When you compare instances of a normal class, even instances with the same contents compare as False. This is because the default comparison checks object identity (what id() returns), which is not very useful here. **With unit tests in mind, you want the result to be True when the fields match.**

↓ If you do nothing in a normal class, it will be like this.

class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name

person1 = Person(0, 'Alice')

print(person1 == Person(0, 'Alice')) # False
print(person1 == Person(1, 'Bob')) # False

↓ To compare the fields in a normal class, you have to define `__eq__()` yourself.

class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name
        
    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.number == other.number and self.name == other.name

person1 = Person(0, 'Alice')

print(person1 == Person(0, 'Alice')) # True
print(person1 == Person(1, 'Bob')) # False

↓ If you use the dataclass decorator, this `__eq__()` is created automatically. It saves time and looks neat.

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
        
dataclass_person1 = DataclassPerson(0, 'Alice')

print(dataclass_person1 == DataclassPerson(0, 'Alice')) # True
print(dataclass_person1 == DataclassPerson(1, 'Bob')) # False
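
To show what this means for unit tests, here is a minimal sketch using the standard unittest module. The function load_person is a hypothetical stand-in for whatever code you are testing; it is not part of the article's code.

```python
import dataclasses
import unittest


@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'


def load_person():
    # Hypothetical function under test.
    return DataclassPerson(0, 'Alice')


class LoadPersonTest(unittest.TestCase):
    def test_load_person(self):
        # assertEqual relies on the generated __eq__().
        self.assertEqual(load_person(), DataclassPerson(0, 'Alice'))


# Run the test case explicitly instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoadPersonTest)
result = unittest.TextTestRunner().run(suite)
print(result.wasSuccessful())  # True
```

With a normal class that lacks `__eq__()`, the same assertion would fail even though the contents match.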

Also, if you specify @dataclasses.dataclass(order=True), `__lt__()`, `__le__()`, `__gt__()`, and `__ge__()` are created for ordering comparisons as well. As with tuples, the fields are compared in the order they are defined, and the first field that differs decides the result. This can be a little confusing, so you may prefer to define these methods yourself if you need them.
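
The tuple-like ordering is easier to see in a short example. This is a sketch with a hypothetical class name, OrderedPerson, that has the same fields as the class above.

```python
import dataclasses


@dataclasses.dataclass(order=True)
class OrderedPerson:  # hypothetical name for this sketch
    number: int
    name: str = 'XXX'


alice = OrderedPerson(1, 'Alice')
bob = OrderedPerson(0, 'Bob')
carol = OrderedPerson(1, 'Carol')

# number is compared first; name only breaks ties.
print(bob < alice)    # True, because 0 < 1
print(alice < carol)  # True, because the numbers tie and 'Alice' < 'Carol'

# The result matches a comparison of (number, name) tuples.
print((alice < carol) == ((1, 'Alice') < (1, 'Carol')))  # True

sorted_names = [person.name for person in sorted([carol, alice, bob])]
print(sorted_names)  # ['Bob', 'Alice', 'Carol']
```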

Advantage: asdict converts it to a dictionary even when nested.

Use dataclasses.asdict() when you want a dictionary, for example to output JSON. It works even when dataclasses are nested.

@dataclasses.dataclass
class DataclassScore:
    writing: int
    reading: int
    listening: int
    speaking: int
        
@dataclasses.dataclass
class DataclassPerson:
    score: DataclassScore
    number: int
    name: str = 'Alice'
        
dataclass_person1 = DataclassPerson(DataclassScore(25, 40, 30, 35), 0, 'Alice')
dict_person1 = dataclasses.asdict(dataclass_person1)
print(dict_person1) # {'score': {'writing': 25, 'reading': 40, 'listening': 30, 'speaking': 35}, 'number': 0, 'name': 'Alice'}

import json
print(json.dumps(dict_person1)) # {"score": {"writing": 25, "reading": 40, "listening": 30, "speaking": 35}, "number": 0, "name": "Alice"}

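Even a normal class can be converted to a dictionary through `__dict__`, but that takes some effort when instances are nested.

To go back from a dictionary to the class, unpack it as follows. Note that nested dataclasses are not restored automatically: in this example score stays a plain dictionary, so convert nested values yourself if you need the original types.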

DataclassPerson(**dict_person1)
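
One way to handle the nested case is sketched below. The helper person_from_dict is a hypothetical name introduced for this example, not something the dataclasses module provides.

```python
import dataclasses


@dataclasses.dataclass
class DataclassScore:
    writing: int
    reading: int
    listening: int
    speaking: int


@dataclasses.dataclass
class DataclassPerson:
    score: DataclassScore
    number: int
    name: str = 'Alice'


original = DataclassPerson(DataclassScore(25, 40, 30, 35), 0, 'Alice')
dict_person1 = dataclasses.asdict(original)

# Plain unpacking leaves the nested value as a dict.
shallow = DataclassPerson(**dict_person1)
print(type(shallow.score).__name__)  # dict
print(shallow == original)  # False


def person_from_dict(data):
    # Hypothetical helper: rebuild the nested dataclass first, then the outer one.
    values = dict(data)
    values['score'] = DataclassScore(**values['score'])
    return DataclassPerson(**values)


restored = person_from_dict(dict_person1)
print(restored == original)  # True
```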

Advantage: easy to make immutable

A dataclass can easily be made immutable. Making data that should never be rewritten immutable removes the worry that it was changed somewhere.

↓ It is mutable if nothing is specified.

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
        
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.number) # 0
print(dataclass_person1.name) # Alice

dataclass_person1.number = 1
print(dataclass_person1.number) # 1

↓ If you set frozen=True in the decorator arguments, it becomes immutable. In that case (with the default eq=True) `__hash__()` is also created automatically, so you can use hash() to get a hash value.

@dataclasses.dataclass(frozen=True)
class FrozenDataclassPerson:
    number: int
    name: str = 'Alice'
    
frozen_dataclass_person1 = FrozenDataclassPerson(number=0, name='Alice')
print(frozen_dataclass_person1.number) # 0
print(frozen_dataclass_person1.name) # Alice
print(hash(frozen_dataclass_person1)) # e.g. -4135290249524779415 (the value differs between runs)

frozen_dataclass_person1.number = 1 # FrozenInstanceError: cannot assign to field 'number'
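
When you do need a changed value, a frozen instance can be copied with dataclasses.replace() instead of being modified. The sketch below also shows that, because of the generated `__hash__()`, instances can be used as dictionary keys. The scores dictionary and its value are made up for illustration.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class FrozenDataclassPerson:
    number: int
    name: str = 'Alice'


person = FrozenDataclassPerson(number=0, name='Alice')

# replace() builds a new instance; the original is untouched.
renumbered = dataclasses.replace(person, number=1)
print(renumbered)  # FrozenDataclassPerson(number=1, name='Alice')
print(person.number)  # 0

# Equal frozen instances hash equally, so they work as dict keys.
scores = {person: 130}
print(scores[FrozenDataclassPerson(0, 'Alice')])  # 130

# Direct assignment is still refused.
try:
    person.number = 1
    still_frozen = False
except dataclasses.FrozenInstanceError:
    still_frozen = True
print(still_frozen)  # True
```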

How it differs from named tuples, which are also immutable

The standard library also offers the following for cases where you want immutability.

- collections.namedtuple
- typing.NamedTuple (from Python 3.6.1)

With these, you can create tuples (= immutable objects) that allow dot access.

from collections import namedtuple

CollectionsNamedTuplePerson = namedtuple('CollectionsNamedTuplePerson', ('number' , 'name'))

collections_namedtuple_person1 = CollectionsNamedTuplePerson(number=0, name='Alice')
print(collections_namedtuple_person1.number) # 0
print(collections_namedtuple_person1.name) # Alice
print(collections_namedtuple_person1 == (0, 'Alice')) # True

collections_namedtuple_person1.number = 1 # AttributeError: can't set attribute

↓ In addition, typing.NamedTuple supports type annotations.

from typing import NamedTuple

class NamedTuplePerson(NamedTuple):
    number: int
    name: str = 'XXX'

namedtuple_person1 = NamedTuplePerson(0, 'Alice')
print(namedtuple_person1.number) # 0
print(namedtuple_person1.name) # Alice
print(namedtuple_person1 == (0, 'Alice')) # True

namedtuple_person1.number = 1 # AttributeError: can't set attribute

For more information, the Qiita article "Write beautiful python with namedtuple! (Translation)" is easy to understand.

dataclass and typing.NamedTuple are similar but differ in the details. As the code above shows, a named tuple compares as True against a plain tuple with the same elements, which looks like a disadvantage.

On the other hand, one convenient feature of typing.NamedTuple is that it is a tuple, so you can use unpacking assignment. Depending on the use case, it may be a better fit than forcing the data into a dataclass.
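
The two differences mentioned here, unpacking and equality with plain tuples, can be compared side by side. This is a sketch using the classes from this section; dataclasses.astuple() is the standard way to get a tuple out of a dataclass instance.

```python
import dataclasses
from typing import NamedTuple


class NamedTuplePerson(NamedTuple):
    number: int
    name: str = 'XXX'


@dataclasses.dataclass(frozen=True)
class FrozenDataclassPerson:
    number: int
    name: str = 'XXX'


# A named tuple unpacks directly.
number, name = NamedTuplePerson(0, 'Alice')
print(number, name)  # 0 Alice

# A dataclass instance is not iterable, so astuple() is needed first.
dc_number, dc_name = dataclasses.astuple(FrozenDataclassPerson(0, 'Alice'))
print(dc_number, dc_name)  # 0 Alice

# The named tuple equals a plain tuple; the dataclass does not.
print(NamedTuplePerson(0, 'Alice') == (0, 'Alice'))  # True
print(FrozenDataclassPerson(0, 'Alice') == (0, 'Alice'))  # False
```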

Various functions

`__repr__()` is created, so you can easily check the contents.

Since `__repr__()` is created automatically, you can easily check the contents with print() and the like.

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
        
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1) # DataclassPerson(number=0, name='Alice')

If you want to have the same display in a normal class, you need to write the following.

class Person:
    def __init__(self, number, name='XXX'):
        self.number = number
        self.name = name

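    def __repr__(self):
        # !r applies repr() to each value, so strings keep their quotes as in the dataclass output
        fields = ', '.join(f'{key}={value!r}' for key, value in self.__dict__.items())
        return f'{self.__class__.__name__}({fields})'

person1 = Person(0, 'Alice')
print(person1) # Person(number=0, name='Alice')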

You can write post-initialization processing with `__post_init__()`

Use `__post_init__()` when your normal class does something other than assignment in `__init__()`. This method is called after the generated `__init__()` has assigned the fields. Also, use dataclasses.field(init=False) to create an instance variable that is not passed as an argument.

@dataclasses.dataclass
class DataclassPerson:
    number: int
    name: str = 'XXX'
    is_even: bool = dataclasses.field(init=False)
    
    def __post_init__(self):
        self.is_even = self.number%2 == 0
        
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.number) # 0
print(dataclass_person1.name) # Alice
print(dataclass_person1.is_even) # True
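
`__post_init__()` is also a natural place for checks that a hand-written `__init__()` would have done. The following is a sketch under the assumption that negative numbers are invalid; that rule and the class name ValidatedPerson are made up for illustration.

```python
import dataclasses


@dataclasses.dataclass
class ValidatedPerson:  # hypothetical class for this sketch
    number: int
    name: str = 'XXX'

    def __post_init__(self):
        # Runs after the generated __init__() has assigned the fields.
        if self.number < 0:
            raise ValueError(f'number must be non-negative, got {self.number}')


valid_person = ValidatedPerson(0, 'Alice')
print(valid_person)  # ValidatedPerson(number=0, name='Alice')

try:
    ValidatedPerson(-1, 'Bob')
    error_message = None
except ValueError as error:
    error_message = str(error)
print(error_message)  # number must be non-negative, got -1
```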

You can pass initialization arguments with `InitVar`

As in the example below, there may be values that you want to pass as arguments at initialization but don't want to be instance variables.

class Person:
    def __init__(self, number, name='XXX'):
        self.name = name
        self.is_even = number%2 == 0

person1 = Person(0, 'Alice')
print(person1.name) # Alice
print(person1.is_even) # True

In that case, use InitVar.

@dataclasses.dataclass
class DataclassPerson:
    number: dataclasses.InitVar[int]
    name: str = 'XXX'
    is_even: bool = dataclasses.field(init=False)
    
    def __post_init__(self, number):
        self.is_even = number%2 == 0
        
dataclass_person1 = DataclassPerson(0, 'Alice')
print(dataclass_person1.name) # Alice
print(dataclass_person1.is_even) # True
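
To confirm that an InitVar really is not stored on the instance, you can inspect the result with dataclasses.fields() and dataclasses.asdict(). This sketch reuses the class above.

```python
import dataclasses


@dataclasses.dataclass
class DataclassPerson:
    number: dataclasses.InitVar[int]
    name: str = 'XXX'
    is_even: bool = dataclasses.field(init=False)

    def __post_init__(self, number):
        self.is_even = number % 2 == 0


person = DataclassPerson(0, 'Alice')

# fields() lists only real fields; the InitVar is absent.
field_names = [field.name for field in dataclasses.fields(person)]
print(field_names)  # ['name', 'is_even']

# The generated __repr__() and asdict() skip it as well.
print(person)  # DataclassPerson(name='Alice', is_even=True)
print(dataclasses.asdict(person))  # {'name': 'Alice', 'is_even': True}

# The value passed as number was never stored as an attribute.
print(hasattr(person, 'number'))  # False
```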

In closing

This is an Advent Calendar entry written less than a year after I joined the company. I introduced points that tend to be let slide in individual development but that I want to value in team development.

It is easy to put off catching up on features that are convenient but that you can get by without. Still, new features are added for a reason. The feel of recent Python has changed considerably from a few years ago, with the introduction of type annotations among other things. You may like or dislike some of it, but you cannot choose something you do not know about, so I want to make sure I don't get left behind!

References

dataclasses: Data Classes (Python 3.9.1 documentation)
PEP 557 -- Data Classes | Python.org
