TL;DR
data class
dataclass is a standard library module added in Python 3.7. Simply put, adding the @dataclass decorator to a class generates so-called dunder methods (dunder is short for double underscore) such as __init__, __repr__ and __eq__, plus __hash__ depending on the options. It can significantly reduce tedious class definitions, and the generated methods are faster than a poor hand-written implementation. dataclass has many features beyond those introduced here, so please refer to the official documentation and [[Python 3.7] "Data Classes" may become the standard for class definitions](https://qiita.com/tag1216/items/13b032348c893667862a).
For those who still can't use Python 3.7, there is a backport for 3.6 on PyPI.
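A note on __hash__: with the default options (eq=True, frozen=False), dataclass sets __hash__ to None, so instances are unhashable. A field-based __hash__ is generated when frozen=True (or with unsafe_hash=True). A small sketch; the class names Point and FrozenPoint are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Default options: eq=True, frozen=False -> __hash__ is set to None
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    # eq=True and frozen=True -> a __hash__ based on the fields is generated
    x: int
    y: int


print(Point.__hash__)  # None, so hash(Point(1, 2)) raises TypeError
print(hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2)))  # True
```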
data class
from dataclasses import dataclass, field
from typing import ClassVar, List, Dict, Tuple
import copy

@dataclass
class Foo:
    i: int
    s: str
    f: float
    t: Tuple[int, str, float, bool]
    d: Dict[str, int]
    b: bool = False  # Default value
    l: List[str] = field(default_factory=list)  # Mutable default ([]) via default_factory
    c: ClassVar[int] = 10  # Class variable (not a dataclass field)

# Instantiate with the generated __init__
f = Foo(i=10, s='hoge', f=100.0, b=True,
        l=['a', 'b', 'c'], d={'a': 10, 'b': 20},
        t=(10, 'hoge', 100.0, False))
# Print the string representation of f with the generated __repr__
print(f)
# Make a deep copy and modify it
ff = copy.deepcopy(f)
ff.l.append('d')
# Compare with the generated __eq__
assert f != ff
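Why field(default_factory=list) instead of = []? dataclass rejects a plain list, dict or set default when the class is created, to avoid the classic bug of one mutable default shared by every instance. A minimal sketch; define_bad_class, Bad and Good are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


def define_bad_class():
    # A plain mutable default is rejected as soon as the decorator runs
    @dataclass
    class Bad:
        l: List[str] = []
    return Bad


try:
    define_bad_class()
    rejected = False
except ValueError as e:
    rejected = True
    print('ValueError:', e)


@dataclass
class Good:
    # default_factory is called once per instance, so each gets its own list
    l: List[str] = field(default_factory=list)


a, b = Good(), Good()
a.l.append('x')
print(a.l, b.l)  # ['x'] []
```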
I measured the execution time of __init__, __repr__ and __eq__ for DataclassFoo, created with dataclass, and ManualFoo, written by hand.
Source code used for the measurement
import timeit
from dataclasses import dataclass

@dataclass
class DataclassFoo:
    i: int
    s: str
    f: float
    b: bool

class ManualFoo:
    def __init__(self, i, s, f, b):
        self.i = i
        self.s = s
        self.f = f
        self.b = b

    def __repr__(self):
        return f'ManualFoo(i={self.i}, s={self.s}, f={self.f}, b={self.b})'

    def __eq__(self, b):
        a = self
        return a.i == b.i and a.s == b.s and a.f == b.f and a.b == b.b

def bench(name, f):
    # 5 repetitions of 100,000 calls each; print the mean time per repetition
    times = timeit.repeat(f, number=100000, repeat=5)
    print(name + ':\t' + f'{sum(times) / len(times):.5f}')

bench('dataclass __init__', lambda: DataclassFoo(10, 'foo', 100.0, True))
bench('manual class __init__', lambda: ManualFoo(10, 'foo', 100.0, True))

df = DataclassFoo(10, 'foo', 100.0, True)
mf = ManualFoo(10, 'foo', 100.0, True)
bench('dataclass __repr__', lambda: str(df))
bench('manual class __repr__', lambda: str(mf))

df2 = DataclassFoo(10, 'foo', 100.0, True)
mf2 = ManualFoo(10, 'foo', 100.0, True)
bench('dataclass __eq__', lambda: df == df2)
bench('manual class __eq__', lambda: mf == mf2)
Mean time per 100,000 calls, averaged over 5 runs

Measurement | Time (sec) |
---|---|
dataclass __init__ | 0.04382 |
Handwritten class __init__ | 0.04003 |
dataclass __repr__ | 0.07527 |
Handwritten class __repr__ | 0.08414 |
dataclass __eq__ | 0.04755 |
Handwritten class __eq__ | 0.04593 |
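Even over 500,000 calls per method (5 × 100,000), the differences are small: dataclass is slightly slower for __init__ and __eq__ and slightly faster for __repr__, so in practice there is almost no difference.
The bytecode of the generated __init__ also matches the handwritten one; only the source line numbers differ.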
>>> import dis
>>> dis.dis(DataclassFoo.__init__)
2 0 LOAD_FAST 1 (i)
2 LOAD_FAST 0 (self)
4 STORE_ATTR 0 (i)
3 6 LOAD_FAST 2 (s)
8 LOAD_FAST 0 (self)
10 STORE_ATTR 1 (s)
4 12 LOAD_FAST 3 (f)
14 LOAD_FAST 0 (self)
16 STORE_ATTR 2 (f)
5 18 LOAD_FAST 4 (b)
20 LOAD_FAST 0 (self)
22 STORE_ATTR 3 (b)
24 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>> dis.dis(ManualFoo.__init__)
13 0 LOAD_FAST 1 (i)
2 LOAD_FAST 0 (self)
4 STORE_ATTR 0 (i)
14 6 LOAD_FAST 2 (s)
8 LOAD_FAST 0 (self)
10 STORE_ATTR 1 (s)
15 12 LOAD_FAST 3 (f)
14 LOAD_FAST 0 (self)
16 STORE_ATTR 2 (f)
16 18 LOAD_FAST 4 (b)
20 LOAD_FAST 0 (self)
22 STORE_ATTR 3 (b)
24 LOAD_CONST 0 (None)
26 RETURN_VALUE
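Instead of comparing the dis output by eye, the two instruction streams can be compared programmatically with dis.get_instructions. This sketch redefines the two classes from the benchmark (only __init__ is needed) and compares opcode names and arguments, ignoring offsets and line numbers, which legitimately differ. The helper name instructions is just for illustration.

```python
import dis
from dataclasses import dataclass


@dataclass
class DataclassFoo:
    i: int
    s: str
    f: float
    b: bool


class ManualFoo:
    def __init__(self, i, s, f, b):
        self.i = i
        self.s = s
        self.f = f
        self.b = b


def instructions(func):
    # Keep only the opcode name and its resolved argument
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(func)]


same = instructions(DataclassFoo.__init__) == instructions(ManualFoo.__init__)
print(same)
```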
Next, I'd like to explain the key mechanisms that make dataclass work.
PEP 526: Syntax for Variable Annotations
PEP 526 defines the syntax for variable annotations. With this addition, the type information of variables declared in a class body is stored in the class's __annotations__ and can be retrieved while the program is running.
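from typing import Dict, get_type_hints

class Player:
    # 'Player' is quoted: the name does not exist yet while the class body runs
    players: Dict[str, 'Player']
    __points: int  # Name-mangled to _Player__points

print(Player.__annotations__)
# {'players': typing.Dict[str, ForwardRef('Player')],
#  '_Player__points': <class 'int'>}
print(get_type_hints(Player))  # Resolves the forward reference
# {'players': typing.Dict[str, __main__.Player],
#  '_Player__points': <class 'int'>}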
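dataclass builds on exactly this annotation information. The public dataclasses.fields() function exposes what it collected: one Field object per annotated attribute, with its name, type and default, while ClassVar annotations are left out. A short sketch with a made-up Config class.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Config:
    host: str
    port: int = 8080
    instances: ClassVar[int] = 0  # ClassVar: annotated, but not a dataclass field


print(Config.__annotations__)  # all three annotations, including the ClassVar
for f in fields(Config):
    print(f.name, f.type, f.default)  # only host and port appear here
```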
Many people probably know eval. Roughly speaking, exec differs from eval as follows:
- eval: evaluates the argument string as an expression
- exec: executes the argument string as statements
That alone may not make much sense, so let's look at some examples.
It's easy to guess that the following prints "typing rocks!".
>>> exec('print("typing rocks!")')
typing rocks!
Then what does this do?
exec('''
def func():
    print("typing rocks!")
''')
Then try calling it:
>>> func()
typing rocks!
That's right. Because exec executes a string as statements (not just as an expression), even a Python function definition can be executed dynamically. Great.
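To make the difference concrete: eval returns the value of an expression, while exec runs statements and always returns None; whatever the statements define ends up in the namespace dictionary passed to it. The dictionary name ns is just for illustration; passing an explicit dict keeps generated names out of the module globals.

```python
# eval: the argument must be an expression, and its value is returned
value = eval('1 + 2')
print(value)  # 3

# exec: the argument may contain statements; exec itself returns None
ns = {}
result = exec('x = 1 + 2\ndef double(n):\n    return n * 2', ns)
print(result)            # None
print(ns['x'])           # 3
print(ns['double'](21))  # 42

# eval rejects statements such as assignments
try:
    eval('x = 1')
except SyntaxError:
    print('SyntaxError: eval only accepts expressions')
```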
When a class decorated with dataclass is defined (typically when its module is imported), code is generated using the type annotations and exec described above. It's very rough, but the flow ends with the steps below. For details, read the relevant part of the CPython source (Lib/dataclasses.py).
- Create the __init__ function definition as a **string** using the type information
- Execute that string with exec
- Set the resulting __init__ function on the class
Simplified, these steps look like this.
from dataclasses import fields

# Backslashes can't appear inside f-string expressions (before Python 3.12),
# so define the newline outside
nl = '\n'
# Create the function definition string from the fields of Foo (defined above)
s = f"""
def func(self, {', '.join(f.name for f in fields(Foo))}):
{nl.join('    self.' + f.name + '=' + f.name for f in fields(Foo))}
"""
# Print the function definition string to the console
print(s)
# def func(self, i, s, f, t, d, b, l):
#     self.i=i
#     self.s=s
#     self.f=f
#     self.t=t
#     self.d=d
#     self.b=b
#     self.l=l
# Generate the code with exec; `func` is defined in the namespace dict ns
ns = {}
exec(s, ns)
setattr(Foo, 'func', ns['func'])  # Set the generated function on the class
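Putting the pieces together, here is a toy decorator following the same idea: read the annotations, build __init__ and __repr__ as source strings, exec them, and attach the results to the class. simple_dataclass is a hypothetical name, and this is only a minimal sketch: it ignores default values, ClassVar, inheritance and everything else the real dataclasses module handles.

```python
def simple_dataclass(cls):
    """Toy version of @dataclass: generate __init__ and __repr__ from annotations."""
    names = list(getattr(cls, '__annotations__', {}))

    # Build the source of __init__ as a string
    params = ', '.join(['self'] + names)
    body = '\n'.join(f'    self.{n} = {n}' for n in names) or '    pass'
    init_src = f'def __init__({params}):\n{body}\n'

    # Build the source of __repr__; !r mirrors dataclass's repr style
    parts = ', '.join(f'{n}={{self.{n}!r}}' for n in names)
    repr_src = (
        'def __repr__(self):\n'
        f'    return f"{cls.__name__}({parts})"\n'
    )

    # exec the strings in a private namespace, then attach the functions
    ns = {}
    exec(init_src, ns)
    exec(repr_src, ns)
    cls.__init__ = ns['__init__']
    cls.__repr__ = ns['__repr__']
    return cls


@simple_dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)
print(p)  # Point(x=1, y=2)
```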
The above is a simplified example. In reality, the function definition string is built carefully (handling default values and default_factory, for example) so that the generated code works correctly in every case.
Another thing to keep in mind is that this **code generation happens only once, when the class definition is executed** (usually when the module is loaded). After that, the class can be used **just like a handwritten class**.
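One way to see that generation happens only once is to wrap dataclass in a decorator that records each call and then create many instances. counting_dataclass and generation_log are hypothetical names for this sketch.

```python
from dataclasses import dataclass

generation_log = []


def counting_dataclass(cls):
    # Runs once, when the decorated class statement is executed
    generation_log.append(cls.__name__)
    return dataclass(cls)


@counting_dataclass
class Item:
    name: str
    price: int


# Creating instances only calls the already generated __init__
items = [Item(f'item{n}', n) for n in range(1000)]
print(generation_log)  # ['Item']
```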
#[derive]
Rust has the derive attribute (#[derive]), which is attached when defining a struct. It can do about as much as dataclass, or more. For example, look at the following:
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
struct Foo {
    i: i32,
    s: String,
    b: bool,
}
Just adding #[derive(Debug, Clone, Eq, PartialEq, Hash)] generates all of the following:
- Debug: debug formatting with {:?} (__repr__ in Python)
- Clone: a clone() method for explicit copying
- Eq, PartialEq: equality comparison (__eq__ and __ne__ in Python)
- Hash: hashing (__hash__ in Python)
Rust goes even further: writing your own custom derive is officially supported and relatively easy, which enables type-based metaprogramming.
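For comparison, a rough Python counterpart of that derive line is a frozen dataclass: the generated __repr__ plays the role of Debug, __eq__ (eq=True is the default) covers PartialEq/Eq, frozen=True makes the generated __hash__ safe, and copy.copy stands in loosely for Clone. The mapping is approximate, not one to one.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)  # eq=True by default; with frozen=True, __hash__ is generated
class Foo:
    i: int
    s: str
    b: bool


foo = Foo(1, 'hoge', True)
print(repr(foo))                # Foo(i=1, s='hoge', b=True)  ~ Debug
dup = copy.copy(foo)            # ~ Clone (a shallow copy)
print(dup == foo)               # True  ~ PartialEq / Eq
print(hash(dup) == hash(foo))   # True  ~ Hash
```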
Rust has many other features like this that make programmers' lives easier, and I think that's why Rust is so productive despite its strict type constraints and ownership rules. Rust is a really great language, so I encourage Pythonistas to try it out.
I personally think dataclass is a good example of the usefulness and potential of type-based metaprogramming.
I have also made two libraries based on dataclass, so please take a look if you are interested.
A library that maps environment variable values to dataclass fields. Useful when you want to override a Python config class with environment variables, for example when running in a container.
A dataclass-based serialization library, under development with the goal of implementing the same functionality as serde, Rust's god-tier library, using dataclass.
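The libraries themselves aren't shown here, but the core idea behind both can be sketched with the standard library alone. This is not the author's code: from_env, to_json, from_json, the APP_ prefix and AppConfig are all hypothetical names, and the sketch only handles flat classes whose field types are str, int, float or bool.

```python
import json
import os
from dataclasses import asdict, dataclass, fields


@dataclass
class AppConfig:
    host: str = 'localhost'
    port: int = 8080
    debug: bool = False


def from_env(cls, prefix):
    """Hypothetical helper: override defaults with PREFIX + FIELDNAME env vars."""
    kwargs = {}
    for f in fields(cls):
        raw = os.environ.get(f'{prefix}{f.name.upper()}')
        if raw is None:
            continue  # keep the field's default
        if f.type is bool:
            kwargs[f.name] = raw.lower() in ('1', 'true', 'yes')
        else:
            kwargs[f.name] = f.type(raw)  # str, int or float constructor
    return cls(**kwargs)


def to_json(obj):
    # asdict() converts a dataclass instance (recursively) to a dict
    return json.dumps(asdict(obj))


def from_json(cls, text):
    # Only valid for flat dataclasses; nested ones would need recursion
    return cls(**json.loads(text))


os.environ['APP_PORT'] = '9000'
os.environ['APP_DEBUG'] = 'true'
config = from_env(AppConfig, 'APP_')
print(config)
restored = from_json(AppConfig, to_json(config))
print(restored == config)  # True
```

Using the field types from fields() to drive the conversion is the same type-driven approach the article describes; a real library would also need nested dataclasses, Optional and container types.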
I hope this area gets as lively in Python as it is in Rust, and that many good libraries come out of it.