This article is the 16th day article of Python Part 2 Advent Calendar 2020.
Type hints were introduced in Python 3.5, and it is now commonplace to write type information in Python code, even though Python was originally a dynamically typed language.
In this article, I'll introduce you to pydantic, a library that makes the most of this type information to help you write more robust Python code.
Since it is also used in FastAPI, a Python web framework that has been a hot topic recently, many people may already know of it.
In fact, I also learned about pydantic when I first used FastAPI.
pydantic is a library that provides the following functions.
-- Applying type annotation information at runtime
-- Raising easy-to-understand errors for invalid values
Many people may ask, "Is that all?" I will explain using an example below.
GitHub: samuelcolvin/pydantic: Data parsing and validation using Python type hints
Official documentation: pydantic
Example
pydantic works well in user-defined classes that inherit from the base class pydantic.BaseModel.
First, consider a class definition that does not use pydantic; here I use dataclasses.dataclass.
from dataclasses import dataclass
@dataclass
class NonPydanticUser:
    name: str
    age: int
Let's create one instance of this NonPydanticUser class.
In this example, the field name is of type str and the field age is of type int.
The instance holds the data types declared in the class.
Ichiro = NonPydanticUser(name="Ichiro", age=19)
print(Ichiro)
#> NonPydanticUser(name='Ichiro', age=19)
print(type(Ichiro.name))
#> <class 'str'>
print(type(Ichiro.age))
#> <class 'int'>
Let's create another instance.
Samatoki = NonPydanticUser(name="Samatoki", age="25")
print(Samatoki)
#> NonPydanticUser(name='Samatoki', age='25')
print(type(Samatoki.name))
#> <class 'str'>
print(type(Samatoki.age))
#> <class 'str'>
In this example, name is of type str, but age is also of type str rather than int.
No exception such as TypeError is raised.
This shows again that type annotations take effect only while you are writing code.
Tools such as mypy or Pylance can certainly detect this kind of type mismatch while coding, but if you want an exception to be raised at runtime for a mismatched type or an invalid value, you have to check the input values yourself.
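To make "checking the input values yourself" concrete, here is a minimal sketch using only the standard library. The class name CheckedUser and the error messages are my own assumptions for illustration; this is one possible hand-written check, not something pydantic generates.

```python
from dataclasses import dataclass


@dataclass
class CheckedUser:
    """Hypothetical dataclass that validates its own fields at runtime."""

    name: str
    age: int

    def __post_init__(self) -> None:
        # dataclasses call __post_init__ right after __init__ has set the fields.
        if not isinstance(self.name, str):
            raise TypeError(f"name must be str, got {type(self.name).__name__}")
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(self.age, bool) or not isinstance(self.age, int):
            raise TypeError(f"age must be int, got {type(self.age).__name__}")


user = CheckedUser(name="Ichiro", age=19)

try:
    CheckedUser(name="Samatoki", age="25")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
    #> age must be int, got str
```

Every field needs its own check written by hand, and the amount of code grows with each field and each rule.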
On the other hand, the class definition using pydantic is as follows.
from pydantic import BaseModel
class User(BaseModel):
    name: str
    age: int
At first glance, it's similar to using dataclasses.dataclass.
But there is a clear difference.
First, let's create an instance using normal field values.
Ramuda = User(name="Ramuda", age=24)
print(Ramuda)
#> name='Ramuda' age=24
print(type(Ramuda.name))
#> <class 'str'>
print(type(Ramuda.age))
#> <class 'int'>
You can't really tell the difference with this alone.
Next, give age a number written as a str, such as "23" or "45".
Jakurai = User(name="Jakurai", age="35")
print(Jakurai)
#> name='Jakurai' age=35
print(type(Jakurai.name))
#> <class 'str'>
print(type(Jakurai.age))
#> <class 'int'>
Jakurai.age has been cast to int.
By the way, what happens if you give age a value that cannot be cast to int, such as "hoge" or "fuga"?
Sasara = User(name="Sasara", age="Is it true?")
#> ValidationError: 1 validation error for User
#> age
#> value is not a valid integer (type=type_error.integer)
An exception called ValidationError was thrown.
The invalid value was detected even though I did not implement any validation myself.
With pydantic, the declared type information is applied not only while coding but also at runtime, and invalid values raise an easy-to-understand exception (described later). This lets you write type-strict code even in Python, a dynamically typed language!
I will give a basic explanation using the following code from the Example in the official documentation.
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []
external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
print(user.id)
#> 123
print(repr(user.signup_ts))
#> datetime.datetime(2019, 6, 1, 12, 22)
print(user.friends)
#> [1, 2, 3]
print(user.dict())
"""
{
'id': 123,
'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
'friends': [1, 2, 3],
'name': 'John Doe',
}
"""
You define your own class by inheriting from the base class pydantic.BaseModel.
This class definition declares four fields: id, name, signup_ts, and friends.
Each field is declared in a different way. According to the documentation, the declarations have the following meanings.
-- id (int) ... Declaring only a type hint makes the field required. If a str, bytes, or float value is given at instantiation, it is coerced to int where possible. A value of any other data type (dict, list, etc.) raises an exception.
-- name (str) ... Because the default value is 'John Doe', name is inferred to be of type str. Since a default value is declared, name is not a required field.
-- signup_ts (Optional[datetime]) ... A datetime field that also allows None. Since a default value is declared, signup_ts is not a required field. You can pass a numeric UNIX timestamp (e.g. 1608076800) or a str representing a date and time.
-- friends (List[int]) ... Uses Python's typing module. Since a default value is declared, it is not a required field. As with id, elements such as "123" and "45" are converted to int.
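To get a feel for how much work these four declarations replace, here is a rough hand-written approximation using only the standard library. The function name coerce_user and its error message are hypothetical, and the sketch only loosely mirrors the rules listed above (it accepts only ISO-style date strings, for example); it is not how pydantic is implemented.

```python
from datetime import datetime
from typing import Any, Dict


def coerce_user(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical manual version of the coercion the User model performs."""
    # id is required because it has no default value.
    if "id" not in raw:
        raise ValueError("id: field required")
    signup = raw.get("signup_ts")
    return {
        "id": int(raw["id"]),
        # name and friends fall back to their defaults when omitted.
        "name": str(raw.get("name", "John Doe")),
        "signup_ts": None if signup is None else datetime.fromisoformat(signup),
        "friends": [int(item) for item in raw.get("friends", [])],
    }


coerced = coerce_user(
    {"id": "123", "signup_ts": "2019-06-01 12:22", "friends": [1, 2, "3"]}
)
print(coerced["id"], coerced["friends"])
#> 123 [1, 2, 3]
```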
As mentioned, passing an invalid value when instantiating a class that inherits from pydantic.BaseModel raises an exception called pydantic.ValidationError.
Let's look inside the ValidationError using the code below.
from pydantic import ValidationError
try:
    User(signup_ts='broken', friends=[1, 2, 'not number'])
except ValidationError as e:
    print(e.json())
The contents of ValidationError for this code are as follows.
You can see what kind of inconsistency is occurring in each field.
[
  {
    "loc": [
      "id"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": [
      "signup_ts"
    ],
    "msg": "invalid datetime format",
    "type": "value_error.datetime"
  },
  {
    "loc": [
      "friends",
      2
    ],
    "msg": "value is not a valid integer",
    "type": "type_error.integer"
  }
]
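Because the report is plain JSON, it is easy to process further. The following sketch parses a condensed copy of the report above with the standard json module and turns it into one line per error; the helper name summarize is my own, not part of pydantic.

```python
import json
from typing import List

# Condensed copy of the error report shown above.
report = """
[
  {"loc": ["id"], "msg": "field required", "type": "value_error.missing"},
  {"loc": ["signup_ts"], "msg": "invalid datetime format", "type": "value_error.datetime"},
  {"loc": ["friends", 2], "msg": "value is not a valid integer", "type": "type_error.integer"}
]
"""


def summarize(report_json: str) -> List[str]:
    """Hypothetical helper: one 'location: message' line per error."""
    lines = []
    for error in json.loads(report_json):
        # loc is a path: a field name, optionally followed by an index.
        location = " -> ".join(str(part) for part in error["loc"])
        lines.append(f"{location}: {error['msg']}")
    return lines


summary = summarize(report)
for line in summary:
    print(line)
#> id: field required
#> signup_ts: invalid datetime format
#> friends -> 2: value is not a valid integer
```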
Tips
This article alone cannot cover all of pydantic, but from here on I will introduce some features that you can use right away.
pydantic supports a wide variety of data types.
Here are some of them.
Standard Library Types
Of course, you can use basic data types such as int, str, list, and dict.
It also supports types from standard library modules such as typing, ipaddress, enum, decimal, pathlib, and uuid.
The following is an example using ipaddress.IPv4Address.
from pydantic import BaseModel
from ipaddress import IPv4Address
class IPNode(BaseModel):
    address: IPv4Address
client = IPNode(address="192.168.0.12")
srv = IPNode(address="hoge")
#> ValidationError: 1 validation error for IPNode
#> address
#> value is not a valid IPv4 address (type=value_error.ipv4address)
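For reference, the standard library type can also be tried on its own. The sketch below uses only ipaddress; it assumes, as the field annotation suggests, that a validated address field ends up holding an IPv4Address object, so attributes such as is_private would be available on it.

```python
from ipaddress import AddressValueError, IPv4Address

# A valid dotted-quad string is parsed into an IPv4Address object.
address = IPv4Address("192.168.0.12")
print(address.is_private)
#> True

# An invalid string is rejected by the standard library itself.
try:
    IPv4Address("hoge")
    rejected = False
except AddressValueError:
    rejected = True
print(rejected)
#> True
```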
pydantic also supports URLs such as https://example.com and ftp://hogehoge.
from pydantic import BaseModel, HttpUrl, AnyUrl
class Backend(BaseModel):
    url: HttpUrl
bd1 = Backend(url="https://example.com")
bd2 = Backend(url="file://hogehoge")
#> ValidationError: 1 validation error for Backend
#> url
#> URL scheme not permitted (type=value_error.url.scheme; allowed_schemes={'https', 'http'})
You can also handle values that should not appear in output such as logs.
For example, pydantic.SecretStr can be used for passwords.
from pydantic import BaseModel, SecretStr
class Password(BaseModel):
    value: SecretStr
p1 = Password(value="hogehogehoge")
print(p1.value)
#> **********
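The masking idea itself is simple. Below is a toy sketch in plain Python: MaskedStr is a hypothetical class written for this article, not pydantic's implementation. It borrows one name from pydantic, get_secret_value(), which is the method SecretStr provides for reading the real value.

```python
class MaskedStr:
    """Hypothetical stand-in that hides its value when printed or logged."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        # print() and str() show only a fixed mask, regardless of length.
        return "**********"

    def __repr__(self) -> str:
        return "MaskedStr('**********')"

    def get_secret_value(self) -> str:
        # The real value must be requested explicitly.
        return self._value


secret = MaskedStr("hogehogehoge")
print(secret)
#> **********
print(secret.get_secret_value())
#> hogehogehoge
```

Making the real value available only through an explicit method call keeps it out of accidental print or log statements.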
EmailStr
EmailStr is a type for handling email addresses.
To use it, however, you need to install a library called email-validator separately from pydantic.
Let's use EmailStr together with the Secret Types from the previous section.
from pydantic import BaseModel, EmailStr, SecretStr, Field
class User(BaseModel):
    email: EmailStr
    password: SecretStr = Field(min_length=8, max_length=16)
# OK
Juto = User(email="[email protected]", password="hogehogehoge")
print(Juto)
#> email='[email protected]' password=SecretStr('**********')
# NG,email is not in the email address format
Rio = User(email="rio", password="hogehogehogehoge")
#> ValidationError: 1 validation error for User
#> email
#> value is not a valid email address (type=value_error.email)
# NG,The number of characters in password exceeds 16 characters
Gentaro = User(email="[email protected]", password="hogehogehogehogehoge")
#> ValidationError: 1 validation error for User
#> password
#> ensure this value has at most 16 characters (type=value_error.any_str.max_length; limit_value=16)
# NG,password has less than 8 characters
Daisu = User(email="[email protected]", password="hoge")
#> ValidationError: 1 validation error for User
#> password
#> ensure this value has at least 8 characters (type=value_error.any_str.min_length; limit_value=8)
from pydantic import BaseModel, conint
# Allow only positive numbers
class PositiveNumber(BaseModel):
    value: conint(gt=0)
# OK
n1 = PositiveNumber(value=334)
#NG,Negative number
n2 = PositiveNumber(value=-100)
#> ValidationError: 1 validation error for PositiveNumber
#> value
#> ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
In the example at the beginning of the article, pydantic helpfully cast numeric strings such as "23" and "45" to int and accepted them.
You can also declare stricter fields that do not allow even this cast.
from pydantic import BaseModel, StrictInt
# An int field that does not allow casting
class StrictNumber(BaseModel):
    value: StrictInt
# OK
n1 = StrictNumber(value=4)
#Even if it is a str type that can be cast and become an int type, it is not an int type, so it is NG
n2 = StrictNumber(value="4")
#> ValidationError: 1 validation error for StrictNumber
#> value
#> value is not a valid integer (type=type_error.integer)
Strict types can also be combined with the Constrained Types from the previous section.
from pydantic import BaseModel, conint
# Allow only natural numbers
class NaturalNumber(BaseModel):
    value: conint(strict=True, gt=0)
# OK
n1 = NaturalNumber(value=334)
# NG,Negative number
n2 = NaturalNumber(value=-45)
#> ValidationError: 1 validation error for NaturalNumber
#> value
#> ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
#Even if it is a str type that can be cast and become an int type, it is not an int type, so it is NG
n3 = NaturalNumber(value="45")
#> ValidationError: 1 validation error for NaturalNumber
#> value
#> value is not a valid integer (type=type_error.integer)
#float type is also not allowed
n4 = NaturalNumber(value=123.4)
#> ValidationError: 1 validation error for NaturalNumber
#> value
#> value is not a valid integer (type=type_error.integer)
Simple validations can be written in the field declaration itself; for anything more specific, you can define your own validation with pydantic.validator.
Consider a simple example.
Define a validator that accepts the name field only when it contains a half-width (ASCII) space.
from pydantic import BaseModel, validator
# Reject names that do not contain a space
class User(BaseModel):
    name: str
    age: int
    @validator("name")
    def validate_name(cls, v):
        if ' ' not in v:
            raise ValueError("must contain a space")
        return v
# OK
Jiro = User(name="Jiro Yamada", age=17)
# NG, the name contains no space
Saburo = User(name="SaburoYamada", age=14)
#> ValidationError: 1 validation error for User
#> name
#> must contain a space (type=value_error)
Validators can also check relationships between fields. For example, consider an Event class that holds the start and end times of an appointment as begin and end, respectively.
from datetime import datetime
from pydantic import BaseModel
class Event(BaseModel):
    begin: datetime
    end: datetime
event = Event(begin="2020-12-16T09:00:00+09:00", end="2020-12-16T12:00:00+09:00")
Here, I want to guarantee that the time assigned to the end field is later than the time assigned to the begin field.
If begin and end are equal, that is also treated as invalid.
There are several ways to do this; I will introduce two.
The first is to use pydantic.root_validator instead of pydantic.validator.
from datetime import datetime
from pydantic import BaseModel, root_validator
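class Event(BaseModel):
    begin: datetime
    end: datetime
    # skip_on_failure=True runs this check only after both fields have been
    # parsed into datetime objects (with pre=True the values would still be raw strings)
    @root_validator(skip_on_failure=True)
    def validate_event_schedule(cls, values):
        _begin: datetime = values["begin"]
        _end: datetime = values["end"]
        if _begin >= _end:
            raise ValueError("Invalid event.")
        return values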
# OK
event1 = Event(begin="2020-12-16T09:00:00+09:00", end="2020-12-16T12:00:00+09:00")
# NG
event2 = Event(begin="2020-12-16T12:00:00+09:00", end="2020-12-16T09:00:00+09:00")
#> ValidationError: 1 validation error for Event
#> __root__
#> Invalid event. (type=value_error)
# NG
event3 = Event(begin="2020-12-16T12:00:00+09:00", end="2020-12-16T12:00:00+09:00")
#> ValidationError: 1 validation error for Event
#> __root__
#> Invalid event. (type=value_error)
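The ordering rule is about points in time, not about text. The standard-library sketch below shows why the comparison should be made on parsed datetime values rather than on raw strings. The end value with a +00:00 offset is an example I made up for illustration.

```python
from datetime import datetime

begin_text = "2020-12-16T09:00:00+09:00"  # 00:00 UTC
end_text = "2020-12-16T01:00:00+00:00"    # 01:00 UTC, one hour after begin

# Comparing the raw strings goes character by character,
# so "09" sorts after "01" and the event looks invalid.
invalid_as_text = begin_text >= end_text
print(invalid_as_text)
#> True

# Comparing timezone-aware datetime objects compares the actual instants.
begin = datetime.fromisoformat(begin_text)
end = datetime.fromisoformat(end_text)
invalid_as_datetime = begin >= end
print(invalid_as_datetime)
#> False
```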
The other approach takes advantage of how validator works.
I will show the code first.
from datetime import datetime
from pydantic import BaseModel, validator
class Event(BaseModel):
    begin: datetime
    end: datetime
    @validator("begin", pre=True)
    def validate_begin(cls, v):
        return v
    @validator("end")
    def validate_end(cls, v, values):
        if values["begin"] >= v:
            raise ValueError("Invalid schedule.")
        return v
This code defines two validators.
When the Event class is instantiated, validate_begin, which is declared with pre=True, runs first. It passes the value given for begin through unchanged, and pydantic then parses it into a datetime for the begin field.
Then validate_end is processed.
Unlike validate_begin, validate_end takes a third argument called values.
As a specification of pydantic.validator, a validator can use the third argument values to access the fields that were validated before it. Fields are validated in the order in which they are defined.
The argument must be named exactly values; names such as _values or Values will not work. Think of it as a kind of reserved word.
In other words, in this code the input values of the fields are checked in the following order.
1. validate_begin is executed for begin.
2. validate_end checks the input value for end. At this point, the begin field can be referenced as values["begin"] from within validate_end.
These are the two methods. Please let me know if there is a better way.
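The ordering described above can be modelled in a few lines of plain Python. This is only a toy model with hypothetical names (run_in_order, parse_begin, parse_end), not pydantic's actual implementation; it shows how a later check can see the results of earlier ones through a shared values dictionary.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple

Check = Callable[[Any, Dict[str, Any]], Any]


def parse_begin(raw: str, values: Dict[str, Any]) -> datetime:
    return datetime.fromisoformat(raw)


def parse_end(raw: str, values: Dict[str, Any]) -> datetime:
    end = datetime.fromisoformat(raw)
    # begin was processed earlier, so it is already available in values.
    if values["begin"] >= end:
        raise ValueError("Invalid schedule.")
    return end


def run_in_order(raw: Dict[str, Any], checks: List[Tuple[str, Check]]) -> Dict[str, Any]:
    """Run each check in declaration order, sharing earlier results."""
    values: Dict[str, Any] = {}
    for field_name, check in checks:
        values[field_name] = check(raw[field_name], values)
    return values


CHECKS = [("begin", parse_begin), ("end", parse_end)]

event = run_in_order(
    {"begin": "2020-12-16T09:00:00+09:00", "end": "2020-12-16T12:00:00+09:00"},
    CHECKS,
)

try:
    run_in_order(
        {"begin": "2020-12-16T12:00:00+09:00", "end": "2020-12-16T09:00:00+09:00"},
        CHECKS,
    )
except ValueError as exc:
    rejected_message = str(exc)
    print(rejected_message)
    #> Invalid schedule.
```

If the two entries in CHECKS were swapped, parse_end would fail with a KeyError because begin would not be in values yet, which is why the order of field definitions matters.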
List, Dict, Set, etc.
Consider a RepeatedExams class that meets the following specifications.
-- It has a List[int] field, scores, that stores the scores of exactly 10 exams (int type).
-- Each score must be at least 50 points.
-- The total of the scores must be at least 800 points.
The code is as follows.
If you want a validator to check each element of a field whose type is List, Dict, Set, or similar, set each_item=True on that validator.
The code below sets each_item=True on a validator called validate_each_score.
from pydantic import BaseModel, validator
from typing import List
class RepeatedExams(BaseModel):
    scores: List[int]
    # Verify that the number of test results is exactly 10
    @validator("scores", pre=True)
    def validate_num_of_exams(cls, v):
        if len(v) != 10:
            raise ValueError("The number of exams must be 10.")
        return v
    # Verify that the result of each test is 50 points or more
    @validator("scores", each_item=True)
    def validate_each_score(cls, v):
        assert v >= 50, "Each score must be at least 50."
        return v
    # Verify that the total of the test results is 800 points or more
    @validator("scores")
    def validate_sum_score(cls, v):
        if sum(v) < 800:
            raise ValueError("sum of numbers greater than 800")
        return v
# OK
result1 = RepeatedExams(scores=[87, 88, 77, 100, 61, 59, 97, 75, 80, 85])
# NG,I have only taken the test 9 times
result2 = RepeatedExams(scores=[87, 88, 77, 100, 61, 59, 97, 75, 80])
#> ValidationError: 1 validation error for RepeatedExams
#> scores
#> The number of exams must be 10. (type=value_error)
# NG,There are exams with less than 50 points
result3 = RepeatedExams(scores=[87, 88, 77, 100, 32, 59, 97, 75, 80, 85])
#> ValidationError: 1 validation error for RepeatedExams
#> scores -> 4
#> Each score must be at least 50. (type=assertion_error)
# NG,The total of 10 tests is less than 800 points
result4 = RepeatedExams(scores=[87, 88, 77, 100, 51, 59, 97, 75, 80, 85])
#> ValidationError: 1 validation error for RepeatedExams
#> scores
#> sum of numbers greater than 800 (type=value_error)
Instances of classes that inherit from pydantic.BaseModel can be converted to dictionary or JSON format, and can be copied.
Not only can you convert and copy, but you can also specify the target field and output only a specific field.
from pydantic import BaseModel, conint
class User(BaseModel):
    name: str
    age: conint(strict=True, ge=0)
    height: conint(strict=True, ge=0)
    weight: conint(strict=True, ge=0)
Kuko = User(name="Kuko", age=19, height=168, weight=58)
print(Kuko)
#Convert to dict for all fields
Kuko_dict_1 = Kuko.dict()
print(Kuko_dict_1)
#> {'name': 'Kuko', 'age': 19, 'height': 168, 'weight': 58}
#Convert to dict only for name
Kuko_name = Kuko.dict(include={"name"})
print(Kuko_name)
#> {'name': 'Kuko'}
# Copy all fields
Kuko_2 = Kuko.copy()
print(Kuko_2)
#> name='Kuko' age=19 height=168 weight=58
#Copy excluding only age
Kuko_3 = Kuko.copy(exclude={"age"})
print(Kuko_3)
#> name='Kuko' height=168 weight=58
#JSON for all fields
Kuko_json = Kuko.json()
print(Kuko_json)
#> {"name": "Kuko", "age": 19, "height": 168, "weight": 58}
print(type(Kuko_json))
#> <class 'str'>
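Since the JSON output is an ordinary string, it can be handed to any JSON tool. As a small standard-library sketch, the string printed above can be loaded back into a dict, and a field can be dropped by hand. The variable names here are my own.

```python
import json

# The JSON string printed above.
kuko_json = '{"name": "Kuko", "age": 19, "height": 168, "weight": 58}'

# Loading it back gives the same content as the dict conversion shown earlier.
restored = json.loads(kuko_json)
print(restored == {"name": "Kuko", "age": 19, "height": 168, "weight": 58})
#> True

# Manually excluding a field from the plain dict.
without_age = {key: value for key, value in restored.items() if key != "age"}
print(json.dumps(without_age))
#> {"name": "Kuko", "height": 168, "weight": 58}
```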
I left out other topics such as Model Config and Schema because I did not have enough time to write about them. I hope to add them in the future.