Write Python code that applies type information at runtime using pydantic

This article is the day 16 entry of the Python Part 2 Advent Calendar 2020.

Type hints were introduced in Python 3.5, and writing type information in code has become commonplace even in Python, which is a dynamically typed language.

In this article, I'll introduce you to pydantic, a library that makes the most of this type information to help you write more robust Python code.

What is pydantic

pydantic is used by FastAPI, a Python web framework that has been a hot topic recently, so many people may already know of it. In fact, I first learned about pydantic when I started using FastAPI.

pydantic is a library that provides the following:

- Type annotations are enforced at runtime.
- Invalid data produces user-friendly errors.

That description alone may not mean much to many readers, so I will explain with an example below.

Official resources

GitHub: samuelcolvin/pydantic: Data parsing and validation using Python type hints
Official documentation: pydantic

Example

pydantic works well in user-defined classes that inherit from the base class pydantic.BaseModel.

First, consider a class definition that does not use pydantic. Use dataclasses.dataclass.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NonPydanticUser:
    name: str
    age: int

Let's create one instance of this NonPydanticUser class. In this example, the name field is of type str and the age field is of type int. Each field holds the data type defined in the class.

Ichiro = NonPydanticUser(name="Ichiro", age=19)
print(Ichiro)
#> NonPydanticUser(name='Ichiro', age=19)
print(type(Ichiro.name))
#> <class 'str'>
print(type(Ichiro.age))
#> <class 'int'>

Let's create another instance.

Samatoki = NonPydanticUser(name="Samatoki", age="25")
print(Samatoki)
#> NonPydanticUser(name='Samatoki', age='25')
print(type(Samatoki.name))
#> <class 'str'>
print(type(Samatoki.age))
#> <class 'str'>

In this example, name is of type str, but age is also of type str even though it is annotated as int. No exception such as TypeError is raised.

This shows once again that type annotations only help while you are writing code. Tools such as mypy or Pylance can certainly detect this kind of type mismatch at coding time, but if you want an exception to be raised for a type mismatch or an invalid value at runtime, you have to check the input values yourself.
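
To make "checking the input values yourself" concrete, here is a minimal sketch that uses only the standard library. The class name CheckedUser and the error messages are made up for illustration; the point is only how much hand-written code even two fields need.

```python
from dataclasses import dataclass


@dataclass
class CheckedUser:
    """Hypothetical dataclass that validates its own fields by hand."""

    name: str
    age: int

    def __post_init__(self):
        # dataclasses call __post_init__ right after the generated __init__.
        if not isinstance(self.name, str):
            raise TypeError("name must be a str")
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(self.age, int) or isinstance(self.age, bool):
            raise TypeError("age must be an int")


ok = CheckedUser(name="Ichiro", age=19)

error_message = None
try:
    CheckedUser(name="Samatoki", age="25")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Every new field needs another hand-written check like these, which is the boilerplate pydantic removes.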

On the other hand, the class definition using pydantic is as follows.

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

At first glance, it's similar to using dataclasses.dataclass. But there is a clear difference.

First, let's create an instance using normal field values.

Ramuda = User(name="Ramuda", age=24)
print(Ramuda)
#> name='Ramuda' age=24
print(type(Ramuda.name))
#> <class 'str'>
print(type(Ramuda.age))
#> <class 'int'>

You can't really tell the difference with this alone. Next, give age a numeric str such as "23" or "45".

Jakurai = User(name="Jakurai", age="35")
print(Jakurai)
#> name='Jakurai' age=35
print(type(Jakurai.name))
#> <class 'str'>
print(type(Jakurai.age))
#> <class 'int'>

Jakurai.age has been cast to int.

By the way, what happens if you give age a value that cannot be cast to int, such as "hoge" or "fuga"?

Sasara = User(name="Sasara", age="Is it true?")
#> ValidationError: 1 validation error for User
#> age
#>   value is not a valid integer (type=type_error.integer)

An exception called ValidationError was raised. pydantic detected the invalid value even though I did not implement any validation myself.

When pydantic is used in this way, the declared type information is applied not only while coding but also at runtime, and invalid values raise an easy-to-understand exception (described later). This lets you write type-strict code even in Python, a dynamically typed language!

pydantic is recommended for anyone who wants this kind of runtime type checking!

Basics of pydantic

I will give a basic explanation using the following code in the official Example.

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
print(user.id)
#> 123
print(repr(user.signup_ts))
#> datetime.datetime(2019, 6, 1, 12, 22)
print(user.friends)
#> [1, 2, 3]
print(user.dict())
"""
{
    'id': 123,
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}
"""

Define your own class by inheriting from the base class pydantic.BaseModel. This class defines four fields: id, name, signup_ts, and friends. Each field is declared differently. According to the documentation, they have the following meanings.

- id (int): Declaring only a type hint makes the field required. If a str, bytes, or float value is given at instantiation, it is coerced to int where possible. A value of any other data type (dict, list, etc.) raises an exception.
- name (str): From the default value 'John Doe', name is inferred to be of type str. Because a default value is declared, name is not a required field.
- signup_ts (Optional[datetime]): A datetime that also accepts None. Because a default value is declared, signup_ts is not a required field. You can pass an int or float UNIX timestamp (e.g. 1608076800) or a str representing a date and time.
- friends (List[int]): Uses Python's standard typing module. Because a default value is declared, it is not a required field. As with id, elements such as "123" and "45" are converted to int.
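
The field rules above can be checked with a short sketch. The model name Account is hypothetical; it mirrors the official User example, except that name gets an explicit str annotation. That annotation is my own choice, made so the snippet does not depend on default-value type inference.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Account(BaseModel):  # hypothetical model mirroring the User example
    id: int                               # required: type hint only
    name: str = "John Doe"                # not required: has a default
    signup_ts: Optional[datetime] = None  # not required: None is allowed
    friends: List[int] = []               # not required: has a default


# str "123" is coerced to int, and the UNIX timestamp becomes a datetime.
a = Account(id="123", signup_ts=1608076800, friends=["1", 2])
print(a.id, a.name, a.friends)
print(isinstance(a.signup_ts, datetime))

# Omitting the required id field is rejected.
missing_id_rejected = False
try:
    Account(name="no id")
except ValidationError:
    missing_id_rejected = True
print(missing_id_rejected)
```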

As mentioned earlier, if you pass an invalid value when instantiating a class that inherits from pydantic.BaseModel, pydantic raises an exception called pydantic.ValidationError.

Let's take a look inside the ValidationError using the code below.

from pydantic import ValidationError

try:
    User(signup_ts='broken', friends=[1, 2, 'not number'])
except ValidationError as e:
    print(e.json())

The contents of ValidationError for this code are as follows. You can see what kind of inconsistency is occurring in each field.

[
  {
    "loc": [
      "id"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": [
      "signup_ts"
    ],
    "msg": "invalid datetime format",
    "type": "value_error.datetime"
  },
  {
    "loc": [
      "friends",
      2
    ],
    "msg": "value is not a valid integer",
    "type": "type_error.integer"
  }
]
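
Besides e.json(), the same information is available as Python objects through e.errors(), which returns a list of dicts with the loc, msg, and type keys shown above. The following is a minimal sketch of turning it into a flat mapping; the Member model and the summarize helper are hypothetical names, not part of pydantic.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Member(BaseModel):  # hypothetical model for illustration
    id: int
    friends: List[int] = []


def summarize(exc: ValidationError) -> dict:
    """Map each failing location, e.g. 'friends.2', to its message."""
    return {
        ".".join(str(part) for part in err["loc"]): err["msg"]
        for err in exc.errors()
    }


problems = {}
try:
    Member(friends=[1, 2, "not number"])
except ValidationError as e:
    problems = summarize(e)
    for location, message in problems.items():
        print(f"{location}: {message}")
```

A mapping like this is handy when you want to return per-field error messages from an API or a form handler.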

Tips

This article alone cannot introduce all of pydantic, but from now on, I would like to introduce some elements that can be used immediately.

Field Types

pydantic supports a wide variety of data types. Here are some of them.

Standard Library Types

Of course, you can use built-in data types such as int, str, list, and dict. pydantic also supports standard library modules such as typing, ipaddress, enum, decimal, pathlib, and uuid.

The following is an example using ipaddress.IPv4Address.

from pydantic import BaseModel
from ipaddress import IPv4Address

class IPNode(BaseModel):
    address: IPv4Address

client = IPNode(address="192.168.0.12") 

srv = IPNode(address="hoge")
#> ValidationError: 1 validation error for IPNode
#> address
#>  value is not a valid IPv4 address (type=value_error.ipv4address)
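
The other standard library types mentioned above behave the same way. Below is a small sketch with enum, decimal, pathlib, and uuid; the Color enum, the Item model, and all values are made up for illustration.

```python
from decimal import Decimal
from enum import Enum
from pathlib import Path
from uuid import UUID

from pydantic import BaseModel, ValidationError


class Color(Enum):
    RED = "red"
    BLUE = "blue"


class Item(BaseModel):  # hypothetical model for illustration
    color: Color
    price: Decimal
    location: Path
    uid: UUID


# Plain strings are converted to the annotated types.
item = Item(
    color="red",
    price="19.99",
    location="/tmp/items",
    uid="12345678-1234-5678-1234-567812345678",
)
print(type(item.color), type(item.price), type(item.location), type(item.uid))

# "green" is not a Color value and "not-a-uuid" is not a UUID.
failed_fields = set()
try:
    Item(color="green", price="19.99", location="/tmp", uid="not-a-uuid")
except ValidationError as e:
    failed_fields = {err["loc"][0] for err in e.errors()}
    print(failed_fields)
```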

URLs

pydantic also supports URLs such as https://example.com and ftp://hogehoge.

from pydantic import BaseModel, HttpUrl, AnyUrl

class Backend(BaseModel):
    url: HttpUrl

bd1 = Backend(url="https://example.com") 

bd2 = Backend(url="file://hogehoge")
#> ValidationError: 1 validation error for Backend
#> url
#>  URL scheme not permitted (type=value_error.url.scheme; allowed_schemes={'https', 'http'})

Secret Types

You can also handle information that you do not want to appear in output such as logs. For example, you can use pydantic.SecretStr for passwords.

from pydantic import BaseModel, SecretStr

class Password(BaseModel):
    value: SecretStr

p1 = Password(value="hogehogehoge")
print(p1.value)
#> **********
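
When the real value is needed, for example to hash a password, SecretStr provides get_secret_value(). A minimal sketch follows; the Credential model is a hypothetical name.

```python
from pydantic import BaseModel, SecretStr


class Credential(BaseModel):  # hypothetical model for illustration
    value: SecretStr


c = Credential(value="hogehogehoge")

# Converting to str (as print does) only ever yields the mask.
masked = str(c.value)
# The underlying value must be requested explicitly.
raw = c.value.get_secret_value()

print(masked)
print(raw)
```

Because reading the secret takes an explicit call, accidental leaks through print or logging become much less likely.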

EmailStr

This type handles email addresses. To use it, however, you need to install a library called email-validator separately from pydantic.

Let's use EmailStr together with the Secret Types from the previous section.

from pydantic import BaseModel, EmailStr, SecretStr, Field

class User(BaseModel):
    email: EmailStr
    password: SecretStr = Field(min_length=8, max_length=16)

# OK
Juto = User(email="juto@example.com", password="hogehogehoge")
print(Juto)
#> email='juto@example.com' password=SecretStr('**********')

# NG,email is not in the email address format
Rio = User(email="rio", password="hogehogehogehoge")
#> ValidationError: 1 validation error for User
#> email
#>   value is not a valid email address (type=value_error.email)

# NG, password is longer than 16 characters
Gentaro = User(email="gentaro@example.com", password="hogehogehogehogehoge")
#> ValidationError: 1 validation error for User
#> password
#>   ensure this value has at most 16 characters (type=value_error.any_str.max_length; limit_value=16)

# NG, password has fewer than 8 characters
Daisu = User(email="daisu@example.com", password="hoge")
#> ValidationError: 1 validation error for User
#> password
#>   ensure this value has at least 8 characters (type=value_error.any_str.min_length; limit_value=8)

Constrained Types

from pydantic import BaseModel, HttpUrl, AnyUrl, SecretStr, conint

#Try to allow only positive numbers
class PositiveNumber(BaseModel):
    value: conint(gt=0)

# OK
n1 = PositiveNumber(value=334)

#NG,Negative number
n2 = PositiveNumber(value=-100)
#> ValidationError: 1 validation error for PositiveNumber
#> value
#>   ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

Strict Types

In the example at the beginning of the article, pydantic helpfully cast numeric str values such as "23" and "45" to int and accepted them. You can also declare stricter fields that do not allow even this cast.

from pydantic import BaseModel, conint, StrictInt

#Cast not allowed int
class StrictNumber(BaseModel):
    value: StrictInt

# OK
n1 = StrictNumber(value=4)

#Even if it is a str type that can be cast and become an int type, it is not an int type, so it is NG
n2 = StrictNumber(value="4")
#> ValidationError: 1 validation error for StrictNumber
#> value
#>   value is not a valid integer (type=type_error.integer)

It can also be combined with the Constrained Types in the previous section.

from pydantic import BaseModel, conint

#Allow only natural numbers
class NaturalNumber(BaseModel):
    value: conint(strict=True, gt=0)

# OK
n1 = NaturalNumber(value=334)

# NG,Negative number
n2 = NaturalNumber(value=-45)
#> ValidationError: 1 validation error for NaturalNumber
#> value
#>  ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

#Even if it is a str type that can be cast and become an int type, it is not an int type, so it is NG
n3 = NaturalNumber(value="45")
#> ValidationError: 1 validation error for NaturalNumber
#> value
#>   value is not a valid integer (type=type_error.integer)

#float type is also not allowed
n4 = NaturalNumber(value=123.4)
#> ValidationError: 1 validation error for NaturalNumber
#> value
#>   value is not a valid integer (type=type_error.integer)

validators

Simple validations can be written at the time of field declaration, but user-defined validations can be created using pydantic.validator.

Basic validator

Consider a simple example. Define a validator that accepts the name field only when it contains a half-width (ASCII) space.

from pydantic import BaseModel, validator

#Do not allow the case where the name does not contain a space
class User(BaseModel):
    name: str
    age: int

    @validator("name")
    def validate_name(cls, v):
        if ' ' not in v:
            raise ValueError("must contain a space")
        return v

# OK
Jiro = User(name="Jiro Yamada", age=17)

# NG, the name does not contain a space
Saburo = User(name="SaburoYamada", age=14)
#> ValidationError: 1 validation error for User
#> name
#>   must contain a space (type=value_error)
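
A validator can also transform the value, because whatever it returns is what gets stored in the field. The sketch below uses the same pydantic.validator decorator as above and normalizes whitespace before applying the check; the Profile model and normalize_name are hypothetical names.

```python
from pydantic import BaseModel, validator


class Profile(BaseModel):  # hypothetical model for illustration
    name: str

    @validator("name")
    def normalize_name(cls, v):
        # Collapse runs of whitespace and trim both ends.
        cleaned = " ".join(v.split())
        if " " not in cleaned:
            raise ValueError("must contain a space")
        # The returned value replaces the input.
        return cleaned


p = Profile(name="  Jiro   Yamada ")
print(repr(p.name))
```

Forgetting to return the value is a common mistake: the field would then be set to None.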

Implement a validator with multiple fields

For example, consider an Event class that holds the start and end times of an appointment as begin and end, respectively.

from datetime import datetime
from pydantic import BaseModel

class Event(BaseModel):
    begin: datetime
    end: datetime

event = Event(begin="2020-12-16T09:00:00+09:00", end="2020-12-16T12:00:00+09:00")

At this time, I want to guarantee that the time assigned to the end field is later than the time assigned to the begin field. If the times of begin and end match, it is also considered to be an invalid value.

There are several ways to do this; I would like to introduce two. The first is to use pydantic.root_validator instead of pydantic.validator.

from datetime import datetime
from pydantic import BaseModel, root_validator

class Event(BaseModel):
    begin: datetime
    end: datetime

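    @root_validator
    def validate_event_schedule(cls, values):
        # Without pre=True this runs after field validation, so the values
        # are datetime objects; a field that failed validation is absent.
        _begin = values.get("begin")
        _end = values.get("end")

        if _begin is not None and _end is not None and _begin >= _end:
            raise ValueError("Invalid event.")
        return values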

# OK
event1 = Event(begin="2020-12-16T09:00:00+09:00", end="2020-12-16T12:00:00+09:00")

# NG
event2 = Event(begin="2020-12-16T12:00:00+09:00", end="2020-12-16T09:00:00+09:00")
#> ValidationError: 1 validation error for Event
#> __root__
#>  Invalid event. (type=value_error)

# NG
event3 = Event(begin="2020-12-16T12:00:00+09:00", end="2020-12-16T12:00:00+09:00")
#> ValidationError: 1 validation error for Event
#> __root__
#>  Invalid event. (type=value_error)

The other approach relies on how validator is specified to behave. I will show the code first.

from datetime import datetime
from pydantic import BaseModel, root_validator, validator

class Event(BaseModel):
    begin: datetime
    end: datetime

    @validator("begin", pre=True)
    def validate_begin(cls, v):
        return v

    @validator("end")
    def validate_end(cls, v, values):
        if values["begin"] >= v:
            raise ValueError("Invalid schedule.")
        return v

In this code we have defined two validators.

When this Event class is instantiated, pydantic validates the fields in the order they are defined, so begin is handled first. validate_begin, declared with pre=True, returns the value passed for begin as is, and that value is then parsed and stored in the begin field.

Then validate_end is processed.

Unlike validate_begin, however, validate_end takes a third argument called values. By the specification of pydantic.validator, the third argument values gives access to the fields already checked by validators that ran earlier. The argument must be named exactly values; _values or Values will not work. Think of it as a kind of reserved word.

In other words, in the case of this code, the order of input value check of each field is as follows.

  1. First, the input value check of begin by validate_begin is executed.
  2. Then validate_end performs the input value check for end. At this point, you can refer to the begin field with values["begin"] from within the scope of validate_end.

We have introduced the above two methods. Please let me know if there is a better way.

Validator for each element contained in List, Dict, Set, etc.

Consider a RepeatedExams class that meets the following specifications.

- It holds the results of exactly 10 tests.
- Each test result is 50 points or more.
- The total of the test results is 800 points or more.

The code is as follows. To run a validator on each element of a field of a type such as List, Dict, or Set, set each_item=True on that validator. The code below sets each_item=True for the validator called validate_each_score.

from pydantic import BaseModel, validator
from typing import List

class RepeatedExams(BaseModel):
    scores: List[int]

    #Verify that the number of test results is exactly 10
    @validator("scores", pre=True)
    def validate_num_of_exams(cls, v):
        if len(v) != 10:
            raise ValueError("The number of exams must be 10.")
        return v

    #Verify that the result of one test is 50 points or more
    @validator("scores", each_item=True)
    def validate_each_score(cls, v):
        assert v >= 50, "Each score must be at least 50."
        return v

    #Verify that the total test results are 800 points or more
    @validator("scores")
    def validate_sum_score(cls, v):
        if sum(v) < 800:
            raise ValueError("sum of numbers greater than 800")
        return v
    
# OK
result1 = RepeatedExams(scores=[87, 88, 77, 100, 61, 59, 97, 75, 80, 85])

# NG,I have only taken the test 9 times
result2 = RepeatedExams(scores=[87, 88, 77, 100, 61, 59, 97, 75, 80])
#> ValidationError: 1 validation error for RepeatedExams
#> scores
#>   The number of exams must be 10. (type=value_error)

# NG,There are exams with less than 50 points
result3 = RepeatedExams(scores=[87, 88, 77, 100, 32, 59, 97, 75, 80, 85])
#> ValidationError: 1 validation error for RepeatedExams
#> scores -> 4
#>   Each score must be at least 50. (type=assertion_error)


# NG,The total of 10 tests is less than 800 points
result4 = RepeatedExams(scores=[87, 88, 77, 100, 51, 59, 97, 75, 80, 85])
#> ValidationError: 1 validation error for RepeatedExams
#> scores
#>   sum of numbers greater than 800 (type=value_error)

Exporting models

Instances of classes that inherit from pydantic.BaseModel can be converted to dictionary or JSON format, and can be copied. Not only can you convert and copy, but you can also specify the target field and output only a specific field.

from pydantic import BaseModel, conint

class User(BaseModel):
    name: str
    age: conint(strict=True, ge=0)
    height: conint(strict=True, ge=0)
    weight: conint(strict=True, ge=0)

Kuko = User(name="Kuko", age=19, height=168, weight=58)
print(Kuko)

#Convert to dict for all fields
Kuko_dict_1 = Kuko.dict()
print(Kuko_dict_1)
#> {'name': 'Kuko', 'age': 19, 'height': 168, 'weight': 58}

#Convert to dict only for name
Kuko_name = Kuko.dict(include={"name"})
print(Kuko_name)
#> {'name': 'Kuko'}

#Copy for all fields
Kuko_2 = Kuko.copy()
print(Kuko_2)
#> name='Kuko' age=19 height=168 weight=58 

#Copy excluding only age
Kuko_3 = Kuko.copy(exclude={"age"})
print(Kuko_3)
#> name='Kuko' height=168 weight=58

#JSON for all fields
Kuko_json = Kuko.json()
print(Kuko_json)
#> {"name": "Kuko", "age": 19, "height": 168, "weight": 58}
print(type(Kuko_json))
#> <class 'str'>
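
The exported JSON can be turned back into a model, which is useful for checking that nothing was lost on the way. The sketch below uses the same .json() and .dict() methods as above and rebuilds the instance with the standard json module. The Member model is a hypothetical, simplified stand-in for User.

```python
import json

from pydantic import BaseModel


class Member(BaseModel):  # hypothetical model for illustration
    name: str
    age: int
    height: int


kuko = Member(name="Kuko", age=19, height=168)

# Serialize to a JSON string, then rebuild a model from the parsed data.
payload = kuko.json()
restored = Member(**json.loads(payload))
print(restored == kuko)

# exclude works for dict() in the same way include does.
without_age = kuko.dict(exclude={"age"})
print(without_age)
```

Because the rebuilt instance goes through the constructor, the parsed data is validated again on the way back in.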

In closing

I had to leave out other topics such as Model Config and Schema because I did not have enough time to write about them. I hope to add them in the future.
