What I'm careful about in Python coding: comments, type annotations, data classes, enums

Overview

I usually code in Python. Python often works even when written roughly, so I tend to write it carelessly. But when I review that code later or show it to others, the reaction is "What is this, what is it doing ..." or "How do I use this function ...". So, as a memo, here is a list of things where, if you pay attention to at least these, the code becomes reasonably easier to understand.

Comments

**Describe the purpose of the processing rather than what the processing does**

I try to make what the code does readable from function and variable names, and to use natural-language comments for the purpose of the processing, which is hard to infer from the code alone (though I often forget).

For example, the code is as follows.

Comment example that describes the processing content


# Reduce to 1/4  ← can be guessed from the function and variable names
small_image = resize(image, scale=1/4)

Example comment that describes the processing purpose


# Shrink so that memory usage stays stable when feeding the image to the model
small_image = resize(image, scale=1/4)

I don't think it's wrong to comment on what the code does. I'm very happy to have such comments when the code spans many lines or is hard to understand at a glance because of optimization.

However, as in the example above, when the code itself is easy to read, a comment that repeats what the code says is redundant.

If you write comments with the awareness that the motive and purpose of the processing are hard to recover from the code, I think the code will be easy to recall even when you look back at it much later.

Type annotation

**There is nothing to lose by annotating at least function arguments and return values**

A type annotation in Python is a comment plus a little extra: it does not guarantee that types match at runtime. Still, I think you should annotate as much as possible.
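As a minimal sketch of what "not guaranteed at runtime" means: the function `double` below is a hypothetical name made up for this illustration. It is annotated to take an `int`, yet Python runs it with a `str` without complaint. The annotations are only stored as metadata for tools and readers.

```python
def double(value: int) -> int:
    # The annotation says int, but nothing checks it when the function is called.
    return value * 2


print(double(2))               # an int, as annotated
print(double('ab'))            # a str is accepted too; no error is raised
print(double.__annotations__)  # the annotations are merely stored on the function
```

Catching the second call is the job of an editor or a static type checker, not of the interpreter.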

Good places for type annotations

Annotation example


# Annotations for method arguments and return values are very helpful (you don't have to look inside the method)
class Person:
    def __init__(self, first_name: str, last_name: str, age: int):  # Annotations for method arguments
        self._name: str = first_name + ' ' + last_name  # Annotation for a variable
        self._age = age

    def is_older_than(self, age: int) -> bool:  # Annotation for the method return value
        return self._age > age

In particular, published functions are hard to use unless their arguments and return values are annotated, so I consider annotations essential there. Of course, you can document the types in a docstring instead. A reasonably capable editor should parse docstrings as well.

Type annotations used on a daily basis

Here is how to annotate variables of built-in types such as `int` and `float`, and variables that hold class instances.

Annotations of everyday types


age: int = 0
weight: float = 0.0
name: str = 'string'
is_student: bool = True

taro: Person = Person('taro', 'suzuki', 20)

Other frequently used built-in types are list, dict, and tuple. They are annotated the same way.

python


friends: list = [daisuke, tomoko]
parents: tuple = (mother, father)
contacts: dict = {'email': 'xxx@mail', 'phone_number': 'XXXX'}

You can annotate like this, but the typing module lets you annotate in more detail. In the example above you can tell that friends is a list, but not what kind of elements it should contain. Here is how to annotate the elements using typing.

Detailed annotation of everyday types using typing


from typing import List, Tuple, Dict  # Import for type annotations

friends: List[Person] = [daisuke, tomoko]  # List whose elements are Person instances
parents: Tuple[Person, Person] = (mother, father)  # Tuple of exactly two Person instances
contacts: Dict[str, str] = {'email': 'xxx@mail', 'phone_number': 'XXXX'}  # Dictionary whose keys are str and whose values are str

typing allows more detailed type annotations. Personally, I find annotations for the key and value of a Dict especially reassuring, because they tell me what kind of dictionary is expected.

These can also be nested. For example, some people have multiple email addresses and phone numbers. Then you will want the values of contacts to be `List[str]` instead of str:

python


# Dictionary whose keys are str and whose values are lists of str
contacts: Dict[str, List[str]] = {
    'email': ['xxx@mail', 'yyy@mail'],
    'phone_number': ['XXXX', 'YYYY'],
}

As shown above, annotations can be nested this way.

Convenient type annotation using typing

typing enables many annotations beyond those listed above. I will introduce `Union` and `Optional` as the ones I use most often.

Union[A, B, C]: any of A, B, or C

Consider writing a function that changes the weight of a Person instance. It is a very simple implementation that changes the weight by the amount received.

python


class Person:
    ...
    def update_weight(self, weight: float) -> float:
        self.weight += weight
        return self.weight

It looks fine at first glance, but accepting only float as the change feels a little restrictive. It would be more convenient to accept `int` as well. I want an annotation that says either `int` or float is OK. `Union` is for exactly this: "any of these types is OK".

python


from typing import Union


class Person:
    ...
    def update_weight(self, weight: Union[int, float]) -> float:
        self.weight += weight
        return self.weight

This allows both `int` and float to be passed as the `update_weight` argument.

Optional[A]: A or None

Once you start using `Union`, you may find that you often write `Union[A, None]` in your code.

For example, suppose you define an `Occupation` class that represents a profession, and give the `Person` class a profession for that person. However, the `Person` may be a student and have no profession. I want to say that the profession is `Occupation` or None. You can use `Union` in such a case.

python


class Person:
    def __init__(self, occupation: Union[Occupation, None]):  # other arguments omitted
        self._occupation = occupation

As another example, let's say you want to have a function that gets the person's passport ID as a string. But you may not have a passport. One way is to return an empty string. But if you want to make it clear that it doesn't exist, you might consider returning None.

python


class Person:
    def get_passport_id(self) -> Union[str, None]:
        if self.has_passport:
            return self._passport._id
        else:
            return None

The more often you write annotations like these, the more tedious it becomes. `Optional` exists for this case. It is written `Optional[A]` and means `A` or None; in other words, `Optional[A]` = `Union[A, None]`.

Using `Optional`, the previous examples look like this:

python


class Person:
    def __init__(self, occupation: Optional[Occupation]):  # other arguments omitted
        self._occupation = occupation

    def get_passport_id(self) -> Optional[str]:
        if self.has_passport:
            return self._passport._id
        else:
            return None

I think the intent of the code is conveyed a little more easily, probably because the word `Optional` is used.
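As a rough sketch of how a caller might deal with an `Optional` return value: the stripped-down `Person` below is a stand-in for this illustration only (its `passport_id` constructor argument and the `describe_passport` helper are assumptions, not part of the class in the examples above). The `None` case is checked explicitly before the string is used.

```python
from typing import Optional


class Person:
    # Simplified stand-in: the passport ID is passed in directly (assumption).
    def __init__(self, passport_id: Optional[str] = None):
        self._passport_id = passport_id

    def get_passport_id(self) -> Optional[str]:
        return self._passport_id


def describe_passport(person: Person) -> str:
    passport_id = person.get_passport_id()
    if passport_id is None:  # the Optional annotation is a reminder to handle this case
        return 'no passport'
    return 'passport ' + passport_id


print(describe_passport(Person('AB123')))
print(describe_passport(Person()))
```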

Type annotations that may be useful using typing

I don't use these much, but they might be useful.

NewType: Creating a new type

You can define a new type. For example, consider a function that looks up the corresponding Person instance from a person_id of type `int`:

python


def find_by_id(person_id: int) -> Person:
    ...

I think you would write something like this. Here the argument has the descriptive name person_id, so confusion is unlikely, but you might still inadvertently pass something like an `occupation_id`, which is also an `int`. To prevent such accidental mistakes, you could go as far as defining a PersonId class.

python


class PersonId(int):
    pass


def find_by_id(person_id: PersonId) -> Person:
    ...

p = find_by_id(PersonId(10))

This may be fine, but it incurs the overhead of instantiating a class. With NewType, you can write:

python


from typing import NewType


PersonId = NewType('PersonId', int)

def find_by_id(person_id: PersonId) -> Person:
    ...

p = find_by_id(PersonId(10))

This has little overhead, because at runtime PersonId(10) only calls a function that does nothing but return 10. As far as the code's behavior goes the value is an `int`, but I think it is easy to see how redefining it as a distinct type prevents human error.
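A minimal sketch to confirm the runtime behavior described here: the value produced by a NewType is an ordinary `int`, and the distinction between `PersonId` and `int` exists only for static type checkers and readers.

```python
from typing import NewType

PersonId = NewType('PersonId', int)

person_id = PersonId(10)

# At runtime the value is a plain int; no instance of a new class is created.
print(type(person_id) is int)
print(person_id == 10)
print(person_id + 1)  # behaves like any other int
```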

TypeAlias: Alias for complex types

For example, with something like `Dict[str, List[str]]` you can still read it as "a dictionary whose keys are str and whose values are lists of str", but with `List[Dict[str, Union[int, float, None]]]` it becomes hard to follow, and repeating such a long annotation in every function that passes this type around is a pain. In that case, you can use a type alias:

TypeAlias


from typing import Dict, List, Union  # needed for the types inside the alias


TypeReportCard = List[Dict[str, Union[int, float, None]]]

def set_report_card(report_card: TypeReportCard):
    ...

set_report_card([{'math': 9.5}, {'english': 7}, {'science': None}])
# Wrong -> set_report_card(TypeReportCard([{'math': 9.5}, {'english': 7}, {'science': None}]))

This lets you write the annotation concisely. You only define an alias, so no special import is needed for the alias itself. Unlike NewType, it is just an alias, so when you pass a value as an argument you do not wrap it in the alias name.
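A small sketch to illustrate that an alias is only another name for the same type. It reuses the names from the example above; `count_subjects` is a hypothetical function added for this illustration.

```python
from typing import Dict, List, Union

TypeReportCard = List[Dict[str, Union[int, float, None]]]


def count_subjects(report_card: TypeReportCard) -> int:
    # A plain list of dicts is passed in; nothing needs to be wrapped.
    return len(report_card)


# The alias and the spelled-out type compare equal.
print(TypeReportCard == List[Dict[str, Union[int, float, None]]])
print(count_subjects([{'math': 9.5}, {'english': 7}, {'science': None}]))
```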

It may be interesting to read PEP 484

Data class

**When you reach for a dictionary, consider a data class first**

Suppose I want to hold the contact information of a certain person as data, and I express it as a dictionary.

contacts_dictionary



contacts: Dict[str, Optional[str]] = {
    'email': 'xxx@mail',
    'phone_number': None,
}

This is very common code. One day, you want to know whether a contact has a phone number.

python


def has_phone_number(contacts: Dict[str, Optional[str]]) -> bool:
    return contacts.get('phone_number', None) is not None
#    return 'phone_number' in contacts and contacts['phone_number'] is not None  # this is also fine

I think it works without any problems. Two weeks later, I need this function again. Thanks to the type annotation, I can recall what contacts looks like and call the function.

has_phone_number({'email': 'xxx@mail', 'phone-number': 'XXXX'})

However, this call returns False. If you look closely, phone_number has been written as phone-number. The key phone_number therefore does not exist, and the result is False. The annotation `Dict[str, Optional[str]]` says nothing about which key names are required, so I could not recall the exact key name I had chosen two weeks earlier.

This example may be easy to understand as the implementation of has_phone_number is written just above. But what if this function is implemented far away? What if you don't immediately notice that the result is False? I think debugging will be difficult.

It is standard practice to avoid hard-coded constants in code as much as possible, and dictionary keys deserve the same care.

In such a case, it is worth considering a data class.

Replacing dictionaries with data classes

Data class


# Replace the dictionary with a dataclass
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Contacts:
    email: Optional[str]
    phone_number: Optional[str]

dataclasses.dataclass automatically generates `__init__`, so the class definition above is all you need. It also provides other features such as automatic generation of `__eq__` and `__repr__`, but I will leave those out here.
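For reference, a brief sketch of the generated methods mentioned here, using the same `Contacts` definition. The variables `a` and `b` are just names for this illustration.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Contacts:
    email: Optional[str]
    phone_number: Optional[str]


a = Contacts(email='xxx@mail', phone_number=None)  # uses the generated __init__
b = Contacts(email='xxx@mail', phone_number=None)

print(a)       # uses the generated __repr__, which lists every field
print(a == b)  # uses the generated __eq__, which compares field by field
print(a is b)  # still two separate objects
```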

If you define your contacts as a data class as above, you can implement has_phone_number as below.

c = Contacts(email='xxx@mail', phone_number='XXXX')

def has_phone_number(contacts: Contacts) -> bool:
    return contacts.phone_number is not None

When the value is accessed as a data class field like this, typos are much less likely (the editor can check and suggest field names).

Also, unlike with `Dict[str, Optional[str]]`, the key names (field names) are fixed and each field has its own type, so it is much more concrete what kind of data is being requested.
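To make the contrast concrete, here is a hedged sketch that repeats the earlier typo with both representations. The misspelled field `phone_numbr` and the flag `typo_detected` are made up for this illustration.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Contacts:
    email: Optional[str]
    phone_number: Optional[str]


# A mistyped key in a dictionary goes unnoticed: .get() quietly returns None.
contacts_dict = {'email': 'xxx@mail', 'phone-number': 'XXXX'}
print(contacts_dict.get('phone_number') is not None)

# A mistyped field name in a data class fails immediately at construction.
typo_detected = False
try:
    Contacts(email='xxx@mail', phone_numbr='XXXX')  # deliberate typo
except TypeError as error:
    typo_detected = True
    print('TypeError:', error)
```

The dictionary version produces a wrong answer silently, while the data class version stops at the line that contains the mistake.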

Note: Replacing the dictionary with NamedTuple

Data classes were added in Python 3.7, so if you are on 3.6 or earlier, you might consider NamedTuple from typing. NamedTuple is used like this.

NamedTuple


# Replace the dictionary with a NamedTuple
from typing import NamedTuple, Optional


class Contacts(NamedTuple):
    email: Optional[str]
    phone_number: Optional[str]

c = Contacts(email='xxx@mail', phone_number='XXXX')

def has_phone_number(contacts: Contacts) -> bool:
    return contacts.phone_number is not None

The definition is almost the same as with a data class, but while a data class allows more detailed configuration, a NamedTuple is just a tuple with named fields.
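A short sketch of what "just a tuple with named fields" implies in practice, using the same `Contacts` definition: the instance can be unpacked and indexed like any tuple, and its fields cannot be reassigned.

```python
from typing import NamedTuple, Optional


class Contacts(NamedTuple):
    email: Optional[str]
    phone_number: Optional[str]


c = Contacts(email='xxx@mail', phone_number='XXXX')

email, phone_number = c      # unpacks like an ordinary tuple
print(c[0] == c.email)       # index access and field access give the same value
print(isinstance(c, tuple))

try:
    c.email = 'yyy@mail'     # tuples are immutable, so assignment fails
except AttributeError:
    print('fields cannot be reassigned')
```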

Enumeration type (enum)

The last is enum.

It is very useful when a variable can only take one of A, B, or C. For example, I often come across this kind of code.

def display(object_shape: str):
    if object_shape == 'circle':
        ...
    elif object_shape == 'rectangle':
        ...
    elif object_shape == 'triangle':
        ...
    else:
        raise NotImplementedError

display('circle')

When processing branches on a state, code often ends up written like this. Later, state management becomes complicated, or you have to dig through the implementation because you don't know the state names. Also, judging by the type alone, any str looks acceptable, but in reality everything except a few specific strings raises an error.

In such a case, an enumeration type (`enum`) may let you write it more clearly.

Enum


from enum import Enum


class ObjectShape(Enum):
    CIRCLE = 0
    RECTANGLE = 1
    TRIANGLE = 2

def display(object_shape: ObjectShape):
    if object_shape is ObjectShape.CIRCLE:
        ...
    elif object_shape is ObjectShape.RECTANGLE:
        ...
    elif object_shape is ObjectShape.TRIANGLE:
        ...
    else:
        raise NotImplementedError

display(ObjectShape.CIRCLE)

Now you don't have to worry about typos, and you can see at a glance what is allowed just by looking at the type. Also, this time I assigned all the `Enum` values by hand, but the numbers themselves should have no particular meaning, so

auto


import enum


class ObjectShape(enum.Enum):
    CIRCLE = enum.auto()
    RECTANGLE = enum.auto()
    TRIANGLE = enum.auto()

I think it is best to have a unique value assigned automatically like this.
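As a small sketch of why this helps with the "state name is not known" problem: the members of an enumeration can be listed directly, and looking one up by a misspelled name fails loudly. The misspelling `CIRCEL` is deliberate and made up for this illustration.

```python
import enum


class ObjectShape(enum.Enum):
    CIRCLE = enum.auto()
    RECTANGLE = enum.auto()
    TRIANGLE = enum.auto()


# Every allowed state can be listed without reading the implementation of display().
print([shape.name for shape in ObjectShape])

# The automatically assigned values are all distinct.
print(len({shape.value for shape in ObjectShape}) == len(ObjectShape))

# Members can be looked up by name; an unknown name raises an error.
print(ObjectShape['CIRCLE'] is ObjectShape.CIRCLE)
try:
    ObjectShape['CIRCEL']  # deliberate typo
except KeyError:
    print('unknown shape name')
```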
