I usually code in Python. Python often works even when it is written roughly, so I tend to write it carelessly. However, when I review such code later or show it to others, the reaction is something like "What is this, what are you doing ..." or "How do you use this function ...". So, as a memo, I will list the things that, if you pay attention to them for the time being, make code easier to understand to some extent.
** Describe the purpose of processing instead of commenting on the processing content **
I try to make the processing itself readable from function and variable names, and I use natural-language comments for the purpose of the processing, which is hard to read from the code alone ~~though I often forget~~.
For example, the code is as follows.
Example of a comment that describes the processing content
# Shrink to 1/4 ← can be inferred from the function and variable names
small_image = resize(image, scale=1/4)
Example of a comment that describes the processing purpose
# Shrink so that memory use stays stable when feeding the image to the model
small_image = resize(image, scale=1/4)
I don't think it is wrong to comment on what the code does. I am very happy to have such comments when the code spans many lines or is hard to understand at first glance because of optimization.
However, as in the example above, a comment that repeats the code is redundant when reading the code itself is no trouble.
If you write comments with the awareness that "the motive and purpose of the processing are hard to follow from the code", I think the code will be easy to recall even when you look back at it much later.
** Annotating at least the function arguments and return values never hurts **
Type annotations in Python are little more than comments: they do not guarantee that types match at runtime. Even so, I think you should annotate as much as possible.
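To make the "not guaranteed at runtime" point concrete, here is a minimal sketch. The function `double` is a hypothetical example that does not appear elsewhere in this article. A static type checker would be expected to flag the second call, but Python itself runs it without complaint.

```python
def double(value: int) -> int:
    # Hypothetical helper: the annotations document intent but are not checked at runtime.
    return value * 2

number_result = double(3)      # matches the annotation
string_result = double("ab")   # violates the annotation, yet runs: str * 2 repeats the string

# The annotations are only stored as metadata on the function object.
annotations = double.__annotations__
```

This is why annotations pay off mainly through editors and static checkers rather than through the interpreter.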
Annotation example
# Annotating method arguments and return values is very helpful (you don't have to read the method body)
class Person:
    def __init__(self, first_name: str, last_name: str, age: int):  # annotations for method arguments
        self._name: str = first_name + ' ' + last_name  # annotation for a variable
        self._age = age
    def is_older_than(self, age: int) -> bool:  # annotation for the method return value
        return self._age > age
In particular, a public function is hard to use unless the types of its arguments and return value are known, so I consider annotating them essential. Of course, you can also write the types in a docstring; a reasonably sophisticated editor should parse docstrings as well.
Here is how to annotate variables of built-in types such as `int` and `float`, and variables that hold class instances.
Annotations of everyday types
age: int = 0
weight: float = 0.0
name: str = 'string'
is_student: bool = True
taro: Person = Person('taro', 'suzuki', 20)
Other frequently used built-in types are `list`, `dict`, and `tuple`.
These can be annotated in the same way:
python
friends: list = [daisuke, tomoko]
parents: tuple = (mother, father)
contacts: dict = {'email': 'xxx@mail', 'phone_number': 'XXXX'}
You can annotate like this, but the `typing` module lets you annotate in more detail.
For example, the annotation above shows that `friends` is a list, but not what kind of elements it should contain. Here is how to annotate the elements using `typing`.
Detailed annotation of everyday types using typing
from typing import List, Tuple, Dict  # imports for type annotation
friends: List[Person] = [daisuke, tomoko]  # list whose elements are Person instances
parents: Tuple[Person, Person] = (mother, father)  # tuple of two Person instances
contacts: Dict[str, str] = {'email': 'xxx@mail', 'phone_number': 'XXXX'}  # dictionary whose keys and values are str
`typing` allows for more detailed type annotations like these.
Personally, I find it especially reassuring when a `Dict` annotation specifies the key and value types, because I can tell what kind of dictionary is expected.
These annotations can also be nested. For example, some people have several email addresses and phone numbers, so you will want the values of `contacts` to be `List[str]` instead of `str`. In that case, you can annotate it like this:
python
# Keys are str, values are lists of str
contacts: Dict[str, List[str]] = {
    'email': ['xxx@mail', 'yyy@mail'],
    'phone_number': ['XXXX', 'YYYY'],
}
Nested annotations like this are possible.
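As a small runnable sketch of how a nested annotation helps at the point of use, the hypothetical helper `all_emails` below (not part of the original article) takes the nested dictionary and returns one of its lists. The signature alone tells the reader that every value is a list of strings.

```python
from typing import Dict, List

def all_emails(contacts: Dict[str, List[str]]) -> List[str]:
    # Hypothetical helper: returns the email list, or an empty list if the key is absent.
    return contacts.get('email', [])

contacts: Dict[str, List[str]] = {
    'email': ['xxx@mail', 'yyy@mail'],
    'phone_number': ['XXXX', 'YYYY'],
}
emails = all_emails(contacts)
```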
`typing` enables many annotations beyond those listed above. I will introduce `Union` and `Optional`, which are used frequently.
Consider writing a method that changes the weight of a `Person` instance. The implementation is very simple: it adds the received amount to the current weight.
python
class Person:
    ...
    def update_weight(self, weight: float) -> float:
        self.weight += weight
        return self.weight
It looks fine at first glance, but it is a little disappointing that the change must be a `float`. It would be more convenient to accept `int` as well, so I want an annotation that allows either `int` or `float`. `Union` is for exactly this case: "any of these types is OK".
python
from typing import Union
class Person:
    ...
    def update_weight(self, weight: Union[int, float]) -> float:
        self.weight += weight
        return self.weight
This allows both `int` and `float` to be passed as the `update_weight` argument.
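The following is a self-contained sketch of the `Union` version that can actually be run. The constructor that takes only a `weight` is an assumption made for this example; the article's `Person` elides its other fields.

```python
from typing import Union

class Person:
    def __init__(self, weight: float):
        # Simplified constructor (an assumption for this sketch): only the weight is stored.
        self.weight = weight

    def update_weight(self, weight: Union[int, float]) -> float:
        self.weight += weight
        return self.weight

taro = Person(60.0)
after_int = taro.update_weight(2)       # an int is accepted
after_float = taro.update_weight(-0.5)  # so is a float
```

Note that static checkers following PEP 484 already accept an `int` where `float` is annotated, so in this particular case the `Union` mainly serves as an explicit signal to readers.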
Once you start using `Union`, you may find that the form `Union[A, None]` appears often in your code.
For example, suppose you define an `Occupation` class that represents a profession, and give the `Person` class an occupation. However, the person may be a student and have no profession, so the occupation should be either `Occupation` or `None`. `Union` can express that.
python
class Person:
    def __init__(..., occupation: Union[Occupation, None]):
        self._occupation = occupation
As another example, say you want a method that returns the person's passport ID as a string. But the person may not have a passport. One option is to return an empty string, but if you want to make the absence explicit, you might return `None` instead.
python
class Person:
    def get_passport_id(self) -> Union[str, None]:
        if self.has_passport:
            return self._passport._id
        else:
            return None
The more often you write annotations like these, the more tedious they become. `Optional` exists for this case. It is written `Optional[A]` and means "`A` or `None`"; that is, `Optional[A]` is equivalent to `Union[A, None]`.
Using `Optional`, the previous examples look like this:
python
class Person:
    def __init__(..., occupation: Optional[Occupation]):
        self._occupation = occupation
    def get_passport_id(self) -> Optional[str]:
        if self.has_passport:
            return self._passport._id
        else:
            return None
I think the intent of the code comes across a little more clearly, probably because the word `Optional` itself carries the meaning.
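To show what an `Optional` return type asks of the caller, here is a minimal sketch. The simplified `Person` constructor and the helper `describe_passport` are assumptions for illustration only; the last line checks the equivalence with `Union` mentioned above.

```python
from typing import Optional, Union

class Person:
    def __init__(self, passport_id: Optional[str] = None):
        # Simplified for this sketch: the passport ID is stored directly.
        self._passport_id = passport_id

    def get_passport_id(self) -> Optional[str]:
        return self._passport_id

def describe_passport(person: Person) -> str:
    passport_id = person.get_passport_id()
    if passport_id is None:  # the Optional return type reminds callers to handle this case
        return 'no passport'
    return 'passport ' + passport_id

with_passport = describe_passport(Person('AB123'))
without_passport = describe_passport(Person())
same_type = Optional[str] == Union[str, None]
```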
I don't use these that much, but they might be useful
With `NewType` you can define a new type. For example, consider a function that looks up the corresponding `Person` instance from a `person_id` of type `int`:
python
def find_by_id(person_id: int) -> Person:
    ...
I think you would write something like this. The argument name `person_id` is descriptive, so confusion is unlikely, but you could still inadvertently pass something like an `occupation_id`, which is also an `int`.
To prevent such accidental mistakes, you could deliberately define a `PersonId` class.
python
class PersonId(int):
    pass
def find_by_id(person_id: PersonId) -> Person:
    ...
p = find_by_id(PersonId(10))
This works, but it incurs the overhead of instantiating a class every time an ID is created.
With `NewType`, you can write:
python
from typing import NewType
PersonId = NewType('PersonId', int)
def find_by_id(person_id: PersonId) -> Person:
    ...
p = find_by_id(PersonId(10))
With `NewType` the overhead is small, because at runtime `PersonId(10)` only calls a function that does nothing and returns `10`. The value is still an `int` as far as the code's meaning goes, but redefining it as a distinct type makes it easy to see that a human error is being prevented.
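The difference between the two approaches can be observed at runtime with a short sketch. The subclass is renamed to the hypothetical `PersonIdClass` here only so that both versions can coexist in one snippet.

```python
from typing import NewType

PersonId = NewType('PersonId', int)

class PersonIdClass(int):
    # Subclass approach, renamed (hypothetically) so both versions fit in one snippet.
    pass

new_type_id = PersonId(10)
subclass_id = PersonIdClass(10)

new_type_is_plain_int = type(new_type_id) is int   # NewType returns its argument unchanged
subclass_is_plain_int = type(subclass_id) is int   # a real subclass instance is created
```

So the distinction made by `NewType` exists only for static checkers and readers; at runtime the value is an ordinary `int`.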
For example, with something like `Dict[str, List[str]]` you can still read it as "a dictionary whose keys are `str` and whose values are lists of `str`", but `List[Dict[str, Union[int, float, None]]]` is hard to take in, and it is a pain to write such a long annotation on every function that passes this type around. In that case, you can use a type alias:
TypeAlias
TypeReportCard = List[Dict[str, Union[int, float, None]]]
def set_report_card(report_card: TypeReportCard):
    ...
set_report_card([{'math': 9.5}, {'english': 7}, {'science': None}])
# Mistake example -> set_report_card(TypeReportCard([{'math': 9.5}, {'english': 7}, {'science': None}]))
This keeps the signature clean. Creating the alias is a plain assignment and requires no special import. Unlike `NewType`, it is just an alias, so when you actually pass a value as an argument, you do not need to wrap it in the alias name.
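Here is a runnable sketch of the alias in use. The helper `count_graded` is hypothetical and stands in for the elided `set_report_card`; the last line confirms that the alias and the spelled-out annotation are the same type.

```python
from typing import Dict, List, Union

TypeReportCard = List[Dict[str, Union[int, float, None]]]

def count_graded(report_card: TypeReportCard) -> int:
    # Hypothetical helper: counts the subjects that have a score.
    return sum(
        1
        for entry in report_card
        for score in entry.values()
        if score is not None
    )

# A plain list of dicts is passed; no wrapping in the alias name is needed.
graded = count_graded([{'math': 9.5}, {'english': 7}, {'science': None}])
alias_is_same_type = TypeReportCard == List[Dict[str, Union[int, float, None]]]
```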
It may be interesting to read PEP 484
** Before using a dictionary, consider a data class **
Suppose I want to hold a person's contact information as data and represent it with a dictionary.
contacts_dictionary
contacts: Dict[str, Optional[str]] = {
    'email': 'xxx@mail',
    'phone_number': None,
}
This is very common code. One day, you want to know whether a contact has a phone number.
python
def has_phone_number(contacts: Dict[str, Optional[str]]) -> bool:
    return contacts.get('phone_number', None) is not None
    # return 'phone_number' in contacts and contacts['phone_number'] is not None  # also fine
This works without any problems. Two weeks later, I need this function again. Thanks to the type annotation, I can remember what `contacts` looks like and call the function.
has_phone_number({'email': 'xxx@mail', 'phone-number': 'XXXX'})
However, this call returns `False`. If you look closely, `phone_number` has become `phone-number`. As a result, the `phone_number` key does not exist and the result is `False`. The annotation `Dict[str, Optional[str]]` says nothing about which key names are required, so I could not recall the exact key name I had chosen two weeks earlier.
This example is easy to follow because the implementation of `has_phone_number` is written just above. But what if the function is implemented far away? What if you don't immediately notice that the result is `False`? I think debugging would be difficult.
Avoiding constants embedded directly in code is standard practice, and dictionary keys deserve the same care.
In such a case, consider a data class before settling on a dictionary.
Data class
# Substitute with dataclass
import dataclasses
from typing import Optional
@dataclasses.dataclass
class Contacts:
    email: Optional[str]
    phone_number: Optional[str]
`dataclasses.dataclass` automatically generates `__init__`, so the class definition above is all you need.
It also has other features, such as automatic generation of `__eq__` and `__repr__`, but I will omit them here.
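Although the details are omitted here, a brief sketch shows what the generated `__eq__` and `__repr__` do. The variable names `a`, `b`, and `text` are only for illustration.

```python
import dataclasses
from typing import Optional

@dataclasses.dataclass
class Contacts:
    email: Optional[str]
    phone_number: Optional[str]

a = Contacts(email='xxx@mail', phone_number=None)
b = Contacts(email='xxx@mail', phone_number=None)

fields_equal = (a == b)   # generated __eq__ compares the instances field by field
same_object = (a is b)    # they are still two distinct objects
text = repr(a)            # generated __repr__ lists each field with its value
```

Value-based equality and a readable `repr` are convenient in tests and debugging output, which is part of why a data class is a comfortable replacement for a small dictionary.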
If you define your contacts as a data class as above, you can implement `has_phone_number` as below.
c = Contacts(email='xxx@mail', phone_number='XXXX')
def has_phone_number(contacts: Contacts) -> bool:
    return contacts.phone_number is not None
Because the value is accessed as a data class field, typos are much less likely (the editor can check and suggest field names).
Also, unlike `Dict[str, Optional[str]]`, the key names (field names) are fixed and each field has its own type, so it is more concrete what kind of data is being requested.
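Even without editor support, a misspelled field fails loudly instead of silently returning `False` as the dictionary version did. The sketch below demonstrates this; the misspelling `phone_numbr` is a made-up example.

```python
import dataclasses
from typing import Optional

@dataclasses.dataclass
class Contacts:
    email: Optional[str]
    phone_number: Optional[str]

def has_phone_number(contacts: Contacts) -> bool:
    return contacts.phone_number is not None

# A misspelled field name in the constructor fails immediately with TypeError.
try:
    Contacts(email='xxx@mail', phone_numbr='XXXX')
    constructor_error = None
except TypeError as error:
    constructor_error = error

# A misspelled attribute access raises AttributeError instead of yielding a default.
c = Contacts(email='xxx@mail', phone_number='XXXX')
try:
    c.phone_numbr
    attribute_error = None
except AttributeError as error:
    attribute_error = error
```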
Data classes were added in Python 3.7, so if you are on 3.6 or earlier, you might consider `NamedTuple` from `typing`. `NamedTuple` is used like this.
NamedTuple
# Substitute with NamedTuple
from typing import NamedTuple, Optional
class Contacts(NamedTuple):
    email: Optional[str]
    phone_number: Optional[str]
c = Contacts(email='xxx@mail', phone_number='XXXX')
def has_phone_number(contacts: Contacts) -> bool:
    return contacts.phone_number is not None
The definition is almost the same as with a data class, but while a data class allows more detailed configuration, a `NamedTuple` is just a tuple with named fields.
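What "just a named tuple" means in practice can be seen in a short sketch: the instance behaves like any other tuple, including being immutable. The variable names below are only for illustration.

```python
from typing import NamedTuple, Optional

class Contacts(NamedTuple):
    email: Optional[str]
    phone_number: Optional[str]

c = Contacts(email='xxx@mail', phone_number='XXXX')

is_tuple = isinstance(c, tuple)   # a NamedTuple instance is an ordinary tuple
email, phone_number = c           # so it can be unpacked
first = c[0]                      # and indexed by position

try:
    c.email = 'yyy@mail'          # tuples are immutable, so fields cannot be reassigned
    assignment_error = None
except AttributeError as error:
    assignment_error = error
```

If you need to modify fields after creation, a data class is likely the better fit; if a fixed record is enough, `NamedTuple` is a lightweight choice.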
The last is `enum`.
It is very useful when a variable can only take one of A, B, or C. For example, I often come across code like this.
def display(object_shape: str):
    if object_shape == 'circle':
        ...
    elif object_shape == 'rectangle':
        ...
    elif object_shape == 'triangle':
        ...
    else:
        raise NotImplementedError
display('circle')
When processing is branched on a state, code tends to be written like this, and later the state management becomes complicated, or you have to search the implementation because you don't know the state names. Also, judging only from the type, any `str` looks acceptable, but in reality every string other than a few specific ones raises an error.
In such a case, an enumeration type (`enum`) may let you write it more clearly.
Enum
from enum import Enum
class ObjectShape(Enum):
    CIRCLE = 0
    RECTANGLE = 1
    TRIANGLE = 2
def display(object_shape: ObjectShape):
    if object_shape is ObjectShape.CIRCLE:
        ...
    elif object_shape is ObjectShape.RECTANGLE:
        ...
    elif object_shape is ObjectShape.TRIANGLE:
        ...
    else:
        raise NotImplementedError
display(ObjectShape.CIRCLE)
Now you don't have to worry about typos, and you can see at a glance what is allowed just by looking at the type. This time I assigned all the `Enum` values manually, but the numbers have no particular meaning, so `auto` can assign them:
auto
import enum
class ObjectShape(enum.Enum):
    CIRCLE = enum.auto()
    RECTANGLE = enum.auto()
    TRIANGLE = enum.auto()
I think it is best to have a unique value assigned automatically like this.
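For reference, here is a small sketch of what `auto` produces and how members can be looked up. The name `HEXAGON` is a made-up example of a shape that was never defined.

```python
import enum

class ObjectShape(enum.Enum):
    CIRCLE = enum.auto()
    RECTANGLE = enum.auto()
    TRIANGLE = enum.auto()

values = [shape.value for shape in ObjectShape]   # auto() counts up from 1 by default
names = [shape.name for shape in ObjectShape]     # members iterate in definition order
by_name = ObjectShape['CIRCLE']                   # look up a member by its name

try:
    ObjectShape['HEXAGON']                        # unknown names fail loudly
    lookup_error = None
except KeyError as error:
    lookup_error = error
```

Because an unknown name raises an error at the lookup itself, a mistake surfaces where it is made rather than deep inside a chain of `if` branches.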