Pythonic thinking

Pythonic style is neither regulated nor enforced by the compiler. Python programmers prefer to be explicit, to choose simple over complex, and to maximize readability. It is important to know the best, most Pythonic way to do the most common things in Python, because these patterns affect every program you write.

Know the version of Python to use

This article uses Python 3 (Python 3.7 and Python 3.8) and does not cover Python 2. Many computers come with a version of the standard CPython runtime pre-installed. However, it is not always obvious which version runs when you type python on the command line. Use the --version flag to find out the exact version of Python.

$ python --version
Python 2.7.10

$ python3 --version
Python 3.8.0
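
The running interpreter can also report its own version from inside a program. The following is a minimal sketch using the standard sys module; the name is_supported and the (3, 7) threshold are assumptions chosen to match the versions used in this article.

```python
import sys

# version_info is a named tuple: (major, minor, micro, releaselevel, serial)
print(sys.version_info)
# version is the human-readable form of the same information
print(sys.version)

# Hypothetical guard: the (3, 7) minimum is an assumption for this article
is_supported = sys.version_info >= (3, 7)
print(is_supported)
```

A check like this is useful at the top of a script that relies on features from a specific Python release.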

Python 3 is well maintained by the core Python developers and community and is constantly improving. Most of Python's popular open source libraries are compatible with Python 3. Using Python 3 for all Python projects is strongly recommended. (Python 2 reached end of life on January 1, 2020, and its final release, 2.7.18, shipped in April 2020.)

Follow the PEP 8 style guide

Python Enhancement Proposal #8, known as PEP 8, is the style guide for how to format Python code. You can write Python code any way you like as long as the syntax is valid. However, a consistent style makes your code more approachable and easier to read. Sharing a common style with other Python programmers in the larger community makes it easier to collaborate on projects. PEP 8 has a wealth of detailed guidance on how to write clear Python code.

Whitespace

In Python, whitespace is syntactically significant. Python programmers are especially sensitive to the effect of whitespace on code clarity, so pay particular attention to the following rules.

- Use spaces instead of tabs for indentation.
- Use four spaces for each level of syntactically significant indentation.
- Keep each line to 79 characters or fewer.
- When a long expression continues onto the next line, indent the continuation four extra spaces beyond the normal indentation level.
- In a file, separate functions and classes with two blank lines.
- In a class, separate methods with one blank line.
- In a dictionary, put no whitespace between a key and its colon, and put a single space before the value when it fits on the same line.
- Put one space, and only one, before and after the = operator in a variable assignment.
- In type hints (type annotations), put the colon immediately after the variable name and put one space before the type information.
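
The rules above are easier to see in a small example. The following is only an illustrative sketch; the function total_price and its parameters are hypothetical names that do not come from PEP 8.

```python
from typing import Dict


def total_price(prices: Dict[str, float], tax_rate: float = 0.1) -> float:
    # One space on each side of the assignment operator
    subtotal = sum(prices.values())
    # A long expression continued inside parentheses,
    # indented four extra spaces beyond the normal level
    total = (
        subtotal
        + subtotal * tax_rate)
    return total


# No space before each colon, one space before each value
prices = {'apple': 1.5, 'banana': 0.5}
print(total_price(prices))
```

Note the type hints: the colon follows the parameter name directly, and one space separates it from the type.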

Naming

PEP 8 recommends a different style for each part of the language. This makes it easier to tell which kind of thing a name refers to when reading the code.

- Functions, variables, and attributes should be in lowercase_underscore format.
- Protected attributes should be prefixed with one underscore, as in _leading_underscore.
- Private attributes should be prefixed with two underscores, as in __double_underscore.
- Classes and exceptions should be capitalized, as in CapitalizedWord.
- Module-level constants should be all uppercase with underscores, as in ALL_CAPS.
- Class methods should use cls as the name of the first parameter.
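
As a rough illustration of these naming rules, here is a small sketch. All of the names in it (MAX_RETRIES, ConnectionFailedError, RetryPolicy, and so on) are hypothetical and exist only to show each style.

```python
MAX_RETRIES = 3  # module-level constant: ALL_CAPS


class ConnectionFailedError(Exception):  # exception: CapitalizedWord
    pass


class RetryPolicy:  # class: CapitalizedWord
    def __init__(self, retry_count):
        self.retry_count = retry_count  # public attribute
        self._last_error = None         # protected attribute
        self.__attempts = 0             # private attribute (name-mangled)

    @classmethod
    def default(cls):  # class method: first parameter is cls
        return cls(MAX_RETRIES)


policy = RetryPolicy.default()
print(policy.retry_count)
```

The double underscore triggers name mangling, so the private attribute is stored on the instance as _RetryPolicy__attempts.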

Expressions and statements

- Use inline negation (if a is not b) instead of negating a positive expression (if not a is b).
- Do not check for an empty container or sequence (such as [] or '') by comparing its length to zero (if len(somelist) == 0). Use if not somelist instead, since empty values implicitly evaluate to False.
- Do not write if statements, for loops, while loops, or except compound statements on a single line. Spread them over multiple lines for clarity.
- Prefer surrounding a multiline expression with parentheses over splitting it with the \ line-continuation character.
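
A short sketch of these recommendations follows. The variable names are placeholders chosen for illustration only.

```python
a = None
b = []
somelist = []

# Inline negation reads more naturally than negating the whole expression
if a is not b:
    print('a and b are different objects')

# An empty container evaluates to False, so no len() check is needed
if not somelist:
    print('somelist is empty')

# Parentheses let a long expression span lines without a backslash
total = (1 + 2 + 3
         + 4 + 5 + 6)
print(total)
```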

Prefer f-strings over C-style format strings and str.format

Strings are used everywhere in Python codebases. Python has four different ways of formatting strings built into the language and standard library. All but one of them have serious shortcomings that you should understand and avoid.

Format operator %

This is the most commonly used way of formatting strings in Python.

a = 0b10111011
b = 0xc5f
print('Binary is %d, hex is %d' % (a, b))
>>>
Binary is 187, hex is 3167

The format operator uses format specifiers such as %d as placeholders and replaces them with the values to the right of the operator. The formatting syntax comes from C's printf function, which Python and other languages have inherited. This style of formatting has four problems.

- First problem: If the type or order of the data values in the tuple on the right side of the formatting expression changes, you can get a type conversion error.

key = 'my_var'
value = 1.234
formatted = '%-10s = %.2f' % (key, value)
print(formatted)

>>>
my_var     = 1.23

This is fine, but swapping the key and value raises a runtime exception.

reordered_string = '%-10s = %.2f' % (value, key)

>>>
Traceback ...
TypeError: ...

If you keep the order of the parameters on the right but change the order of the placeholders in the format string, you get the same error. To avoid this problem, you have to check constantly that the two sides of the % operator agree. This is error prone because the check has to be repeated by hand after every change.
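
To see the failure without stopping the program, the swapped call can be wrapped in try/except. This is only a sketch; the names ok and broken are placeholders.

```python
key = 'my_var'
value = 1.234

# The types line up with the placeholders: %s gets a str, %f gets a float
ok = '%-10s = %.2f' % (key, value)
print(ok)

try:
    # Same format string, but the tuple order is swapped,
    # so %.2f receives a string and formatting fails
    broken = '%-10s = %.2f' % (value, key)
except TypeError as error:
    broken = None
    print('TypeError:', error)
```

Nothing warns about the mismatch until the line actually runs, which is why the two sides have to be kept in sync by hand.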

- Second problem: The expression becomes difficult to read when you need to make small modifications to the values before formatting them into a string.

pantry = [
    ('avocados', 1.25),
    ('bananas', 2.5),
    ('cherries', 15),
]
for i, (item, count) in enumerate(pantry):
    print('#%d: %-10s = %.2f' % (i, item, count))

>>>
#0: avocados   = 1.25
#1: bananas    = 2.50
#2: cherries   = 15.00

If you make a small change to make the output message a little more useful, the expression becomes long enough to need multiple lines, which hurts readability.

for i, (item, count) in enumerate(pantry):
    print('#%d: %-10s = %.2f' % (
        i + 1, item.title(), round(count)))

- Third problem: If you want to use the same value multiple times in a format string, you have to repeat it in the tuple on the right.

template = '%s loves food. See %s cook.'
name = 'Max'
formatted = template % (name, name)
print(formatted)

>>>
Max loves food. See Max cook.

This is tedious and error prone when the values being formatted need small modifications, because every repeated occurrence has to be changed in the same way.

- Fourth problem: Using a dictionary in a formatting expression makes the code verbose and noisy.

soup = 'lentil'
formatted = 'Today\'s soup is %(soup)s.' % {'soup': soup}
print(formatted)

>>>
Today's soup is lentil.

Not only are the characters duplicated, but this duplication lengthens the format expression that uses the dictionary.

format and str.format

a = 1234.5678
formatted = format(a, ',.2f')
print(formatted)
>>>
1,234.57
key = 'my_var'
value = 1.234

formatted = '{} = {}'.format(key, value)
print(formatted)
>>>
my_var = 1.234

By specifying the position index of the argument passed to the format method and using the placeholder, the first problem and the third problem can be solved.

# First problem
formatted = '{1} = {0}'.format(key, value)
print(formatted)
>>>
1.234 = my_var

# Third problem
formatted = '{0} loves food. See {0} cook.'.format(name)
print(formatted)
>>>
Max loves food. See Max cook.

The second problem remains: readability is hardly any better than before. The fourth problem remains as well, because the redundancy of repeated keys cannot be eliminated.

# Second problem
formatted = '#{}: {:<10s} = {}'.format(i + 1, item.title(), round(count))
# Fourth problem
_format = ('Today\'s soup is {soup}, '
           'buy one get two {oyster} oysters, '
           'and our special entree is {special}.')
formatted = _format.format(
    soup='lentil',
    oyster='kumamoto',
    special='schnitzel',
)

Interpolated format strings

Python 3.6 added interpolated format strings, f-strings for short, to solve the problems above. Put an f character before the format string, in the same way that a b prefix marks a byte string and an r prefix marks a raw string.

key = 'my_var'
value = 1.234

formatted = f'{key} = {value}'
print(formatted)

>>>
my_var = 1.234

An f-string needs fewer characters than any of the approaches above to get the same result. It also solves the second problem.

formatted = f'#{i+1}: {item.title():<10s} = {round(count)}'
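
The one-liner above depends on the loop variables from the earlier pantry example. The following self-contained sketch runs the same formatting in a loop; the names lines, places, number, and rounded are placeholders added for illustration.

```python
pantry = [
    ('avocados', 1.25),
    ('bananas', 2.5),
    ('cherries', 15),
]

lines = []
for i, (item, count) in enumerate(pantry):
    # Any Python expression can appear inside the braces;
    # the format specifier follows the colon
    lines.append(f'#{i+1}: {item.title():<10s} = {round(count)}')

print('\n'.join(lines))

# Format specifier options can themselves be interpolated
places = 3
number = 1.23456
rounded = f'My number is {number:.{places}f}'
print(rounded)
```

Because the values sit right next to their placeholders, there is no separate tuple or dictionary to keep in sync with the format string.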

Write helper functions instead of complex expressions

Python's concise syntax makes it easy to write a single-line expression that implements a lot of logic. For example, suppose you want to decode the query string of a URL, where each query string parameter represents an integer value:

from urllib.parse import parse_qs

my_values = parse_qs('red=5&blue=0&green=',
                     keep_blank_values=True)
print(repr(my_values))

>>>
{'red': ['5'], 'blue': ['0'], 'green': ['']}

A parameter may have a value, be present but blank, or be missing entirely. Calling the get method on the result dictionary returns a different value in each case.

print('Red:     ', my_values.get('red'))
print('Green:   ', my_values.get('green'))
print('Opacity: ', my_values.get('opacity'))

>>>
Red:     ['5']
Green:   ['']
Opacity: None

It would be nice if a default value of 0 were assigned when a parameter is missing or blank. This logic does not seem to merit a full if statement or a helper function, so you might choose to do it with a Boolean expression.

red = my_values.get('red', [''])[0] or 0
green = my_values.get('green', [''])[0] or 0
opacity = my_values.get('opacity', [''])[0] or 0
print(f'Red:     {red!r}')
print(f'Green:   {green!r}')
print(f'Opacity: {opacity!r}')

>>>
Red:     '5'
Green:   0
Opacity: 0

- In the red case, the value is a list with one element, the string '5'. A non-empty string evaluates to True, so red is assigned the first part of the or expression.
- In the green case, the value is a list with one element, an empty string. An empty string evaluates to False, so the or expression evaluates to 0.
- In the opacity case, the key is missing, so get returns the default: a list with one element, an empty string. As with green, the or expression evaluates to 0.

Wrap the above with the built-in function int and parse the string as an integer.

red = int(my_values.get('red', [''])[0] or 0)

This expression is now very hard to read. It takes time to understand because the reader has to pick it apart piece by piece. Python has conditional (ternary) if/else expressions that make cases like this clearer.

red_str = my_values.get('red', [''])
red = int(red_str[0]) if red_str[0] else 0

This isn't as clear as a full if/else spanning multiple lines. Also, if you need to use this logic several times, it's a good idea to write a helper function.

def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        return int(found[0])
    else:
        return default
green = get_first_int(my_values, 'green')

This is much clearer. As soon as an expression gets complicated, it is time to consider splitting it into smaller pieces and moving the logic into a helper function. What you gain in readability always outweighs what you gain in brevity.
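
Putting the pieces of this section together gives the following self-contained sketch. It reuses the query string from the earlier example; the default=1 passed for opacity is an assumption added here to show the default parameter in use.

```python
from urllib.parse import parse_qs


def get_first_int(values, key, default=0):
    # Missing keys fall back to a list holding one empty string
    found = values.get(key, [''])
    if found[0]:
        return int(found[0])
    return default


my_values = parse_qs('red=5&blue=0&green=', keep_blank_values=True)

red = get_first_int(my_values, 'red')
green = get_first_int(my_values, 'green')
# Hypothetical choice: treat a missing opacity as 1 instead of 0
opacity = get_first_int(my_values, 'opacity', default=1)
print(red, green, opacity)
```

Each call site now states only what it wants, and the handling of missing and blank values lives in one place.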

Prefer multiple assignment unpacking over indexing

item = ('Peanut butter', 'Jelly')
# Access by index
first = item[0]
second = item[1]
print(first, 'and', second)
# Unpack
first, second = item
print(first, 'and', second)
>>>
Peanut butter and Jelly
Peanut butter and Jelly

Unpacking has less visual noise and takes fewer lines than accessing the tuple by index. Unpacking can also swap values in place in a list, as in this ascending-order bubble sort.

def bubble_sort(a):
    for _ in range(len(a)):
        for i in range(1, len(a)):
            if a[i] < a[i-1]:
                a[i-1], a[i] = a[i], a[i-1]

names = ['pretzels', 'carrots', 'arugula', 'bacon']
bubble_sort(names)
print(names)

>>>
['arugula', 'bacon', 'carrots', 'pretzels']

The swap works because the right side of the assignment, a[i], a[i-1], is evaluated first and its values are stored in a temporary, unnamed tuple. Then the unpacking pattern on the left side, a[i-1], a[i], receives the tuple's values and assigns them to a[i-1] and a[i], respectively. Finally, the temporary unnamed tuple is discarded.
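
For comparison, here is a sketch of the same swap written with and without unpacking. The two-element list and the name temp are placeholders for illustration.

```python
a = ['carrots', 'arugula']

# Without unpacking: a named temporary holds one value during the swap
temp = a[0]
a[0] = a[1]
a[1] = temp
print(a)

# With unpacking: the right side is packed into a tuple first,
# then unpacked into the targets on the left
a[0], a[1] = a[1], a[0]
print(a)
```

Swapping twice returns the list to its original order, which makes it easy to confirm that both forms do the same thing.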

Use enumerate instead of range

The built-in function range is useful for loops that iterate over a set of integers.

for i in range(10):
    print(i)

>>>
0
1
2
3
4
5
6
7
8
9

List processing often requires an index of the elements in the list.

flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']
for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print(f'{i+1}: {flavor}')
>>>
1: vanilla
2: chocolate
3: pecan
4: strawberry

This code has to get the length of the list and index into it like an array. Python provides the built-in function enumerate for this situation. enumerate wraps any iterator with a lazy generator, which yields pairs of the loop index and the next value from the iterator.

it = enumerate(flavor_list)
print(next(it))
print(next(it))

# Unpacking
for i, flavor in enumerate(flavor_list):
    print(f'{i+1}: {flavor}')
>>>
(0, 'vanilla')
(1, 'chocolate')
1: vanilla
2: chocolate
3: pecan
4: strawberry
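
enumerate also accepts an optional second argument that sets the number counting starts from, which removes the need for i+1. A minimal sketch, where the list numbered is a placeholder used to collect the output:

```python
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

numbered = []
# The second argument sets the number that counting starts from
for i, flavor in enumerate(flavor_list, 1):
    numbered.append(f'{i}: {flavor}')

print('\n'.join(numbered))
```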

Use zip to process iterators in parallel

To process a list of names and a list of their lengths in parallel, you could iterate over both by index, as follows.

names = ['Cecilia', 'Lise', 'Marie']
counts = [len(n) for n in names]
longest_name = None
max_count = 0

for i, name in enumerate(names):
    count = counts[i]
    if count > max_count:
        longest_name = name
        max_count = count

Indexing into names and counts makes this code hard to read. To make such code clearer, Python provides the built-in function zip, which wraps two or more iterators with a lazy generator.

for name, count in zip(names, counts):
    if count > max_count:
        longest_name = name
        max_count = count

zip consumes the iterators it wraps one item at a time. Therefore, infinitely long inputs can be used without the risk of the program crashing from using too much memory.
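
One way to observe this lazy behavior is to pair a finite list with an infinite iterator. The sketch below uses itertools.count from the standard library as the infinite input; the name pairs is a placeholder.

```python
import itertools

names = ['Cecilia', 'Lise', 'Marie']

# itertools.count(1) never ends; zip pulls one item at a time
# and stops as soon as names is exhausted
pairs = []
for number, name in zip(itertools.count(1), names):
    pairs.append((number, name))

print(pairs)
```

The loop finishes after three iterations even though one of the inputs is infinite.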

Be careful about how zip behaves when the input iterators have different lengths. Suppose you add an element to names but forget to update counts. Running zip on the two lists then has an unexpected result.

names.append('Bob')
for name, count in zip(names, counts):
    print(name)
    
>>>
Cecilia
Lise
Marie

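'Bob' is missing from the output. zip keeps yielding tuples only until one of the wrapped iterators is exhausted, so its output is as long as the shortest input. If you are not sure that the lists have the same length, consider using the zip_longest function from the itertools built-in module, which replaces missing values with a default (None unless you specify otherwise).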

import itertools

for name, count in itertools.zip_longest(names, counts):
    print(f'{name}: {count}')

>>>
Cecilia: 7
Lise: 4
Marie: 5
Bob: None
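
zip_longest takes a fillvalue keyword argument for cases where None is not a convenient placeholder. A minimal sketch, assuming 0 is an acceptable stand-in for a missing count:

```python
import itertools

names = ['Cecilia', 'Lise', 'Marie', 'Bob']
counts = [7, 4, 5]

# fillvalue replaces the default None for the exhausted iterator
filled = list(itertools.zip_longest(names, counts, fillvalue=0))
print(filled)
```

Whether a fill value is appropriate depends on the data; a silent 0 can hide the fact that counts was never updated.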

Prevent repetition with assignment expressions

The assignment expression, also known as the walrus operator, is a new syntax introduced in Python 3.8 that solves long-standing problems with code duplication in the language.

- Normal assignment statement: a = b
- Assignment expression: a := b

Assignment expressions are useful because they let you assign variables in places where assignment statements are not allowed, such as the condition of an if statement. For example, suppose you want to check whether a count is nonzero with an if statement and then do some processing.

fresh_fruit = {
    'apple': 10,
    'banana': 8,
    'lemon': 5,
}

count = fresh_fruit.get('lemon', 0)
if count:
    print('Process A')
else:
    print('Process B')

The problem with this seemingly simple code is that it is noisier than it needs to be. Defining count above the if statement makes it look as though every line that follows, including the else block, needs to access count, when that is not the case. This pattern of fetching a value, checking it, and then using it is very common in Python. It can be rewritten with the walrus operator.

if count := fresh_fruit.get('lemon', 0):
    print('Process A')
else:
    print('Process B')

This is much easier to read, since it is now clear that count is relevant only to the first block of the if statement. The assignment expression first assigns a value to count and then evaluates that value in the context of the if statement to decide how to proceed with flow control. To run 'Process A' only when there are at least a certain number of lemons, do the following. The parentheses around the assignment expression are required so that count receives the fruit count rather than the result of the comparison.

if (count := fresh_fruit.get('lemon', 0)) >= 3:
    print('Process A')
else:
    print('Process B')

Python 3.8 has no switch/case statement, but assignment expressions let you write something similar:

if (count := fresh_fruit.get('banana', 0)) >= 2:
    print('Process A')
elif (count := fresh_fruit.get('apple', 0)) >= 4:
    print('Process B')
elif (count := fresh_fruit.get('lemon', 0)) >= 3:
    print('Process C')
else:
    print('Process D')
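
To try the chain with different inventories, it can be wrapped in a function. This is a sketch only: the function name choose_order and its return values are hypothetical, and it requires Python 3.8 or later.

```python
def choose_order(fresh_fruit):
    # Each branch assigns count and tests it in the same expression
    if (count := fresh_fruit.get('banana', 0)) >= 2:
        return 'banana', count
    elif (count := fresh_fruit.get('apple', 0)) >= 4:
        return 'apple', count
    elif (count := fresh_fruit.get('lemon', 0)) >= 3:
        return 'lemon', count
    else:
        return None, 0


print(choose_order({'apple': 10, 'banana': 8, 'lemon': 5}))
print(choose_order({'apple': 10, 'banana': 1}))
print(choose_order({}))
```

Branches are tested in order, so the first fruit that meets its threshold wins even if a later one would also qualify.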
