Effective Python Note, Item 16: Consider returning a generator instead of a list

This is a memo on the O'Reilly Japan book Effective Python. https://www.oreilly.co.jp/books/9784873117560/ pp. 35-37

**Returning a list is the simplest way to return a sequence of results**

Consider finding the index at which each word in a sentence starts, by looking for the space characters:

def index_words(text):
    result = []
    if text:
        result.append(0)              # the first word starts at index 0
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)  # a word starts right after each space
    return result

address = 'Four score and seven years ago...'
result = index_words(address)
print(result[:3])


>>>
[0, 5, 11]

The function works correctly, but it has two problems:

  1. The code is verbose (redundant)
  2. The entire result list is built in memory

1. The code is verbose (redundant)

The repeated calls to append inside the function body bury the values being produced, which makes the function harder to read as a whole. A generator is a convenient alternative in such cases:

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

result = list(index_words_iter(address))
print(result[:3])

>>>
[0, 5, 11]

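Calling a generator function does not run its body; it returns an iterator, and each yield hands the next value to whoever is consuming that iterator. When you do need a list, you can get one by passing the iterator to list().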
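
To see this lazy behavior directly, here is a minimal sketch that pulls values out one at a time with next(). The variable names first, second and third are only for illustration.

```python
def index_words_iter(text):
    """Yield the starting index of each word in text."""
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

# Calling the function runs none of its body yet; it only creates the iterator.
it = index_words_iter("Four score and seven years ago...")

first = next(it)   # runs the body up to the first yield
second = next(it)  # resumes right after the previous yield
third = next(it)
print(first, second, third)  # 0 5 11
```

Each call to next() advances the function just far enough to produce one more value, which is why no intermediate list is needed.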

2. The entire result list is built in memory

Because index_words builds the complete list of results before returning, it consumes memory in proportion to the size of the input. With very large inputs, the program may run out of memory and crash.
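
As a rough illustration of the difference, the sketch below compares the size of the returned list with the size of a generator object for the same input. The test string and the use of sys.getsizeof are assumptions made for this example; note that sys.getsizeof measures only the container itself, not the integers it refers to, so the real cost of the list is even higher.

```python
import sys

def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == " ":
            result.append(index + 1)
    return result

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

# Hypothetical large input: 100,000 short words separated by spaces.
text = "word " * 100_000

as_list = index_words(text)      # every index is stored at once
as_gen = index_words_iter(text)  # nothing has been computed yet

print(len(as_list))                                    # 100001
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True
```

The list grows with the input, while the generator object stays the same small size no matter how long the text is.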

A generator, by contrast, produces one value at a time, so it can handle input of any length while **keeping memory consumption to a minimum**. The following generator reads a file one line at a time and yields the word offsets as it goes:

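from itertools import islice

def index_file(handle):
    offset = 0
    for line in handle:           # only one line is held in memory at a time
        if line:
            yield offset          # a word starts at the beginning of each line
        for letter in line:
            offset += 1
            if letter == " ":
                yield offset      # a word starts right after each space

with open("address.txt", "r") as f:
    it = index_file(f)
    results = islice(it, 0, 3)    # take only the first three results
    print(list(results))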

>>>
[0, 5, 11]
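
The contents of address.txt are not shown in this memo. Assuming it holds the same sentence as the earlier examples, the sketch below reproduces the result without a file on disk by using io.StringIO as a stand-in for the file handle. The name fake_file is hypothetical.

```python
import io
from itertools import islice

def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == " ":
                yield offset

# Assumption: address.txt contains the sentence used earlier in this memo.
fake_file = io.StringIO("Four score and seven years ago...\n")

# islice stops after three values, so the rest of the input is never scanned.
first_three = list(islice(index_file(fake_file), 0, 3))
print(first_three)  # [0, 5, 11]
```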

This version can handle input of any length, because it only ever holds one line in memory, so running out of memory is no longer a concern. Be aware, however, that the iterator returned by a generator is stateful: it can be consumed only once, and after it is exhausted it produces nothing more.
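
The statefulness is easy to overlook, so here is a small sketch of what happens when the same iterator is consumed twice. The names first_pass and second_pass are only for illustration.

```python
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

address = "Four score and seven years ago..."
it = index_words_iter(address)

first_pass = list(it)   # consumes every value the generator produces
second_pass = list(it)  # the iterator is already exhausted

print(first_pass)   # [0, 5, 11, 15, 21, 27]
print(second_pass)  # []
```

If the results are needed more than once, either call the generator function again to get a fresh iterator, or store the results in a list when the data is small enough.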
