Five useful Python data types that are easy to forget

Python's standard library is very powerful, but it contains too many modules to keep track of, and many people know a feature exists, forget about it, and end up reinventing the wheel. I'm one of those people, so as a memo to myself I'll introduce some data types in the Python standard library that are useful but that you won't use unless you remember they exist.

defaultdict

Official documentation: https://docs.python.jp/3/library/collections.html#collections.defaultdict

Literally, a dictionary type that allows you to set default values. The nice thing about this is that you don't have to check each key to see if it's in the dictionary. For example, when counting the number of occurrences of a word, you can use it in the following form:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> string = "python is way way way way better than java"
>>> for w in string.split(" "):
...     d[w] += 1
...
>>> d.items()
dict_items([('better', 1), ('than', 1), ('python', 1), ('java', 1), ('way', 4), ('is', 1)])

By the way, it's hard to understand at first glance, but the constructor of defaultdict takes a function that generates a value (to be exact, a callable object) as an argument instead of the default value. So, if you do the following, you will get an error.

>>> from collections import defaultdict
>>> d = defaultdict(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: first argument must be callable or None

Instead, write

>>> d = defaultdict(lambda: 0)

Or

>>> d = defaultdict(int)

(Any callable that returns 0 when called will do.)
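Because the argument is a factory rather than a value, the same idea works for containers too. The following is a minimal sketch, with a made-up word list, of grouping items with defaultdict(list); it also shows that merely reading a missing key inserts it.

```python
from collections import defaultdict

# Hypothetical sample data, used only for illustration.
words = ["python", "perl", "java", "javascript", "ruby"]

# list() is called to create an empty list whenever a new key is accessed.
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

print(dict(groups))
# {'p': ['python', 'perl'], 'j': ['java', 'javascript'], 'r': ['ruby']}

# Note: just reading a missing key also inserts it with the default value.
missing = groups["z"]
print(missing, "z" in groups)  # [] True
```

If you want to look up a key without inserting it, use groups.get(key) or the in operator instead of indexing.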

Counter

Official documentation: https://docs.python.jp/3/library/collections.html#collections.Counter

If you just want to count words, the Counter class is by far the most convenient choice. Pass it a list and it counts the occurrences of each element; pass it a string and it counts the occurrences of each character.

>>> from collections import Counter
>>> c = Counter("python is way way way way better than C".split(" "))
>>> c
Counter({'way': 4, 'python': 1, 'is': 1, 'better': 1, 'than': 1, 'C': 1})
>>> c.most_common(1)
[('way', 4)]

It's the kind of processing you might end up implementing yourself, but since such a convenient data type exists, let's use it. As shown above, you can easily get the mode, or the n most frequent elements. Counters also support addition and subtraction, so they are useful when you want to compare texts.
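As a rough illustration of the arithmetic mentioned above, here is a small sketch that compares two made-up sentences. The variable names are only for this example.

```python
from collections import Counter

# Two hypothetical sentences to compare.
a = Counter("python is way way better than java".split())
b = Counter("java is way better than C".split())

combined = a + b    # add the counts of each word
extra_in_a = a - b  # subtract counts; only positive results are kept
common = a & b      # take the minimum count of each word

print(combined["way"])  # 3
print(extra_in_a)       # Counter({'python': 1, 'way': 1})
print(sorted(common))   # ['better', 'is', 'java', 'than', 'way']
```

Subtraction drops zero and negative counts, so a - b answers "which words appear more often in a than in b".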

deque

Official documentation: https://docs.python.jp/3/library/collections.html#collections.deque

The built-in list type already has a pop method, so this one is easy to overlook, but Python has a data type called deque that lets you append and remove elements at both the beginning and the end in O(1). By contrast, removing an element from the beginning of a list is an O(n) operation, because the remaining elements have to be shifted. deque also accepts a maxlen parameter at initialization; if you then append to a deque that is already full, it automatically discards elements from the opposite end (the oldest ones, when you use append). Deques come into play when you need a data structure whose length changes dynamically from both ends, for example when you want to keep a history of a fixed length.

>>> from collections import deque
>>> history = deque(maxlen=100)  # keeps only the 100 most recent lines
>>> with open("python.txt") as lines:
...     for line in lines:
...         if 'python' in line:
...             print(line, end="")  # print the matching line itself
...         history.append(line)
...

Even better, appends and pops on either end of a deque are thread safe, so a deque can also be used to share data in a system where producers and consumers run in multiple threads.
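To make the double-ended operations and the maxlen behavior concrete, here is a minimal sketch using small integer values chosen only for illustration.

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)      # add to the beginning: deque([0, 1, 2, 3])
d.append(4)          # add to the end:       deque([0, 1, 2, 3, 4])
first = d.popleft()  # remove from the beginning -> 0
last = d.pop()       # remove from the end       -> 4
print(first, last, list(d))  # 0 4 [1, 2, 3]

# With maxlen, appending to a full deque discards the oldest element.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(list(recent))  # [2, 3, 4]
```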

PriorityQueue

Official documentation: https://docs.python.jp/3/library/queue.html#queue.PriorityQueue

I didn't know this until recently, but Python provides a priority queue in the standard library. I had implemented one myself with heapq before, but there was no need to. PriorityQueue is useful when implementing search algorithms: breadth-first search, depth-first search, and A* search can all be regarded as the same algorithm, differing only in how the priority is assigned. PriorityQueue is also thread safe.
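For reference, a minimal sketch of basic PriorityQueue usage follows. The task names are hypothetical; entries are (priority, item) tuples, and the entry with the lowest priority value is retrieved first.

```python
from queue import PriorityQueue

pq = PriorityQueue()
# Hypothetical tasks; a smaller number means a higher priority.
pq.put((2, "write tests"))
pq.put((1, "fix bug"))
pq.put((3, "update docs"))

order = []
# empty() is reliable here only because a single thread uses the queue.
while not pq.empty():
    priority, task = pq.get()
    order.append(task)

print(order)  # ['fix bug', 'write tests', 'update docs']
```

In a search algorithm, the priority would be something like the path cost so far (plus a heuristic, in the case of A*).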

OrderedDict

Official documentation: https://docs.python.jp/3/library/collections.html#collections.OrderedDict

Before Python 3.7, the order of elements in a built-in dictionary was not guaranteed (since 3.7, a dict preserves insertion order). In those older versions, therefore, the order in which elements come back when you iterate over a dictionary is undefined. In addition, sorting the elements of a dictionary can be a hassle (or rather, the dictionary itself cannot be sorted in place). OrderedDict makes these things easier. For example, consider a situation where you manage test scores in a dictionary and display them in different orders:

>>> from collections import OrderedDict
>>> d = OrderedDict({"Suzuki": 100, "Tanaka": 30, "Sato": 50})
>>> sorted(d.items(), key=lambda x: x[1])
[('Tanaka', 30), ('Sato', 50), ('Suzuki', 100)]
>>> sorted(d.items(), key=lambda x: x[0])
[('Sato', 50), ('Suzuki', 100), ('Tanaka', 30)]

As in the example above, it's easy to sort by score or by name.
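The sorted() calls above return plain lists. As a sketch of where OrderedDict itself helps, the sorted pairs can be stored back into an OrderedDict, which keeps that order and offers reordering methods such as move_to_end and popitem(last=False). The variable names here are only for illustration.

```python
from collections import OrderedDict

scores = {"Suzuki": 100, "Tanaka": 30, "Sato": 50}

# Build an OrderedDict whose iteration order is ascending by score.
by_score = OrderedDict(sorted(scores.items(), key=lambda x: x[1]))
print(list(by_score))  # ['Tanaka', 'Sato', 'Suzuki']

# Move an existing key to the end without touching its value.
by_score.move_to_end("Tanaka")
print(list(by_score))  # ['Sato', 'Suzuki', 'Tanaka']

# Remove and return the first item (FIFO order).
first = by_score.popitem(last=False)
print(first)  # ('Sato', 50)
```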

Summary

This time I focused on five data types that I find particularly useful, but Python has a variety of other useful standard libraries, so I recommend reading through the documentation (you may make some new discoveries).
