[PYTHON] 4 ways to deal with missing dict keys

Effective Python, 2nd Edition (90 items to improve Python programs) is really wonderful, and I'm reading it with tears in my eyes.

One of its items covers how to deal with missing dict keys. Please read the book for the details; here I measured each approach because I was curious about the processing time.

The task this time is a simple one: counting the characters that appear in a string. The execution environment is Google Colab with its default settings.

First, import the required libraries.

import time
from collections import defaultdict

Pick an arbitrary string as the target for counting.

target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'

In the end, the key-value pairs are output sorted by number of occurrences in descending order. The expected result is as follows.

[('s', 12),
 (' ', 12),
 ('e', 8),
 ('t', 7),
 ('a', 6),
 ('i', 5),
 ('n', 5),
 ('_', 4),
 ('o', 4),
 ('u', 3),
 ('g', 3),
 ('h', 3),
 ('p', 2),
 ('r', 2),
 ('m', 2),
 ('.', 2),
 ('b', 2),
 ('l', 2),
 ('f', 1),
 ('y', 1),
 ('d', 1),
 ('k', 1),
 ('c', 1)]

When using the in operator in an if statement

Check whether the key exists with an if statement, using the in expression, which returns True when the key is present, and give missing keys an initial value. This is probably the first simple approach that comes to mind.

%%time
ranking = {}
for key in target:
    if key in ranking.keys():
        count = ranking[key]
    else:
        count = 0
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)

CPU times: user 45 µs, sys: 9 µs, total: 54 µs
Wall time: 56.3 µs
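As a small aside, the in operator on a dict tests keys, not values, so `key in ranking` behaves the same as `key in ranking.keys()`. Here is a minimal sketch with a made-up dict (the name `sample` is only for illustration):

```python
# Hypothetical sample dict, used only for illustration.
sample = {'a': 1, 'b': 2}

key_found = 'a' in sample      # membership is tested against the keys
value_found = 1 in sample      # values are not searched
same_as_keys = ('a' in sample) == ('a' in sample.keys())

print(key_found, value_found, same_as_keys)
```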

When using KeyError in a try statement

Use a try-except statement and treat the KeyError raised for a missing key as an expected error.

%%time
ranking = {}
for key in target:
    try:
        count = ranking[key]
    except KeyError:
        count = 0
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)   

CPU times: user 59 µs, sys: 11 µs, total: 70 µs
Wall time: 78.2 µs
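One thing worth keeping in mind with this approach is that the except branch only runs the first time each character is seen. The sketch below counts those misses for the same string; `misses` is a name introduced here for illustration.

```python
target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'

ranking = {}
misses = 0  # how many times KeyError was actually raised
for key in target:
    try:
        count = ranking[key]
    except KeyError:
        misses += 1
        count = 0
    ranking[key] = count + 1

# KeyError is raised once per distinct character, not once per character.
print(misses, len(set(target)), len(target))
```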

When using the get method

Use the get method of the built-in dict, which returns the given default value when the key is missing.

%%time
ranking = {}
for key in target:
    count = ranking.get(key, 0)
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)   

CPU times: user 43 µs, sys: 8 µs, total: 51 µs
Wall time: 53.6 µs
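Note that get only returns the default; it does not store it in the dict, which is why the loop above assigns ranking[key] explicitly. A minimal sketch (the key 'x' is arbitrary):

```python
ranking = {}

# get returns the default for a missing key...
value = ranking.get('x', 0)

# ...but nothing is stored in the dict.
print(value, 'x' in ranking, len(ranking))
```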

When using defaultdict

%%time
ranking = defaultdict(int)

for s in target:
    ranking[s] += 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)

CPU times: user 36 µs, sys: 8 µs, total: 44 µs
Wall time: 47.2 µs
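Roughly speaking, defaultdict(int) works like this: when a missing key is accessed, it calls int() (which returns 0), stores the result under that key, and returns it. Unlike get, simply reading a missing key inserts it. A minimal sketch (the key 'x' is arbitrary):

```python
from collections import defaultdict

ranking = defaultdict(int)

before = len(ranking)   # empty at first
value = ranking['x']    # missing key: int() is called and 0 is stored
after = len(ranking)    # the key now exists

ranking['x'] += 1       # so += works without any explicit check
print(before, value, after, ranking['x'])
```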

Conclusion

In this measurement defaultdict was the fastest (47.2 µs wall time), so it looks like a good choice! (* ^^)
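Since %%time measures a single run, differences of a few microseconds can easily be noise. One way to get steadier numbers is the standard timeit module, which repeats the code many times. The following is only a sketch: the function names and the repeat count of 10,000 are chosen here for illustration, and the results will vary by environment.

```python
import timeit
from collections import defaultdict

target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'
REPEAT = 10_000  # arbitrary repeat count, chosen for illustration


def count_in(text):
    ranking = {}
    for key in text:
        if key in ranking:
            count = ranking[key]
        else:
            count = 0
        ranking[key] = count + 1
    return ranking


def count_try(text):
    ranking = {}
    for key in text:
        try:
            count = ranking[key]
        except KeyError:
            count = 0
        ranking[key] = count + 1
    return ranking


def count_get(text):
    ranking = {}
    for key in text:
        ranking[key] = ranking.get(key, 0) + 1
    return ranking


def count_defaultdict(text):
    ranking = defaultdict(int)
    for key in text:
        ranking[key] += 1
    return ranking


results = {}
for func in (count_in, count_try, count_get, count_defaultdict):
    # timeit.timeit returns the total seconds taken by REPEAT calls.
    results[func.__name__] = timeit.timeit(lambda: func(target), number=REPEAT)

for name, seconds in results.items():
    print(f'{name}: {seconds / REPEAT * 1e6:.2f} µs per call')
```

In Colab, the %%timeit cell magic gives a similar repeated measurement without writing the loop yourself.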

Extra edition

That's all for the methods covered in the book. However, I can already hear someone saying "Hey, for this kind of processing you can just use that one! Did you forget it?", so I will add it as an extra edition. In a simple case like this, you can use the Counter class from the collections module. It conveniently counts the number of occurrences of each element. It also has a most_common method that sorts by count, so let's use it.

from collections import Counter

%%time
ranking = Counter(target)
ranking.most_common()

CPU times: user 53 µs, sys: 0 ns, total: 53 µs
Wall time: 56.5 µs
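Two more things about Counter that may be handy here: most_common accepts an optional count to return only the top entries, and looking up a missing key returns 0 instead of raising KeyError (without inserting the key). A small sketch using the same string; the key 'z' is just a character that does not appear in it:

```python
from collections import Counter

target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'
ranking = Counter(target)

top3 = ranking.most_common(3)  # only the three most frequent characters
missing = ranking['z']         # a missing key returns 0, no KeyError

print(top3)
print(missing, 'z' in ranking)
```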

Thank you very much!
