[PYTHON] 4 ways to deal with missing dict keys

Effective Python, 2nd Edition (90 Specific Ways to Write Better Python) is so good that I'm reading it with tears in my eyes.

One of its items covers how to deal with missing dict keys. Please read the book for the details; here I measured each approach because I was curious about how their processing times compare.

The task is deliberately simple: count how often each character appears in a string. The execution environment is Google Colab with its default settings.

First, import the required libraries.

import time
from collections import defaultdict

Pick an arbitrary string as the target for aggregation.

target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'

In the end, the key-value pairs are output sorted by number of occurrences. The expected result is as follows.

[('s', 12),
 (' ', 12),
 ('e', 8),
 ('t', 7),
 ('a', 6),
 ('i', 5),
 ('n', 5),
 ('_', 4),
 ('o', 4),
 ('u', 3),
 ('g', 3),
 ('h', 3),
 ('p', 2),
 ('r', 2),
 ('m', 2),
 ('.', 2),
 ('b', 2),
 ('l', 2),
 ('f', 1),
 ('y', 1),
 ('d', 1),
 ('k', 1),
 ('c', 1)]

When using the in operator in an if statement

Check whether the key exists with an in expression in an if statement, and fall back to an initial value when the key is missing. This is probably the first approach that comes to mind.

%%time
ranking = {}
for key in target:
    if key in ranking.keys():
        count = ranking[key]
    else:
        count = 0
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)

CPU times: user 45 µs, sys: 9 µs, total: 54 µs Wall time: 56.3 µs

When using KeyError in a try statement

Use a try-except statement and treat the KeyError raised for a missing key as an expected error.

%%time
ranking = {}
for key in target:
    try:
        count = ranking[key]
    except KeyError:
        count = 0
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)

CPU times: user 59 µs, sys: 11 µs, total: 70 µs Wall time: 78.2 µs

When using the get method

Use the get method of the built-in dict, which returns its second argument as the default when the key is missing.

%%time
ranking = {}
for key in target:
    count = ranking.get(key, 0)
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)

CPU times: user 43 µs, sys: 8 µs, total: 51 µs Wall time: 53.6 µs
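As a side note, get only reads the dict: when the key is missing it hands back the default without storing anything. Here is a minimal sketch of that behaviour. The dict scores and its contents are made up for illustration and are not part of the measurement above.

```python
# Hypothetical example dict, not taken from the measurement above.
scores = {'a': 1}

# get returns the stored value when the key exists...
found = scores.get('a', 0)
# ...and the supplied default when it does not.
missing = scores.get('z', 0)

# get never adds the missing key to the dict.
print(found, missing, 'z' in scores)  # 1 0 False
```

That is why the counting loop has to assign ranking[key] = count + 1 itself.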

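When using defaultdict

Use defaultdict from the collections module. With int as the factory, a missing key is given the value 0 the first time it is accessed.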

%%time
ranking = defaultdict(int)

for s in target:
    ranking[s] += 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)

CPU times: user 36 µs, sys: 8 µs, total: 44 µs Wall time: 47.2 µs

Conclusion

In this measurement defaultdict was the fastest (47.2 µs wall time), so it looks like a good choice! (*^^)
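A single %%time run at this scale is easily swayed by noise, so the ranking may change from run to run. If you want to re-measure, one possible approach is the standard timeit module, sketched below. The function names (count_with_in and so on) and the repeat counts are arbitrary choices for illustration, and the numbers will differ by machine.

```python
import timeit
from collections import defaultdict

target = ('super_string_of_my_passages. but this does not make sense at all. '
          'because this is nothing')

# The function names below are made up for this sketch.
def count_with_in(text):
    ranking = {}
    for key in text:
        if key in ranking:
            count = ranking[key]
        else:
            count = 0
        ranking[key] = count + 1
    return ranking

def count_with_try(text):
    ranking = {}
    for key in text:
        try:
            count = ranking[key]
        except KeyError:
            count = 0
        ranking[key] = count + 1
    return ranking

def count_with_get(text):
    ranking = {}
    for key in text:
        ranking[key] = ranking.get(key, 0) + 1
    return ranking

def count_with_defaultdict(text):
    ranking = defaultdict(int)
    for key in text:
        ranking[key] += 1
    return ranking

NUMBER = 1000  # calls per timing run (arbitrary)
REPEAT = 3     # timing runs per function (arbitrary)

for func in (count_with_in, count_with_try, count_with_get, count_with_defaultdict):
    # Take the best of REPEAT runs and report the time per call in microseconds.
    best = min(timeit.repeat(lambda: func(target), number=NUMBER, repeat=REPEAT))
    print(f'{func.__name__}: {best / NUMBER * 1e6:.2f} µs per call')
```

Taking the minimum over several runs is a common way to reduce the influence of other processes on the result.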

defaultdict is not a cure-all and can cause unexpected errors, so use it with caution. Reference: http://yoshidabenjiro.hatenablog.com/entry/2017/09/05/012828
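One well-known pitfall, which may or may not be the one the reference discusses, is that merely reading a missing key stores it in the dict. A minimal sketch, with made-up keys:

```python
from collections import defaultdict

ranking = defaultdict(int)
ranking['a'] += 1

# Just reading a missing key calls int() and stores the result.
value = ranking['z']
print(value)           # 0
print('z' in ranking)  # True: the lookup itself created the entry
print(len(ranking))    # 2

# get and in do not trigger the default factory.
print(ranking.get('q'))  # None
print('q' in ranking)    # False
```

So a defaultdict can quietly grow if you look up keys with square brackets only to check them.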

Bonus

That's all for the methods covered in the book. But someone is bound to point out, "Hey, for this kind of processing there's a tool you're forgetting!", so I'll add it here as a bonus. In a simple case like this, you can use the Counter class from the collections module. It conveniently counts the number of occurrences of each element, and its most_common method returns the elements sorted by count, so let's use it.

from collections import Counter

%%time
ranking = Counter(target)
ranking.most_common()

CPU times: user 53 µs, sys: 0 ns, total: 53 µs Wall time: 56.5 µs
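For reference, here is a small sketch of two more Counter behaviours that are handy in this kind of task. It assumes the same target string as above; the variable top_three is just a name chosen for illustration.

```python
from collections import Counter

target = ('super_string_of_my_passages. but this does not make sense at all. '
          'because this is nothing')
ranking = Counter(target)

# most_common(n) returns only the n most frequent elements.
top_three = ranking.most_common(3)
print(top_three)

# A missing element reports 0 and, unlike defaultdict, is not stored.
print(ranking['z'], 'z' in ranking)  # 0 False
```

With no argument, most_common() gives the same list as the sorted(...) call used in the earlier sections.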

Thank you very much!