[PYTHON] I don't want an if statement when updating a dict value after checking that the key exists

2017.3.13 update

- Also benchmarked the approaches suggested in comments by @shiracamus and @tenmyo
  - The results changed, so I corrected the article.

The if statement

Like this

if key not in test_dict:
    test_dict[key] = 0
test_dict[key] += 1

I want to write it like this

test_dict[key] = test_dict.get(key, 0) + 1
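
As a quick sanity check that the two styles really do the same thing, here is a minimal sketch. The function names and the sample data are made up for illustration; they are not part of the benchmark further down.

```python
def count_with_if(keys):
    """Count occurrences using an explicit membership check."""
    counts = {}
    for key in keys:
        if key not in counts:
            counts[key] = 0
        counts[key] += 1
    return counts


def count_with_get(keys):
    """Count occurrences using dict.get with a default of 0."""
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return counts


sample = ["a", "b", "a", "c", "a", "b"]
print(count_with_if(sample) == count_with_get(sample))  # True
```

Both versions do one lookup to read and one to write; .get just folds the "missing key" branch into the default argument.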

Measuring the speed

I'd be worried if the shorter version turned out to be slow.

# coding=utf-8
import random
import time
import collections

S = "abcdefghijklmnopqrstuvwxyz01234567890"
sum_map = dict()


def get_rand_key():
    return "".join([random.choice(S) for i in range(3)])


def test01(max=10000):
    count = 0
    test_dict = dict()
    while count <= max:
        count += 1
        key = get_rand_key()
        if key not in test_dict:
            test_dict[key] = 0
        test_dict[key] += count


def test02(max=10000):
    count = 0
    test_dict = dict()
    while count <= max:
        count += 1
        key = get_rand_key()
        test_dict[key] = test_dict.get(key, 0) + count


def test03(max=10000):
    count = 0
    test_dict = collections.defaultdict(int)
    while count <= max:
        count += 1
        key = get_rand_key()
        test_dict[key] += count


def test04(max=10000):
    count = 0
    test_dict = collections.Counter()
    while count <= max:
        count += 1
        key = get_rand_key()
        test_dict[key] += count


def check(method, num, loop_num):
    # Time one run of `method` and add the elapsed seconds to its running total.
    name = method.__name__
    start_ts = time.time()
    method(max=num)
    diff = time.time() - start_ts
    sum_map[name] = sum_map.get(name, 0) + diff
    print("[{}] {} {}sec".format(loop_num, name, diff))


for i in range(20):
    num = 1000000
    check(test01, num, i)
    check(test02, num, i)
    check(test03, num, i)
    check(test04, num, i)

# Print the total elapsed seconds per method across all rounds.
for key in sorted(sum_map.keys()):
    print("{} {}".format(key, sum_map[key]))

~~It doesn't change much~~ defaultdict seems to be the fastest!

~~I tried it out, and there doesn't seem to be much difference. I'll just write the shorter version from now on.~~

The defaultdict version suggested in a comment by @shiracamus appears to be the fastest. I ran the benchmark three times and got the same ordering each time.

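Total seconds over the 20 rounds, one row per trial:

| if statement | .get | defaultdict | Counter |
|---|---|---|---|
| 50.5 | 52.6 | 49.3 | 55.4 |
| 53.0 | 53.8 | 50.2 | 56.6 |
| 53.5 | 53.7 | 49.9 | 54.7 |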

Let's use defaultdict next time!
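
For reference, a minimal sketch of how defaultdict behaves, including one thing to watch out for. The keys used here are arbitrary examples.

```python
import collections

# int() returns 0, so missing keys start at 0 on first access.
counts = collections.defaultdict(int)
for key in ["a", "b", "a"]:
    counts[key] += 1

print(dict(counts))

# Caveat: merely reading a missing key with [] inserts it.
_ = counts["zzz"]
print("zzz" in counts)  # True

# .get() does not trigger the default factory, so nothing is inserted.
print(counts.get("yyy"))  # None
print("yyy" in counts)  # False
```

So if the code later checks membership with `in`, a stray read through [] can change the answer. Using .get for reads avoids that.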

What still bugs me

Which approach is faster seems to depend on the circumstances. In particular, it seems to change with the hit ratio (how often the key is already in the dict). There seem to be various patterns ...
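
One way to look into the hit ratio question might be something like the sketch below. This is not the benchmark used above: the helper names (`make_keys`, `compare`), the key counts, and the choice of timeit are all my own assumptions. The keys are generated up front so that key generation is not included in the timing. I make no claim about what the numbers will look like on any particular machine or Python version.

```python
import collections
import timeit


def with_if(keys):
    d = {}
    for k in keys:
        if k not in d:
            d[k] = 0
        d[k] += 1
    return d


def with_get(keys):
    d = {}
    for k in keys:
        d[k] = d.get(k, 0) + 1
    return d


def with_defaultdict(keys):
    d = collections.defaultdict(int)
    for k in keys:
        d[k] += 1
    return d


def make_keys(total, distinct):
    """Return `total` keys cycling through `distinct` different values."""
    return [str(i % distinct) for i in range(total)]


def compare(total=100000, repeat=5):
    # Few distinct keys -> almost every lookup hits; all distinct -> no hits.
    for distinct in (10, total // 10, total):
        keys = make_keys(total, distinct)
        # Only the first occurrence of each distinct key is a miss.
        hit_ratio = 1.0 - float(distinct) / total
        for func in (with_if, with_get, with_defaultdict):
            best = min(timeit.repeat(lambda: func(keys), number=1, repeat=repeat))
            print("hit ratio {:.4f} {:<16} {:.4f} sec".format(
                hit_ratio, func.__name__, best))


if __name__ == "__main__":
    compare()
```

Taking the minimum of several repeats is a common way to reduce noise from other processes, but it is still only a rough comparison.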

If anyone knows, please let me know

Why is Counter slow? I wonder if I'm using it the wrong way.
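
I can't say whether this explains the timing, but as far as I understand, Counter is mainly meant for tallying a whole iterable at once rather than for `+=` inside a hand-written loop. A small sketch of that usage, with made-up sample data:

```python
import collections

words = ["a", "b", "a", "c", "a", "b"]

# Counting occurrences: pass the iterable straight to Counter.
counts = collections.Counter(words)
print(counts.most_common(2))  # [('a', 3), ('b', 2)]

# Adding weighted amounts, similar to what the benchmark does with `count`:
# update() accepts a mapping of key -> amount to add.
totals = collections.Counter()
for weight, key in enumerate(words, start=1):
    totals.update({key: weight})
print(totals["a"])  # 1 + 3 + 5 = 9

# Reading a missing key returns 0 and does not insert it.
print(totals["missing"])  # 0
print("missing" in totals)  # False
```

This only shows the API; it is not a claim that update() in a loop is faster than `+=`.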
