[PYTHON] Create a defaultdict that returns a defaultdict to create a world where KeyError does not occur (+ JSON parsing example)

The collections module that everyone loves comes with most of what you need, and it is fun to explore.

Create a defaultdict that returns a defaultdict to create a world where KeyError does not occur

defaultdict is an extension of dict that allows you to set the behavior if the key is not found.

8.3. collections — High Performance Container Data Types — Python 2.7ja1 documentation

It is very convenient for aggregation and similar processing, because you no longer need to branch on whether the key is present. The following is reproduced from the usage examples in the documentation, and it is probably the easiest one to understand.

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
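For comparison, here is a minimal sketch of the same grouping written once with a plain dict and once with defaultdict. The variable names `plain` and `grouped` are only illustrative and do not come from the documentation.

```python
from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

# Plain dict: the missing-key case has to be handled by hand.
plain = {}
for k, v in s:
    if k not in plain:
        plain[k] = []
    plain[k].append(v)

# defaultdict: list() is called automatically the first time a key is seen.
grouped = defaultdict(list)
for k, v in s:
    grouped[k].append(v)

print(dict(grouped) == plain)  # True
```

Both loops build the same mapping; the defaultdict version simply drops the membership check.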

The documentation also has an example of using it as a counter, but if all you need is a simple counter, consider collections.Counter [^counter].

[^counter]: 8.3. collections — High Performance Container Data Types — Python 2.7ja1 documentation
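As a rough illustration of that recommendation, the sketch below counts the same words with defaultdict(int) and with Counter. The sample list `words` is made up for this example.

```python
from collections import Counter, defaultdict

words = ['yellow', 'blue', 'yellow', 'red', 'blue', 'yellow']

# Counting with defaultdict(int): a missing key starts at int() == 0.
counts = defaultdict(int)
for w in words:
    counts[w] += 1

# Counting with Counter: the counting loop is built in.
counter = Counter(words)

print(dict(counts) == dict(counter))  # True
print(counter.most_common(1))         # [('yellow', 3)]
```

Counter also provides helpers such as most_common, which is one reason it tends to be the better fit for plain counting.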

Use in nested dictionaries

For example, suppose you access d['a']['b']['c'] in a nested dictionary. Even if d is a defaultdict, a KeyError can still occur if the value of d['a'] is a normal dict. So, try passing the following as the default_factory of the defaultdict.

def _factory():
    return collections.defaultdict(_factory)

With this, we have a world where KeyError does not occur.

>>> import collections
>>> def _factory():
...     return collections.defaultdict(_factory)
... 
>>> d = collections.defaultdict(_factory)
>>> d['a']
defaultdict(<function _factory at 0x105e93aa0>, {})
>>> d['a']['b']
defaultdict(<function _factory at 0x105e93aa0>, {})
>>> d['a']['b'][1]
defaultdict(<function _factory at 0x105e93aa0>, {})
>>> d[1][1][0][1][1][1][1][1]
defaultdict(<function _factory at 0x105e93aa0>, {})
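The difference between a one-level defaultdict and the recursive factory can be sketched as follows. The names `shallow`, `deep`, and `shallow_raised` are illustrative only.

```python
import collections

def _factory():
    return collections.defaultdict(_factory)

# One level of defaultdict only protects the first lookup:
# shallow['a'] is a plain dict, so ['b'] raises KeyError.
shallow = collections.defaultdict(dict)
try:
    shallow['a']['b']['c']
    shallow_raised = False
except KeyError:
    shallow_raised = True

# The recursive factory protects every level.
deep = collections.defaultdict(_factory)
deep['a']['b']['c'] = 1

print(shallow_raised)       # True
print(deep['a']['b']['c'])  # 1
```

One thing to keep in mind: merely reading a missing key inserts it, so after evaluating deep['x'], the expression 'x' in deep is True.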

Example of using it for JSON parsing

Let's apply the above to a real-world use case. The following is just something I came up with on my own, so there may be a cleaner or simpler way. Please tell me if you know one.

For example, suppose you have the following JSON with course information for online lessons. The annoying part is `address`: the key may or may not be present.

{  
  "class":{  
    "id":1,
    "subject":"Math",
    "students":[  
      {  
        "name":"Alice",
        "age":30
      },
      {  
        "name":"Bob",
        "age":40,
        "address":{  
          "country":"JP"
        }
      },
      {  
        "name":"Charlie",
        "age":20,
        "address":{  
          "country":"US",
          "state":"MA",
          "locality":"Boston"
        }
      }
    ]
  }
}

Let's read this in Python.

In [47]: j = json.loads(s)

In [54]: for student in j["class"]["students"]:
   ....:     print(student["name"])
   ....:
Alice
Bob
Charlie

That works because every student has a name. But suppose I want the state: Bob has an address with no state, and Alice doesn't have the `address` key at all. Hmm.

In [55]: for student in j["class"]["students"]:
   ....:     print(student["address"]["state"])
   ....:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-55-69836c86e040> in <module>()
      1 for student in j["class"]["students"]:
----> 2     print(student["address"]["state"])
      3 

KeyError: 'address'

You can specify a default value with dict.get(key, default_val), but with several levels of nesting it becomes verbose. The deeper the nesting, the harder it gets.
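To make the verbosity concrete, here is a small sketch of the chained dict.get approach for a single student record. The `student` dict is a cut-down stand-in for the Alice entry, not the full JSON.

```python
student = {"name": "Alice", "age": 30}  # no "address" key

# Each level needs its own fallback, and every intermediate
# fallback has to be an empty dict so the next .get() can run.
state = student.get("address", {}).get("state", "default_state")

print(state)  # default_state
```

With two levels this is still readable, but every extra level adds another .get(..., {}) link to the chain.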

This is where defaultdict comes in. json.load and json.loads accept an `object_hook` parameter, a function that is applied to every dict produced by decoding, so let's use it. Python is a wonderful language for having such an API. 18.2. json — JSON encoder and decoder — Python 2.7ja1 documentation

Define the following function:

def _hook(d):
    return collections.defaultdict(_factory, d)

Pass it as `object_hook` to json.loads.

In [75]: j2 = json.loads(s, object_hook=_hook)
In [83]: for student in j2["class"]["students"]:
   ....:     print(student["address"]["state"])
   ....:     
defaultdict(<function _factory at 0x10a57ccf8>, {})
defaultdict(<function _factory at 0x10a57ccf8>, {})
MA
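For readers who want to run this outside an interactive session, the pieces can be put together roughly like this. The JSON string here is a shortened version of the course data (an assumption for brevity), and the `states` list is only for illustration.

```python
import collections
import json

def _factory():
    return collections.defaultdict(_factory)

def _hook(d):
    # Called by the decoder for every JSON object; d is a plain dict.
    return collections.defaultdict(_factory, d)

# Shortened stand-in for the course JSON shown earlier.
s = '''
{"class": {"students": [
  {"name": "Alice"},
  {"name": "Bob", "address": {"country": "JP"}},
  {"name": "Charlie", "address": {"country": "US", "state": "MA"}}
]}}
'''

j2 = json.loads(s, object_hook=_hook)
states = [student["address"]["state"] for student in j2["class"]["students"]]

print(states[2])                                       # MA
print(isinstance(states[0], collections.defaultdict))  # True
```

Because the hook runs on every decoded object, the nested `address` dicts are wrapped as well, which is why Bob's missing `state` yields an empty defaultdict instead of a KeyError.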

It worked. KeyError does not occur. The result is a little awkward to use as is, so I pass it through a helper function for conversion. The helper lets you specify an alternative value, and the default alternative value is the string default_state.

In [91]: def _dd(v, alt_val="default_state"):
   ....:     return alt_val if isinstance(v, collections.defaultdict) and len(v) == 0 else v
   ....:

In [92]: for student in j2["class"]["students"]:
   ....:     print(_dd(student["address"]["state"]))
   ....:
default_state
default_state
MA

Now you can specify a default value with very little code, no matter where the missing key is. Ideally, I wanted the default value to be returned directly on the last access (that is, at student["address"]["state"] rather than at student["address"] in the code above), but I gave up because I couldn't determine whether a given access was the last one. If you know how to do it, please let me know.
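One possible way around the "last call" problem is to stop chaining [] and instead pass the whole key path to a helper, so that the helper itself knows where the path ends. This is a different technique from the defaultdict hook, not a fix for it; the name `dig` and its signature are hypothetical, and the sketch assumes Python 3 and plain dicts (json.loads without object_hook).

```python
def dig(obj, *keys, default=None):
    """Follow keys through nested dicts/lists; return default if any step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a value that cannot be indexed this way.
            return default
    return obj

student = {"name": "Bob", "address": {"country": "JP"}}

print(dig(student, "address", "country"))                   # JP
print(dig(student, "address", "state", default="unknown"))  # unknown
```

Since the full path is given in one call, the default is returned directly and no empty placeholder dicts are created along the way.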

That's all.
