[PYTHON] Extract specific data from complex JSON

Introduction

Originally, it is a JSON for simply expressing a complicated data structure, but it may become complicated due to improper design or undecided specifications. The complexity here is that you are not sure what level of data you want to get from JSON and how many.

The purpose of this time is to get specific data from such complicated JSON.

This goal

input

test.json


{"a": [{
  "b": 1, 
  "c": [{
    "star": [{"Deneb": "Deneb", "Altair": "Altair", "Vega": "Vega"}], 
    "d": [2,3]
    }], 
  "e": {"g": "x"}
  }],
 "f": "y",
}

output

[{"Deneb": "Deneb", "Altair": "Altair", "Vega": "Vega"}]

The data you want to get is

→ Search JSON recursively

Implementation

python


import json

def search(arg, cond):
    res =[]
    if cond(arg):
        res.append(arg)
    if isinstance(arg, list):
        for item in arg:
            res += search(item, cond)
    elif isinstance(arg, dict):
        for value in arg.values():
            res += search(value, cond)
    return res

def has_star_key(arg):
    if isinstance(arg, dict):
        return arg.keys() == {"Deneb", "Altair", "Vega"}
            
def get_star(arg):
    return search(arg, has_star_key)


if __name__ == "__main__":
    with open('test.json', encoding='utf-8') as f:
        data = json.load(f)
        print(get_star(data)) # [{"Deneb": "Deneb", "Altair": "Altair", "Vega": "Vega"}]

I was able to get the desired data!

Commentary

① ** Function to search JSON **

python


def search(arg, cond):
    res =[]
    if cond(arg):
        res.append(arg)
    if isinstance(arg, list):
        for item in arg:
            res += search(item, cond)
    elif isinstance(arg, dict):
        for value in arg.values():
            res += search(value, cond)
    return res

First argument: Str type or Int type or List type or Dict type Second argument: Function that determines whether the data you want to acquire (function (2)) Return value: List type

② ** Function to judge data ** If there is data you want to retrieve, it returns True, otherwise it returns False.

python


def has_star_key(arg):
    if isinstance(arg, dict):
        return arg.keys() == {"Deneb", "Altair", "Vega"}

First argument: Str type or Int type or List type or Dict type Return value: Bool type

③ ** Function to get data **

python


def get_star(arg):
    return search(arg, has_star_key)

First argument: Dict type Return value: List type

test

Let's check if the conditions are changed or if other JSON data can be acquired. It can be executed simply by changing the judgment conditions of the data you want to acquire. (Change of function of ② and accompanying change of function of ③)

** Case1. Get all Str type data ** input

python


{"a": [{
  "b": 1, 
  "c": [{
    "star": [{"Deneb": "Deneb", "Altair": "Altair", "Vega": "Vega"}], 
    "d": [2,3]
    }], 
  "e": {"g": "x"}
  }],
 "f": "y",
}

condition

python


def has_str(s):
    return isinstance(s, str)

output

python


["Deneb", "Vega", "Altair", "x", "y"]

No matter how many data I wanted to get, they got it properly.

** Case2. Get specific data from JSON with large data ** input

python


{"DBEngineVersions":[{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.1.73a","DBParameterGroupFamily":"mysql5.1","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.1.73a"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.1.73b","DBParameterGroupFamily":"mysql5.1","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.1.73b"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.5.40","DBParameterGroupFamily":"mysql5.5","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.5.40"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.5.40a","DBParameterGroupFamily":"mysql5.5","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.5.40a"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.5.40b","DBParameterGroupFamily":"mysql5.5","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.5.40b"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.5.41","DBParameterGroupFamily":"mysql5.5","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.5.41"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.6.19a","DBParameterGroupFamily":"mysql5.6","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.6.19a"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.6.19b","DBParameterGroupFamily":"mysql5.6","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.6.19b"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.6.21","DBParameterGroupFamily":"mysql5.6","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.6.21"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.6.21b","DBParameterGroupFamily":"mysql5.6","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.6.21b"},{"Engine":"mysql","DBEngineVersionDescription":"MySQL 5.6.22","DBParameterGroupFamily":"mysql5.6","DBEngineDescription":"MySQL Community Edition","EngineVersion":"5.6.22"},
{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.2.v3","DBEngineVersionDescription":"Oracle 11.2.0.2.v3"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.2.v4","DBEngineVersionDescription":"Oracle 11.2.0.2.v4"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.2.v5","DBEngineVersionDescription":"Oracle 11.2.0.2.v5"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.2.v6","DBEngineVersionDescription":"Oracle 11.2.0.2.v6"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.2.v7","DBEngineVersionDescription":"Oracle 11.2.0.2.v7"},
{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.3.v1","DBEngineVersionDescription":"Oracle 11.2.0.3.v1"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.3.v2","DBEngineVersionDescription":"Oracle 11.2.0.3.v2"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.4.v1","DBEngineVersionDescription":"Oracle 11.2.0.4.v1"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"11.2.0.4.v3","DBEngineVersionDescription":"Oracle 11.2.0.4.v3"},{"Engine":"oracle-ee","DBParameterGroupFamily":"oracle-ee-12.1","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Enterprise Edition","EngineVersion":"12.1.0.1.v1","DBEngineVersionDescription":"Oracle 12.1.0.1.v1"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.2.v3","DBEngineVersionDescription":"Oracle 11.2.0.2.v3"},
{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.2.v4","DBEngineVersionDescription":"Oracle 11.2.0.2.v4"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.2.v5","DBEngineVersionDescription":"Oracle 11.2.0.2.v5"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.2.v6","DBEngineVersionDescription":"Oracle 11.2.0.2.v6"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.2.v7","DBEngineVersionDescription":"Oracle 11.2.0.2.v7"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.3.v1","DBEngineVersionDescription":"Oracle 11.2.0.3.v1"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.3.v2","DBEngineVersionDescription":"Oracle 11.2.0.3.v2"},
{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.4.v1","DBEngineVersionDescription":"Oracle 11.2.0.4.v1"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"11.2.0.4.v3","DBEngineVersionDescription":"Oracle 11.2.0.4.v3"},{"Engine":"oracle-se","DBParameterGroupFamily":"oracle-se-12.1","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition","EngineVersion":"12.1.0.1.v1","DBEngineVersionDescription":"Oracle 12.1.0.1.v1"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.2.v3","DBEngineVersionDescription":"Oracle 11.2.0.2.v3"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.2.v4","DBEngineVersionDescription":"Oracle 11.2.0.2.v4"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.2.v5","DBEngineVersionDescription":"Oracle 11.2.0.2.v5"},
{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.2.v6","DBEngineVersionDescription":"Oracle 11.2.0.2.v6"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.2.v7","DBEngineVersionDescription":"Oracle 11.2.0.2.v7"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.3.v1","DBEngineVersionDescription":"Oracle 11.2.0.3.v1"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.3.v2","DBEngineVersionDescription":"Oracle 11.2.0.3.v2"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.4.v1","DBEngineVersionDescription":"Oracle 11.2.0.4.v1"},{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-11.2","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"11.2.0.4.v3","DBEngineVersionDescription":"Oracle 11.2.0.4.v3"},
{"Engine":"oracle-se1","DBParameterGroupFamily":"oracle-se1-12.1","DefaultCharacterSet":{"CharacterSetName":"AL32UTF8","CharacterSetDescription":"Unicode 5.0 UTF-8 Universal character set"},"DBEngineDescription":"Oracle Database Standard Edition One","EngineVersion":"12.1.0.1.v1","DBEngineVersionDescription":"Oracle 12.1.0.1.v1"},{"Engine":"postgres","DBEngineVersionDescription":"PostgreSQL 9.3.1-R1","DBParameterGroupFamily":"postgres9.3","DBEngineDescription":"PostgreSQL","EngineVersion":"9.3.1"},{"Engine":"postgres","DBEngineVersionDescription":"PostgreSQL 9.3.2-R1","DBParameterGroupFamily":"postgres9.3","DBEngineDescription":"PostgreSQL","EngineVersion":"9.3.2"},{"Engine":"postgres","DBEngineVersionDescription":"PostgreSQL 9.3.3-R1","DBParameterGroupFamily":"postgres9.3","DBEngineDescription":"PostgreSQL","EngineVersion":"9.3.3"},{"Engine":"postgres","DBEngineVersionDescription":"PostgreSQL 9.3.5-R1","DBParameterGroupFamily":"postgres9.3","DBEngineDescription":"PostgreSQL","EngineVersion":"9.3.5"},{"Engine":"postgres","DBEngineVersionDescription":"PostgreSQL 9.3.6-R1","DBParameterGroupFamily":"postgres9.3","DBEngineDescription":"PostgreSQL","EngineVersion":"9.3.6"},{"Engine":"postgres","DBEngineVersionDescription":"PostgreSQL 9.4.1-R1","DBParameterGroupFamily":"postgres9.4","DBEngineDescription":"PostgreSQL","EngineVersion":"9.4.1"},{"Engine":"sqlserver-ee","DBEngineVersionDescription":"SQL Server 2008 R2 10.50.2789.0.v1","DBParameterGroupFamily":"sqlserver-ee-10.5","DBEngineDescription":"Microsoft SQL Server Enterprise Edition","EngineVersion":"10.50.2789.0.v1"},{"Engine":"sqlserver-ee","DBEngineVersionDescription":"SQL Server 2012 11.00.2100.60.v1","DBParameterGroupFamily":"sqlserver-ee-11.0","DBEngineDescription":"Microsoft SQL Server Enterprise Edition","EngineVersion":"11.00.2100.60.v1"},
{"Engine":"sqlserver-ex","DBEngineVersionDescription":"SQL Server 2008 R2 10.50.2789.0.v1","DBParameterGroupFamily":"sqlserver-ex-10.5","DBEngineDescription":"Microsoft SQL Server Express Edition","EngineVersion":"10.50.2789.0.v1"},{"Engine":"sqlserver-ex","DBEngineVersionDescription":"SQL Server 2012 11.00.2100.60.v1","DBParameterGroupFamily":"sqlserver-ex-11.0","DBEngineDescription":"Microsoft SQL Server Express Edition","EngineVersion":"11.00.2100.60.v1"},{"Engine":"sqlserver-se","DBEngineVersionDescription":"SQL Server 2008 R2 10.50.2789.0.v1","DBParameterGroupFamily":"sqlserver-se-10.5","DBEngineDescription":"Microsoft SQL Server Standard Edition","EngineVersion":"10.50.2789.0.v1"},{"Engine":"sqlserver-se","DBEngineVersionDescription":"SQL Server 2012 11.00.2100.60.v1","DBParameterGroupFamily":"sqlserver-se-11.0","DBEngineDescription":"Microsoft SQL Server Standard Edition","EngineVersion":"11.00.2100.60.v1"},{"Engine":"sqlserver-web","DBEngineVersionDescription":"SQL Server 2008 R2 10.50.2789.0.v1","DBParameterGroupFamily":"sqlserver-web-10.5","DBEngineDescription":"Microsoft SQL Server Web Edition","EngineVersion":"10.50.2789.0.v1"},{"Engine":"sqlserver-web","DBEngineVersionDescription":"SQL Server 2012 11.00.2100.60.v1","DBParameterGroupFamily":"sqlserver-web-11.0","DBEngineDescription":"Microsoft SQL Server Web Edition","EngineVersion":"11.00.2100.60.v1"}]}

condition

python


def judge_mysql51(d):
    if isinstance(d, dict):
        return "mysql5.1" in d.values()

output

python


[{"DBEngineVersionDescription": "MySQL 5.1.73a", "DBEngineDescription": "MySQL Community Edition", "Engine": "mysql", "EngineVersion": "5.1.73a",
 "DBParameterGroupFamily": "mysql5.1"}, 
{"DBEngineVersionDescription": "MySQL 5.1.73b", "DBEngineDescription": "MySQL Community Edition", "Engine": "mysql", "EngineVersion": "5.1.73b",
 "DBParameterGroupFamily": "mysql5.1"}]

I was able to get specific data even if the JSON data was large.

at the end

I wrote a more detailed article, so please refer to it.

-[4 steps to recursively retrieve specific data from complex JSON](https://jumpyoshim.hatenablog.com/entry/four-steps-to-get-specific-data-from-complex- json) -The simplest example of understanding recursive functions in Python

Recommended Posts

Extract specific data from complex JSON
Extract data from S3
Try to extract specific data from JSON format data in object storage Cloudian/S3
Extract specific languages from Wiktionary
Python: Reading JSON data from web API
Extract data from a web page with Python
Extract features (features) from sentences.
Extract table from wikipedia
[Python] Extract text data from XML data of 10GB or more.
Follow Blender's data structure and extract vertex coordinates from fbx
Extract classification information etc. from genbank data in xml format
Get structural data from CHEMBLID
Export 3D data from QGIS
Extract Twitter data with CSV
Generate URL query from JSON
Extract redirects from Wikipedia dumps
Extract csv data and calculate
Read json data with python
I made a subtitle file (SRT) from JSON data of AmiVoice
Find all patterns to extract a specific number from the set