e-Stat offers an API for retrieving various government statistical data in XML or JSON.
After completing the API registration application, you can log in with your email address and password at the URL below.
https://www.e-stat.go.jp/api/apiuser/php/index.php?action=login
On the user information change screen, you can update the data you specified at registration.
If you do not have a URL to register, enter something like "http://localhost/". Up to three appIDs can be issued.
Registration makes the "API Function Test Form" and "Sample" pages available.
You can try out each API on the function test form, and the API specification is available at the page below.
http://www.e-stat.go.jp/api/api-info/api-spec/
There are five types of APIs that can be used.
・ Acquisition of statistical table information: obtains information on the statistical tables provided by the portal site of official statistics (e-Stat). You can also narrow the results by specifying request parameters.
・ Acquisition of meta information: obtains the meta information (tabulation items, classification items, regional items, etc.) corresponding to a specified statistical table ID.
・ Data acquisition: obtains the statistical data (numeric values) corresponding to a specified statistical table ID or dataset ID. You can also narrow the results by specifying request parameters. The data is available in XML and JSON.
・ Dataset registration: registers the acquisition conditions for retrieving statistical data. By saving the narrowing conditions as a "dataset", you can omit them in later data requests.
・ Dataset reference: refers to the narrowing conditions of a registered dataset. If no dataset ID is specified, it returns the list of datasets available to the user (a hedged sketch of such a call follows this list).
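This article has no full sample for the dataset APIs, but referring to a dataset is a plain GET like the other calls. The snippet below is only a rough sketch: the endpoint name refDataset and its parameters are assumptions based on the API specification page above, so verify them there before use.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Hedged sketch (assumed endpoint): list the datasets available to this appId.
# Per the spec, omitting dataSetId should return the user's dataset list.
import urllib2

api_key = 'API_KEY'
url = ('http://api.e-stat.go.jp/rest/1.0/app/refDataset?appId=%s&lang=J' % api_key)
print (urllib2.urlopen(url).read())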
The basic usage is as follows: search for a statistical table with "Acquisition of statistical table information" to get its statistical table ID, fetch the corresponding meta information with "Acquisition of meta information", and then fetch the statistical data with "Data acquisition".
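In code, this flow is just three GET requests against the same v1.0 endpoints used by the samples below; placeholders such as APPID, KEYWORD, and STATS_ID are stand-ins for your own values:

BASE = 'http://api.e-stat.go.jp/rest/1.0/app'
# 1. Search statistical tables by keyword
search_url = BASE + '/getStatsList?appId=APPID&lang=J&searchKind=1&searchWord=KEYWORD'
# 2. Fetch the meta information for a table ID found in step 1
meta_url = BASE + '/getMetaInfo?appId=APPID&lang=J&statsDataId=STATS_ID'
# 3. Fetch the numeric data for the same table ID
data_url = BASE + '/getStatsData?appId=APPID&lang=J&statsDataId=STATS_ID'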
This script gets statistical table information. Run it with an API key, a search data type, and a search keyword. The search data types are as follows:
・ 1: Statistical information (default)
・ 2: Subregion / regional mesh
・ 3: Social and demographic system (prefectures and municipalities)
Sample code:
getStatsListSample.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib
import urllib2
from lxml import etree
import sys
import codecs

# For Windows: wrap stdout so that cp932 output prints correctly
sys.stdout = codecs.getwriter('cp932')(sys.stdout)


def main(argvs, argc):
    if argc != 4:
        print ("Usage #python %s api_key search_kind key_word" % argvs[0])
        return 1
    api_key = argvs[1]
    search_kind = argvs[2]
    # For Windows: command-line arguments arrive as cp932
    key_word = argvs[3].decode('cp932')
    key_word = urllib.quote(key_word.encode('utf-8'))

    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsList?appId=%s&lang=J&searchKind=%s&searchWord=%s' % (api_key, search_kind, key_word))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    # recover=True lets lxml tolerate minor XML irregularities
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)

    result = root.find('RESULT')
    print ('RESULT==============')
    print (result.find('STATUS').text)
    print (result.find('ERROR_MSG').text)
    print (result.find('DATE').text)

    data_list = root.find('DATALIST_INF')
    list_infs = data_list.xpath('.//LIST_INF')
    for list_inf in list_infs:
        print ('--------------')
        print (u'Statistical table ID:%s' % (list_inf.get('id')))
        stat_name = list_inf.find('STAT_NAME')
        if stat_name is not None:
            print (u'Government statistics name:%s %s' % (stat_name.get('code'), stat_name.text))
        gov_org = list_inf.find('GOV_ORG')
        if gov_org is not None:
            print (u'Name of creator:%s %s' % (gov_org.get('code'), gov_org.text))
        statistics_name = list_inf.find('STATISTICS_NAME')
        if statistics_name is not None:
            print (u'Provided statistical name and provided classification name:%s' % (statistics_name.text))
        title = list_inf.find('TITLE')
        if title is not None:
            print (u'title:%s %s' % (title.get('no'), title.text))
        cycle = list_inf.find('CYCLE')
        if cycle is not None:
            print (u'Offer cycle:%s' % (cycle.text))
        survey_date = list_inf.find('SURVEY_DATE')
        if survey_date is not None:
            print (u'Survey date:%s' % (survey_date.text))
        open_date = list_inf.find('OPEN_DATE')
        if open_date is not None:
            print (u'release date:%s' % (open_date.text))
        small_area = list_inf.find('SMALL_AREA')
        if small_area is not None:
            print (u'Subregion attributes:%s' % (small_area.text))
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Example of use:
python getStatsListSample.py API_KEY 1 Employment
Output result:
Statistical table ID:0003059047
Government statistics name:00550100 Ministry of Economy, Trade and Industry Basic Survey on Corporate Activities
Name of creator:00550 Ministry of Economy, Trade and Industry
Provided statistical name and provided classification name:Ministry of Economy, Trade and Industry Basic Survey on Corporate Activities Statistical Table List-Confirmed Report (Data)
2010 Corporate Activity Basic Survey Confirmation Report-2009 Results-
title:1-8 Statistical table (Volume 1) [Table regarding business organization] Table 8: By industry, number of companies, business organization
Number of different employees
Offer cycle:Annual
Survey date:201001-201012
release date:2012-03-31
Subregion attributes:0
The statistical table ID "0003059047" is the ID that can be used for meta information and data acquisition.
The next script gets the meta information for a specified statistical table ID, passing as a parameter the ID found by the statistical table search above.
Sample code:
getMetaSample.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib
import urllib2
from lxml import etree
import sys
import codecs

# For Windows: wrap stdout so that cp932 output prints correctly
sys.stdout = codecs.getwriter('cp932')(sys.stdout)


def get_meta_data(api_key, stats_data_id):
    """
    Get meta information
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getMetaInfo?appId=%s&lang=J&statsDataId=%s' % (api_key, stats_data_id))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    # Each CLASS_OBJ is a category; each CLASS under it is a value the category can take
    class_object_tags = root.xpath('//METADATA_INF/CLASS_INF/CLASS_OBJ')
    class_object = {}
    for class_object_tag in class_object_tags:
        class_object_id = class_object_tag.get('id')
        class_object_name = class_object_tag.get('name')
        class_object_item = {
            'id': class_object_id,
            'name': class_object_name,
            'objects': {}
        }
        class_tags = class_object_tag.xpath('.//CLASS')
        for class_tag in class_tags:
            class_item = {
                'code': class_tag.get('code'),
                'name': class_tag.get('name'),
                'level': class_tag.get('level'),
                'unit': class_tag.get('unit')
            }
            class_object_item['objects'][class_item['code']] = class_item
        class_object[class_object_id] = class_object_item
    return class_object


def main(argvs, argc):
    if argc != 3:
        print ("Usage #python %s api_key stats_id" % argvs[0])
        return 1
    api_key = argvs[1]
    stats_id = argvs[2]
    ret = get_meta_data(api_key, stats_id)
    for key in ret:
        print ('======================')
        print (key)
        print ('name: %s' % ret[key]['name'])
        for obj_code, obj in ret[key]['objects'].items():
            print ('----------------------')
            print ('code: %s' % obj_code)
            print ('name: %s' % obj['name'])
            print ('unit: %s' % obj['unit'])
            print ('level: %s' % obj['level'])
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Example of use:
python getMetaSample.py API_KEY 0003059047
Output example:
======================
cat01
name: 22_1-8 Number of companies, number of employees by business organization
----------------------
code: 0011000
name:Number of regular employees (excluding seconded employees) Headquarters / head office Headquarters functional department Others
unit: None
level: 1
----------------------
code: 0029000
name:Number of regular employees (including seconded employees) Seconded employees to other companies, etc.
unit: None
level: 1
The meta information lists the categories used by the statistical table in question and the values each category can take.
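These category IDs and codes are what the narrowing parameters of the data acquisition API expect. As a minimal sketch (cdCat01 is the same filter parameter used by the mesh sample later in this article; the table ID and code value are taken from the example output above):

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Minimal sketch: fetch only the rows of table 0003059047 whose cat01 code is
# 0011000, by passing the code from the meta information as the cdCat01 filter.
import urllib2

api_key = 'API_KEY'
url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsData'
       '?appId=%s&lang=J&statsDataId=0003059047&cdCat01=0011000' % api_key)
print (urllib2.urlopen(url).read())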
The next sample outputs a statistical table as CSV. Given a statistical table ID and an output path, it writes the specified table to a CSV file.
Sample code:
export_csv.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import urllib2
from lxml import etree
import csv


def export_statical_data(writer, api_key, stats_data_id, class_object, start_position):
    """
    Export stats
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsData?limit=10000&appId=%s&lang=J&statsDataId=%s&metaGetFlg=N&cntGetFlg=N' % (api_key, stats_data_id))
    if start_position > 0:
        url = url + ('&startPosition=%d' % start_position)
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    value_tags = root.xpath('//STATISTICAL_DATA/DATA_INF/VALUE')
    for value_tag in value_tags:
        row = []
        for key in class_object:
            val = value_tag.get(key)
            if val in class_object[key]['objects']:
                # Indent the name by its level so the hierarchy stays visible in the CSV
                level = ''
                if 'level' in class_object[key]['objects'][val]:
                    if class_object[key]['objects'][val]['level'].isdigit():
                        level = ' ' * (int(class_object[key]['objects'][val]['level']) - 1)
                text = ("%s%s" % (level, class_object[key]['objects'][val]['name']))
                row.append(text.encode('utf-8'))
            else:
                row.append(val.encode('utf-8'))
        row.append(value_tag.text)
        writer.writerow(row)
    # NEXT_KEY is present when more than 10000 rows remain: fetch the next page
    next_tags = root.xpath('//STATISTICAL_DATA/TABLE_INF/NEXT_KEY')
    if next_tags:
        if next_tags[0].text:
            export_statical_data(writer, api_key, stats_data_id, class_object, int(next_tags[0].text))


def get_meta_data(api_key, stats_data_id):
    """
    Get meta information
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getMetaInfo?appId=%s&lang=J&statsDataId=%s' % (api_key, stats_data_id))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    class_object_tags = root.xpath('//METADATA_INF/CLASS_INF/CLASS_OBJ')
    class_object = {}
    for class_object_tag in class_object_tags:
        class_object_id = class_object_tag.get('id')
        class_object_name = class_object_tag.get('name')
        class_object_item = {
            'id': class_object_id,
            'name': class_object_name,
            'objects': {}
        }
        class_tags = class_object_tag.xpath('.//CLASS')
        for class_tag in class_tags:
            class_item = {
                'code': class_tag.get('code'),
                'name': class_tag.get('name'),
                'level': class_tag.get('level'),
                'unit': class_tag.get('unit')
            }
            class_object_item['objects'][class_item['code']] = class_item
        class_object[class_object_id] = class_object_item
    return class_object


def export_csv(api_key, stats_data_id, output_path):
    """
    Export the specified statistical table to CSV.
    """
    writer = csv.writer(open(output_path, 'wb'), quoting=csv.QUOTE_ALL)
    class_object = get_meta_data(api_key, stats_data_id)
    # Header row: one column per category, plus the value column
    row = []
    for key in class_object:
        title = class_object[key]['name']
        row.append(title.encode('utf-8'))
    row.append('VALUE')
    writer.writerow(row)
    export_statical_data(writer, api_key, stats_data_id, class_object, 1)


def main(argvs, argc):
    if argc != 4:
        print ("Usage #python %s api_key stats_data_id output_path" % argvs[0])
        return 1
    api_key = argvs[1]
    stats_data_id = argvs[2]
    output_path = argvs[3]
    export_csv(api_key, stats_data_id, output_path)
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Example of use:
python export_csv.py API_KEY 0003059047 output.csv
Output example:
"22_1-8 Number of companies, number of employees by business organization","22_1-8 industry","VALUE"
"Number of companies","2005","27677"
"Number of companies","2006","27917"
"Number of companies","2007","29080"
"Number of companies","2008","29355"
"Number of companies","2009","29096"
"Number of companies","Total total","29096"
"Number of companies","total","27871"
"Number of companies","Mining, quarrying, gravel pitting","36"
"Number of companies","Manufacturing industry","13105"
"Number of companies","090 Food manufacturing industry","1498"
"Number of companies","091 Livestock food manufacturing industry","285"
"Number of companies","092 Fisheries food manufacturing industry","222"
"Number of companies","093 Grain and milling industry","37"
Regional mesh statistics divide the country, without gaps, into mesh areas based on latitude and longitude and organize statistical data by mesh area. The sample below aggregates mesh population data and draws it as a heat map (saved to image.png).
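A mesh code encodes its location digit by digit, which is what parse_mesh_to_num in the sample below unpacks: the first four digits identify the primary (about 80 km) area, the next two the secondary (about 10 km) area, and the last two the 1 km square. A small sketch with an illustrative code:

# Sketch: decompose an 8-digit mesh code (illustrative value) the same way
# parse_mesh_to_num does in the sample below.
mesh_id = '53394611'      # hypothetical 1 km mesh code
primary = mesh_id[0:4]    # latitude (2 digits) + longitude (2 digits): 80 km square
secondary = mesh_id[4:6]  # 10 km square within the primary area
one_km = mesh_id[6:8]     # 1 km square within the secondary area
print (primary, secondary, one_km)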
Sample code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import urllib
import urllib2
from lxml import etree
from collections import defaultdict
from math import log10
from matplotlib import pyplot
import numpy as np


def draw_heatmap(data):
    # Draw the matrix as a blue-scale heat map and save it to image.png
    fig, ax = pyplot.subplots()
    heatmap = ax.pcolor(data, cmap=pyplot.cm.Blues)
    # shape[1] is the number of columns (x), shape[0] the number of rows (y)
    ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
    ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)
    ax.invert_yaxis()
    ax.xaxis.tick_top()
    pyplot.savefig('image.png')
    pyplot.show()
    return heatmap


def get_meta_data(api_key, stats_data_id):
    """
    Get meta information
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getMetaInfo?appId=%s&lang=J&statsDataId=%s' % (api_key, stats_data_id))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    class_object_tags = root.xpath('//METADATA_INF/CLASS_INF/CLASS_OBJ')
    class_object = {}
    for class_object_tag in class_object_tags:
        class_object_id = class_object_tag.get('id')
        class_object_name = class_object_tag.get('name')
        class_object_item = {
            'id': class_object_id,
            'name': class_object_name,
            'objects': {}
        }
        class_tags = class_object_tag.xpath('.//CLASS')
        for class_tag in class_tags:
            class_item = {
                'code': class_tag.get('code'),
                'name': class_tag.get('name'),
                'level': class_tag.get('level'),
                'unit': class_tag.get('unit')
            }
            class_object_item['objects'][class_item['code']] = class_item
        class_object[class_object_id] = class_object_item
    return class_object


def get_stats_list(api_key, search_kind, key_word):
    key_word = urllib.quote(key_word.encode('utf-8'))
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsList?appId=%s&lang=J&searchKind=%s&searchWord=%s' % (api_key, search_kind, key_word))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    ret = []
    data_list = root.find('DATALIST_INF')
    list_infs = data_list.xpath('.//LIST_INF')
    for list_inf in list_infs:
        item = {
            'id': list_inf.get('id')
        }
        stat_name = list_inf.find('STAT_NAME')
        if stat_name is not None:
            item['stat_name'] = stat_name.text
            item['stat_name_code'] = stat_name.get('code')
        gov_org = list_inf.find('GOV_ORG')
        if gov_org is not None:
            item['gov_org'] = gov_org.text
            item['gov_org_code'] = gov_org.get('code')
        statistics_name = list_inf.find('STATISTICS_NAME')
        if statistics_name is not None:
            item['statistics_name'] = statistics_name.text
        title = list_inf.find('TITLE')
        if title is not None:
            item['title'] = title.text
        cycle = list_inf.find('CYCLE')
        if cycle is not None:
            item['cycle'] = cycle.text
        survey_date = list_inf.find('SURVEY_DATE')
        if survey_date is not None:
            item['survey_date'] = survey_date.text
        open_date = list_inf.find('OPEN_DATE')
        if open_date is not None:
            item['open_date'] = open_date.text
        small_area = list_inf.find('SMALL_AREA')
        if small_area is not None:
            item['small_area'] = small_area.text
        ret.append(item)
    return ret


def _get_stats_id_value(api_key, stats_data_id, class_object, start_position, filter_str):
    """
    Get statistics
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsData?limit=10000&appId=%s&lang=J&statsDataId=%s&metaGetFlg=N&cntGetFlg=N%s' % (api_key, stats_data_id, filter_str))
    if start_position > 0:
        url = url + ('&startPosition=%d' % start_position)
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    ret = []
    value_tags = root.xpath('//STATISTICAL_DATA/DATA_INF/VALUE')
    for value_tag in value_tags:
        row = {}
        for key in class_object:
            val = value_tag.get(key)
            if val in class_object[key]['objects']:
                text = class_object[key]['objects'][val]['name']
                row[key] = text.encode('utf-8')
            else:
                row[key] = val.encode('utf-8')
        row['value'] = value_tag.text
        ret.append(row)
    return ret


def get_stats_id_value(api_key, stats_data_id, filter_str):
    class_object = get_meta_data(api_key, stats_data_id)
    return _get_stats_id_value(api_key, stats_data_id, class_object, 1, filter_str), class_object


def get_stats_id_list_value(api_key, stats_data_ids, filters):
    filter_str = ''
    for key in filters:
        filter_str += ('&%s=%s' % (key, urllib.quote(filters[key].encode('utf-8'))))
    ret = []
    i = 0
    for stats_data_id in stats_data_ids:
        rows, class_object = get_stats_id_value(api_key, stats_data_id, filter_str)
        ret.extend(rows)
        i = i + 1
        if i > 5:
            break
    return ret


def get_mesh_id(mesh_id, kind):
    # kind 1: primary area (about 80 km), kind 2: secondary area (about 10 km)
    if kind == 1:
        return mesh_id[0:4] + '0000'
    elif kind == 2:
        return mesh_id[0:6] + '00'
    else:
        raise Exception(mesh_id)


def collect_mesh_value(api_key, stats_data_ids, filters, kind):
    # Sum the values of all meshes that fall into the same coarser mesh area
    filter_str = ''
    for key in filters:
        filter_str += ('&%s=%s' % (key, urllib.quote(filters[key].encode('utf-8'))))
    ret = defaultdict(float)
    for stats_data_id in stats_data_ids:
        rows, class_object = get_stats_id_value(api_key, stats_data_id, filter_str)
        for row in rows:
            key = get_mesh_id(row['area'], kind)
            v = row['value']
            if v.isdigit():
                ret[key] += float(v)
    return ret


def parse_mesh_to_num(mesh_id):
    # p/u: primary area (latitude/longitude), q/v: secondary area, r/w: 1 km mesh
    ret = {}
    if len(mesh_id) == 4:
        ret['p'] = float(mesh_id[0:2])
        ret['u'] = float(mesh_id[2:4])
        ret['q'] = 0.0
        ret['v'] = 0.0
        ret['r'] = 0.0
        ret['w'] = 0.0
        return ret
    elif len(mesh_id) == 8:
        ret['p'] = float(mesh_id[0:2])
        ret['u'] = float(mesh_id[2:4])
        ret['q'] = float(mesh_id[4])
        ret['v'] = float(mesh_id[5])
        ret['r'] = float(mesh_id[6])
        ret['w'] = float(mesh_id[7])
        return ret
    else:
        raise Exception(mesh_id)


def convert_mesh_to_num(mesh_id):
    d1 = parse_mesh_to_num(mesh_id)
    # The secondary area index runs 0-7, so the primary index is scaled by 80
    x1 = (d1['u'] * 80) + (d1['v'] * 10) + d1['w']
    y1 = (d1['p'] * 80) + (d1['q'] * 10) + d1['r']
    return x1, y1


def main(argvs, argc):
    wd = u'2010 Census-World Geodetic System(1KM mesh)20101001'
    # API_KEY
    api_key = 'API_KEY'
    search_kind = '2'
    stats_list = get_stats_list(api_key, search_kind, wd)
    stats_ids = []
    for stats in stats_list:
        stats_ids.append(stats['id'])
    # Narrow to total population (cdCat01=T000608001), aggregated by secondary area
    values = collect_mesh_value(api_key, stats_ids, {'cdCat01': 'T000608001'}, 2)
    max_x = 0
    min_x = 9999
    max_y = 0
    min_y = 9999
    for key in values.keys():
        x, y = convert_mesh_to_num(key)
        if min_x > x:
            min_x = x
        if max_x < x:
            max_x = x
        if min_y > y:
            min_y = y
        if max_y < y:
            max_y = y
    size_x = int(max_x - min_x) / 10 + 1
    size_y = int(max_y - min_y) / 10 + 1
    buff = [[0.0 for i in range(size_x)] for j in range(size_y)]
    for key in values.keys():
        x, y = convert_mesh_to_num(key)
        x = int(x - min_x) / 10
        y = (size_y - 1) - int(y - min_y) / 10
        # Without taking the log, the difference between Tokyo and other regions
        # is so great that the rest of Japan would not show up on the map.
        buff[y][x] = log10(float(values[key]))
    draw_heatmap(np.array(buff))
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Description: In this figure, the population is aggregated for each secondary area and its common logarithm is displayed as a heat map.
The common logarithm is used because the population difference between the Kanto area and the rest of the country is too large to draw a readable map otherwise.
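A quick sanity check of the scaling (illustrative numbers only): the log collapses a gap of several orders of magnitude into a workable color range.

from math import log10
print (log10(1000000.0))  # 6.0 - a dense urban mesh
print (log10(100.0))      # 2.0 - a sparse rural mesh stays visible on the same scale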
Attempting to display the tertiary areas (the standard 1 km mesh) nationwide consumes a huge amount of memory, while the primary areas make the map too coarse.
Even with secondary areas, this example takes an enormous amount of time to fetch and display everything. In the next article, we will try to make the processing more efficient by caching the data in SpatiaLite.
How to display the regional mesh of the portal site of official statistics (e-Stat) in a web browser: http://qiita.com/mima_ita/items/38784095a146c87dcd23
Characteristics and history of regional mesh statistics: http://www.stat.go.jp/data/mesh/pdf/gaiyo1.pdf
Heat maps with Python + matplotlib: http://qiita.com/ynakayama/items/7dc01f45caf6d87a981b