e-Stat offers an API for retrieving various government statistical data in XML or JSON.
After completing the API registration application, you can log in with your email address and password at the URL below.
https://www.e-stat.go.jp/api/apiuser/php/index.php?action=login
On the user information change screen, you can update the data you specified at registration.
If you do not have a URL to register, enter something like "http://localhost/". Up to three appIDs can be issued.
Registration makes the "API Function Test Form" and "Sample" pages available.
You can try out each API on the function test form, and the API specification is available at the page below.
http://www.e-stat.go.jp/api/api-info/api-spec/
There are five types of APIs that can be used.
・ Acquisition of statistical table information: obtains information on the statistical tables provided by the portal site of official statistics (e-Stat). You can also narrow the results by specifying request parameters.
・ Acquisition of meta information: obtains the meta information (tabulation items, classification items, regional items, etc.) corresponding to a specified statistical table ID.
・ Data acquisition: obtains the statistical data (numeric values) corresponding to a specified statistical table ID or dataset ID. You can also narrow the results by specifying request parameters. The data is available in XML and JSON.
・ Dataset registration: registers the acquisition conditions for retrieving statistical data. By saving the narrowing conditions as a "dataset", you can omit them in later data requests.
・ Dataset reference: refers to the narrowing conditions of a registered dataset. If no dataset ID is specified, it returns the list of datasets available to the user (a hedged sketch of such a call follows this list).
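This article has no full sample for the dataset APIs, but referring to a dataset is a plain GET like the other calls. The snippet below is only a rough sketch: the endpoint name refDataset and its parameters are assumptions based on the API specification page above, so verify them there before use.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Hedged sketch (assumed endpoint): list the datasets available to this appId.
# Per the spec, omitting dataSetId should return the user's dataset list.
import urllib2

api_key = 'API_KEY'
url = ('http://api.e-stat.go.jp/rest/1.0/app/refDataset?appId=%s&lang=J' % api_key)
print (urllib2.urlopen(url).read())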
The basic usage is as follows: search for a statistical table with "Acquisition of statistical table information" to get its statistical table ID, fetch the corresponding meta information with "Acquisition of meta information", and then fetch the statistical data with "Data acquisition".
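In code, this flow is just three GET requests against the same v1.0 endpoints used by the samples below; placeholders such as APPID, KEYWORD, and STATS_ID are stand-ins for your own values:

BASE = 'http://api.e-stat.go.jp/rest/1.0/app'
# 1. Search statistical tables by keyword
search_url = BASE + '/getStatsList?appId=APPID&lang=J&searchKind=1&searchWord=KEYWORD'
# 2. Fetch the meta information for a table ID found in step 1
meta_url = BASE + '/getMetaInfo?appId=APPID&lang=J&statsDataId=STATS_ID'
# 3. Fetch the numeric data for the same table ID
data_url = BASE + '/getStatsData?appId=APPID&lang=J&statsDataId=STATS_ID'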
This script gets statistical table information. Run it with an API key, a search data type, and a search keyword. The search data types are as follows:
・ 1: Statistical information (default)
・ 2: Subregion / regional mesh
・ 3: Social and demographic system (prefectures and municipalities)
Sample code:
getStatsListSample.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib
import urllib2
from lxml import etree
import sys
import codecs

# For Windows: wrap stdout so that cp932 output prints correctly
sys.stdout = codecs.getwriter('cp932')(sys.stdout)


def main(argvs, argc):
    if argc != 4:
        print ("Usage #python %s api_key search_kind key_word" % argvs[0])
        return 1
    api_key = argvs[1]
    search_kind = argvs[2]
    # For Windows: command-line arguments arrive as cp932
    key_word = argvs[3].decode('cp932')
    key_word = urllib.quote(key_word.encode('utf-8'))

    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsList?appId=%s&lang=J&searchKind=%s&searchWord=%s' % (api_key, search_kind, key_word))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    # recover=True lets lxml tolerate minor XML irregularities
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)

    result = root.find('RESULT')
    print ('RESULT==============')
    print (result.find('STATUS').text)
    print (result.find('ERROR_MSG').text)
    print (result.find('DATE').text)

    data_list = root.find('DATALIST_INF')
    list_infs = data_list.xpath('.//LIST_INF')
    for list_inf in list_infs:
        print ('--------------')
        print (u'Statistical table ID:%s' % (list_inf.get('id')))
        stat_name = list_inf.find('STAT_NAME')
        if stat_name is not None:
            print (u'Government statistics name:%s %s' % (stat_name.get('code'), stat_name.text))
        gov_org = list_inf.find('GOV_ORG')
        if gov_org is not None:
            print (u'Name of creator:%s %s' % (gov_org.get('code'), gov_org.text))
        statistics_name = list_inf.find('STATISTICS_NAME')
        if statistics_name is not None:
            print (u'Provided statistical name and provided classification name:%s' % (statistics_name.text))
        title = list_inf.find('TITLE')
        if title is not None:
            print (u'title:%s %s' % (title.get('no'), title.text))
        cycle = list_inf.find('CYCLE')
        if cycle is not None:
            print (u'Offer cycle:%s' % (cycle.text))
        survey_date = list_inf.find('SURVEY_DATE')
        if survey_date is not None:
            print (u'Survey date:%s' % (survey_date.text))
        open_date = list_inf.find('OPEN_DATE')
        if open_date is not None:
            print (u'release date:%s' % (open_date.text))
        small_area = list_inf.find('SMALL_AREA')
        if small_area is not None:
            print (u'Subregion attributes:%s' % (small_area.text))
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Example of use:
python getStatsListSample.py API_KEY 1 Employment
Output result:
Statistical table ID:0003059047
Government statistics name:00550100 Ministry of Economy, Trade and Industry Basic Survey on Corporate Activities
Name of creator:00550 Ministry of Economy, Trade and Industry
Provided statistical name and provided classification name:Ministry of Economy, Trade and Industry Basic Survey on Corporate Activities Statistical Table List-Confirmed Report (Data)
2010 Corporate Activity Basic Survey Confirmation Report-2009 Results-
title:1-8 Statistical table (Volume 1) [Table regarding business organization] Table 8: By industry, number of companies, business organization
Number of different employees
Offer cycle:Annual
Survey date:201001-201012
release date:2012-03-31
Subregion attributes:0
The statistical table ID "0003059047" is the ID that can be used for meta information and data acquisition.
The next script gets the meta information for a specified statistical table ID, passing as a parameter the ID found by the statistical table search above.
Sample code:
getMetaSample.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib
import urllib2
from lxml import etree
import sys
import codecs

# For Windows: wrap stdout so that cp932 output prints correctly
sys.stdout = codecs.getwriter('cp932')(sys.stdout)


def get_meta_data(api_key, stats_data_id):
    """
    Get meta information
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getMetaInfo?appId=%s&lang=J&statsDataId=%s' % (api_key, stats_data_id))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    # Each CLASS_OBJ is a category; each CLASS under it is a value the category can take
    class_object_tags = root.xpath('//METADATA_INF/CLASS_INF/CLASS_OBJ')
    class_object = {}
    for class_object_tag in class_object_tags:
        class_object_id = class_object_tag.get('id')
        class_object_name = class_object_tag.get('name')
        class_object_item = {
            'id': class_object_id,
            'name': class_object_name,
            'objects': {}
        }
        class_tags = class_object_tag.xpath('.//CLASS')
        for class_tag in class_tags:
            class_item = {
                'code': class_tag.get('code'),
                'name': class_tag.get('name'),
                'level': class_tag.get('level'),
                'unit': class_tag.get('unit')
            }
            class_object_item['objects'][class_item['code']] = class_item
        class_object[class_object_id] = class_object_item
    return class_object


def main(argvs, argc):
    if argc != 3:
        print ("Usage #python %s api_key stats_id" % argvs[0])
        return 1
    api_key = argvs[1]
    stats_id = argvs[2]
    ret = get_meta_data(api_key, stats_id)
    for key in ret:
        print ('======================')
        print (key)
        print ('name: %s' % ret[key]['name'])
        for obj_code, obj in ret[key]['objects'].items():
            print ('----------------------')
            print ('code: %s' % obj_code)
            print ('name: %s' % obj['name'])
            print ('unit: %s' % obj['unit'])
            print ('level: %s' % obj['level'])
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Example of use:
python getMetaSample.py API_KEY 0003059047
Output example:
======================
cat01
name: 22_1-8 Number of companies, number of employees by business organization
----------------------
code: 0011000
name:Number of regular employees (excluding seconded employees) Headquarters / head office Headquarters functional department Others
unit: None
level: 1
----------------------
code: 0029000
name:Number of regular employees (including seconded employees) Seconded employees to other companies, etc.
unit: None
level: 1
The meta information lists the categories used by the statistical table in question and the values each category can take.
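These category IDs and codes are what the narrowing parameters of the data acquisition API expect. As a minimal sketch (cdCat01 is the same filter parameter used by the mesh sample later in this article; the table ID and code value are taken from the example output above):

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Minimal sketch: fetch only the rows of table 0003059047 whose cat01 code is
# 0011000, by passing the code from the meta information as the cdCat01 filter.
import urllib2

api_key = 'API_KEY'
url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsData'
       '?appId=%s&lang=J&statsDataId=0003059047&cdCat01=0011000' % api_key)
print (urllib2.urlopen(url).read())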
The next sample outputs a statistical table as CSV. Given a statistical table ID and an output path, it writes the specified table to a CSV file.
Sample code:
export_csv.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import urllib2
from lxml import etree
import csv


def export_statical_data(writer, api_key, stats_data_id, class_object, start_position):
    """
    Export stats
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsData?limit=10000&appId=%s&lang=J&statsDataId=%s&metaGetFlg=N&cntGetFlg=N' % (api_key, stats_data_id))
    if start_position > 0:
        url = url + ('&startPosition=%d' % start_position)
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    value_tags = root.xpath('//STATISTICAL_DATA/DATA_INF/VALUE')
    for value_tag in value_tags:
        row = []
        for key in class_object:
            val = value_tag.get(key)
            if val in class_object[key]['objects']:
                # Indent the name by its level so the hierarchy stays visible in the CSV
                level = ''
                if 'level' in class_object[key]['objects'][val]:
                    if class_object[key]['objects'][val]['level'].isdigit():
                        level = ' ' * (int(class_object[key]['objects'][val]['level']) - 1)
                text = ("%s%s" % (level, class_object[key]['objects'][val]['name']))
                row.append(text.encode('utf-8'))
            else:
                row.append(val.encode('utf-8'))
        row.append(value_tag.text)
        writer.writerow(row)
    # NEXT_KEY is present when more than 10000 rows remain: fetch the next page
    next_tags = root.xpath('//STATISTICAL_DATA/TABLE_INF/NEXT_KEY')
    if next_tags:
        if next_tags[0].text:
            export_statical_data(writer, api_key, stats_data_id, class_object, int(next_tags[0].text))


def get_meta_data(api_key, stats_data_id):
    """
    Get meta information
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getMetaInfo?appId=%s&lang=J&statsDataId=%s' % (api_key, stats_data_id))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    class_object_tags = root.xpath('//METADATA_INF/CLASS_INF/CLASS_OBJ')
    class_object = {}
    for class_object_tag in class_object_tags:
        class_object_id = class_object_tag.get('id')
        class_object_name = class_object_tag.get('name')
        class_object_item = {
            'id': class_object_id,
            'name': class_object_name,
            'objects': {}
        }
        class_tags = class_object_tag.xpath('.//CLASS')
        for class_tag in class_tags:
            class_item = {
                'code': class_tag.get('code'),
                'name': class_tag.get('name'),
                'level': class_tag.get('level'),
                'unit': class_tag.get('unit')
            }
            class_object_item['objects'][class_item['code']] = class_item
        class_object[class_object_id] = class_object_item
    return class_object


def export_csv(api_key, stats_data_id, output_path):
    """
    Export the specified statistical table to CSV.
    """
    writer = csv.writer(open(output_path, 'wb'), quoting=csv.QUOTE_ALL)
    class_object = get_meta_data(api_key, stats_data_id)
    # Header row: one column per category, plus the value column
    row = []
    for key in class_object:
        title = class_object[key]['name']
        row.append(title.encode('utf-8'))
    row.append('VALUE')
    writer.writerow(row)
    export_statical_data(writer, api_key, stats_data_id, class_object, 1)


def main(argvs, argc):
    if argc != 4:
        print ("Usage #python %s api_key stats_data_id output_path" % argvs[0])
        return 1
    api_key = argvs[1]
    stats_data_id = argvs[2]
    output_path = argvs[3]
    export_csv(api_key, stats_data_id, output_path)
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Example of use:
python export_csv.py API_KEY 0003059047 output.csv
Output example:
"22_1-8 Number of companies, number of employees by business organization","22_1-8 industry","VALUE"
"Number of companies","2005","27677"
"Number of companies","2006","27917"
"Number of companies","2007","29080"
"Number of companies","2008","29355"
"Number of companies","2009","29096"
"Number of companies","Total total","29096"
"Number of companies","total","27871"
"Number of companies","Mining, quarrying, gravel pitting","36"
"Number of companies","Manufacturing industry","13105"
"Number of companies","090 Food manufacturing industry","1498"
"Number of companies","091 Livestock food manufacturing industry","285"
"Number of companies","092 Fisheries food manufacturing industry","222"
"Number of companies","093 Grain and milling industry","37"
Regional mesh statistics divide the country, without gaps, into mesh areas based on latitude and longitude and organize statistical data by mesh area. The sample below aggregates mesh population data and draws it as a heat map (saved to image.png).
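A mesh code encodes its location digit by digit, which is what parse_mesh_to_num in the sample below unpacks: the first four digits identify the primary (about 80 km) area, the next two the secondary (about 10 km) area, and the last two the 1 km square. A small sketch with an illustrative code:

# Sketch: decompose an 8-digit mesh code (illustrative value) the same way
# parse_mesh_to_num does in the sample below.
mesh_id = '53394611'      # hypothetical 1 km mesh code
primary = mesh_id[0:4]    # latitude (2 digits) + longitude (2 digits): 80 km square
secondary = mesh_id[4:6]  # 10 km square within the primary area
one_km = mesh_id[6:8]     # 1 km square within the secondary area
print (primary, secondary, one_km)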
Sample code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import urllib
import urllib2
from lxml import etree
from collections import defaultdict
from math import log10
from matplotlib import pyplot
import numpy as np


def draw_heatmap(data):
    # Draw the matrix as a blue-scale heat map and save it to image.png
    fig, ax = pyplot.subplots()
    heatmap = ax.pcolor(data, cmap=pyplot.cm.Blues)
    # shape[1] is the number of columns (x), shape[0] the number of rows (y)
    ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
    ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)
    ax.invert_yaxis()
    ax.xaxis.tick_top()
    pyplot.savefig('image.png')
    pyplot.show()
    return heatmap


def get_meta_data(api_key, stats_data_id):
    """
    Get meta information
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getMetaInfo?appId=%s&lang=J&statsDataId=%s' % (api_key, stats_data_id))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    class_object_tags = root.xpath('//METADATA_INF/CLASS_INF/CLASS_OBJ')
    class_object = {}
    for class_object_tag in class_object_tags:
        class_object_id = class_object_tag.get('id')
        class_object_name = class_object_tag.get('name')
        class_object_item = {
            'id': class_object_id,
            'name': class_object_name,
            'objects': {}
        }
        class_tags = class_object_tag.xpath('.//CLASS')
        for class_tag in class_tags:
            class_item = {
                'code': class_tag.get('code'),
                'name': class_tag.get('name'),
                'level': class_tag.get('level'),
                'unit': class_tag.get('unit')
            }
            class_object_item['objects'][class_item['code']] = class_item
        class_object[class_object_id] = class_object_item
    return class_object


def get_stats_list(api_key, search_kind, key_word):
    key_word = urllib.quote(key_word.encode('utf-8'))
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsList?appId=%s&lang=J&searchKind=%s&searchWord=%s' % (api_key, search_kind, key_word))
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    ret = []
    data_list = root.find('DATALIST_INF')
    list_infs = data_list.xpath('.//LIST_INF')
    for list_inf in list_infs:
        item = {
            'id': list_inf.get('id')
        }
        stat_name = list_inf.find('STAT_NAME')
        if stat_name is not None:
            item['stat_name'] = stat_name.text
            item['stat_name_code'] = stat_name.get('code')
        gov_org = list_inf.find('GOV_ORG')
        if gov_org is not None:
            item['gov_org'] = gov_org.text
            item['gov_org_code'] = gov_org.get('code')
        statistics_name = list_inf.find('STATISTICS_NAME')
        if statistics_name is not None:
            item['statistics_name'] = statistics_name.text
        title = list_inf.find('TITLE')
        if title is not None:
            item['title'] = title.text
        cycle = list_inf.find('CYCLE')
        if cycle is not None:
            item['cycle'] = cycle.text
        survey_date = list_inf.find('SURVEY_DATE')
        if survey_date is not None:
            item['survey_date'] = survey_date.text
        open_date = list_inf.find('OPEN_DATE')
        if open_date is not None:
            item['open_date'] = open_date.text
        small_area = list_inf.find('SMALL_AREA')
        if small_area is not None:
            item['small_area'] = small_area.text
        ret.append(item)
    return ret


def _get_stats_id_value(api_key, stats_data_id, class_object, start_position, filter_str):
    """
    Get statistics
    """
    url = ('http://api.e-stat.go.jp/rest/1.0/app/getStatsData?limit=10000&appId=%s&lang=J&statsDataId=%s&metaGetFlg=N&cntGetFlg=N%s' % (api_key, stats_data_id, filter_str))
    if start_position > 0:
        url = url + ('&startPosition=%d' % start_position)
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    conn = opener.open(req)
    cont = conn.read()
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(cont, parser)
    ret = []
    value_tags = root.xpath('//STATISTICAL_DATA/DATA_INF/VALUE')
    for value_tag in value_tags:
        row = {}
        for key in class_object:
            val = value_tag.get(key)
            if val in class_object[key]['objects']:
                text = class_object[key]['objects'][val]['name']
                row[key] = text.encode('utf-8')
            else:
                row[key] = val.encode('utf-8')
        row['value'] = value_tag.text
        ret.append(row)
    return ret


def get_stats_id_value(api_key, stats_data_id, filter_str):
    class_object = get_meta_data(api_key, stats_data_id)
    return _get_stats_id_value(api_key, stats_data_id, class_object, 1, filter_str), class_object


def get_stats_id_list_value(api_key, stats_data_ids, filters):
    filter_str = ''
    for key in filters:
        filter_str += ('&%s=%s' % (key, urllib.quote(filters[key].encode('utf-8'))))
    ret = []
    i = 0
    for stats_data_id in stats_data_ids:
        rows, class_object = get_stats_id_value(api_key, stats_data_id, filter_str)
        ret.extend(rows)
        i = i + 1
        if i > 5:
            break
    return ret


def get_mesh_id(mesh_id, kind):
    # kind 1: primary area (about 80 km), kind 2: secondary area (about 10 km)
    if kind == 1:
        return mesh_id[0:4] + '0000'
    elif kind == 2:
        return mesh_id[0:6] + '00'
    else:
        raise Exception(mesh_id)


def collect_mesh_value(api_key, stats_data_ids, filters, kind):
    # Sum the values of all meshes that fall into the same coarser mesh area
    filter_str = ''
    for key in filters:
        filter_str += ('&%s=%s' % (key, urllib.quote(filters[key].encode('utf-8'))))
    ret = defaultdict(float)
    for stats_data_id in stats_data_ids:
        rows, class_object = get_stats_id_value(api_key, stats_data_id, filter_str)
        for row in rows:
            key = get_mesh_id(row['area'], kind)
            v = row['value']
            if v.isdigit():
                ret[key] += float(v)
    return ret


def parse_mesh_to_num(mesh_id):
    # p/u: primary area (latitude/longitude), q/v: secondary area, r/w: 1 km mesh
    ret = {}
    if len(mesh_id) == 4:
        ret['p'] = float(mesh_id[0:2])
        ret['u'] = float(mesh_id[2:4])
        ret['q'] = 0.0
        ret['v'] = 0.0
        ret['r'] = 0.0
        ret['w'] = 0.0
        return ret
    elif len(mesh_id) == 8:
        ret['p'] = float(mesh_id[0:2])
        ret['u'] = float(mesh_id[2:4])
        ret['q'] = float(mesh_id[4])
        ret['v'] = float(mesh_id[5])
        ret['r'] = float(mesh_id[6])
        ret['w'] = float(mesh_id[7])
        return ret
    else:
        raise Exception(mesh_id)


def convert_mesh_to_num(mesh_id):
    d1 = parse_mesh_to_num(mesh_id)
    # The secondary area index runs 0-7, so the primary index is scaled by 80
    x1 = (d1['u'] * 80) + (d1['v'] * 10) + d1['w']
    y1 = (d1['p'] * 80) + (d1['q'] * 10) + d1['r']
    return x1, y1


def main(argvs, argc):
    wd = u'2010 Census-World Geodetic System(1KM mesh)20101001'
    # API_KEY
    api_key = 'API_KEY'
    search_kind = '2'
    stats_list = get_stats_list(api_key, search_kind, wd)
    stats_ids = []
    for stats in stats_list:
        stats_ids.append(stats['id'])
    # Narrow to total population (cdCat01=T000608001), aggregated by secondary area
    values = collect_mesh_value(api_key, stats_ids, {'cdCat01': 'T000608001'}, 2)
    max_x = 0
    min_x = 9999
    max_y = 0
    min_y = 9999
    for key in values.keys():
        x, y = convert_mesh_to_num(key)
        if min_x > x:
            min_x = x
        if max_x < x:
            max_x = x
        if min_y > y:
            min_y = y
        if max_y < y:
            max_y = y
    size_x = int(max_x - min_x) / 10 + 1
    size_y = int(max_y - min_y) / 10 + 1
    buff = [[0.0 for i in range(size_x)] for j in range(size_y)]
    for key in values.keys():
        x, y = convert_mesh_to_num(key)
        x = int(x - min_x) / 10
        y = (size_y - 1) - int(y - min_y) / 10
        # Without taking the log, the difference between Tokyo and other regions
        # is so great that the rest of Japan would not show up on the map.
        buff[y][x] = log10(float(values[key]))
    draw_heatmap(np.array(buff))
    return 0


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))
Description: In this figure, the population is aggregated for each secondary area and its common logarithm is displayed as a heat map.
The common logarithm is used because the population difference between the Kanto area and the rest of the country is too large to draw a readable map otherwise.
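A quick sanity check of the scaling (illustrative numbers only): the log collapses a gap of several orders of magnitude into a workable color range.

from math import log10
print (log10(1000000.0))  # 6.0 - a dense urban mesh
print (log10(100.0))      # 2.0 - a sparse rural mesh stays visible on the same scale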
Attempting to display the tertiary areas (the standard 1 km mesh) nationwide consumes a huge amount of memory, while the primary areas make the map too coarse.
Even with secondary areas, this example takes an enormous amount of time to fetch and display everything. In the next article, we will try to make the processing more efficient by caching the data in SpatiaLite.
How to display the regional mesh of the portal site of official statistics (e-Stat) in a web browser: http://qiita.com/mima_ita/items/38784095a146c87dcd23
Characteristics and history of regional mesh statistics: http://www.stat.go.jp/data/mesh/pdf/gaiyo1.pdf
Heat maps with Python + matplotlib: http://qiita.com/ynakayama/items/7dc01f45caf6d87a981b