Getting government statistics with the e-Stat API

Introduction

As of today (October 31, 2014), a Web API for accessing Japanese government statistics appears to have been released: http://www.e-stat.go.jp/api/

A list of the available data is at the URL below. It covers a wide range, from the Population Census to labor statistics. http://www.e-stat.go.jp/api/api-info/api-data/

It looks interesting, so this post records my first attempt at using it.

Registration

  1. First, go to this page and register as a user, entering your email address and name: http://www.e-stat.go.jp/api/regist-login/

  2. A notification email will arrive at that address; click the link in it to activate your account.

  3. Next, log in and obtain an application ID. It seems you can issue up to three IDs per person. In what follows, the application ID is written as xxx.
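The script in this post hard-codes the application ID as key = "xxx". If you would rather keep the ID out of source files, one option is to read it from an environment variable. This is a minimal sketch; the helper name load_app_id and the variable name ESTAT_APP_ID are my own choices for illustration, not anything defined by e-Stat.

```python
import os


def load_app_id(var_name="ESTAT_APP_ID"):
    """Return the e-Stat application ID stored in an environment variable.

    ESTAT_APP_ID is a hypothetical name chosen for this example;
    e-Stat itself does not define it.
    """
    app_id = os.environ.get(var_name, "").strip()
    if not app_id:
        # Fail loudly instead of sending requests with an empty appId.
        raise RuntimeError(
            "Set %s to the application ID issued after logging in" % var_name
        )
    return app_id
```

With the variable exported in your shell, key = load_app_id() can replace the hard-coded assignment.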

Data acquisition and plotting

The procedure is as follows.

  1. Call getStatsList to get the ID of the statistical table you want.
  2. Call getStatsData with that ID to retrieve the table's data.
  3. Extract the category names and the VALUE elements.
  4. Finally, analyze the data. This time I plot an age pyramid.
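Steps 1 and 2 are plain HTTP GET requests with the parameters in the query string. The script formats the URLs by hand with %; a small sketch of the same thing using urlencode from the Python 3 standard library, which also escapes any special characters in the values. The helper name build_url is my own; the version 1.0 endpoint and the parameter names appId, statsCode and statsDataId are taken from the script.

```python
from urllib.parse import urlencode

# Version 1.0 REST endpoint, as used in the script in this post.
BASE_URL = "http://api.e-stat.go.jp/rest/1.0/app"


def build_url(method, app_id, **params):
    """Build a GET URL for an e-Stat API method.

    method is e.g. "getStatsList" or "getStatsData"; extra keyword
    arguments (statsCode, statsDataId, ...) become query parameters,
    appended after appId in the order given.
    """
    query = {"appId": app_id}
    query.update(params)
    return "%s/%s?%s" % (BASE_URL, method, urlencode(query))
```

For example, build_url("getStatsList", key, statsCode="00200521") produces the same URL the script requests in step 1. The full script I used follows.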
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import httplib2
import lxml.etree
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

# Initial settings
h = httplib2.Http('.cache')   # cache responses in the .cache directory
key = "xxx"                   # your application ID
baseUrl = "http://api.e-stat.go.jp/rest/1.0/app"
statsCode = "00200521"        # government statistics code of the Population Census

# Fetch the data ID of the first table for
# government statistics code 00200521 (1980 Census)
print("getStatsList...")
cmd = "%s/getStatsList?appId=%s&statsCode=%s"
response, content = h.request(cmd % (baseUrl, key, statsCode))
xml = lxml.etree.fromstring(content)
dataid = xml.xpath('//LIST_INF')[0].attrib["id"]


# Fetch the actual data, using the data ID as the key
print("getStatsData...")
cmd = "%s/getStatsData?appId=%s&statsDataId=%s"
response, content = h.request(cmd % (baseUrl, key, dataid))
xml = lxml.etree.fromstring(content)


# Extract the category names and their labels (code -> label)
categories = {}
for c in xml.xpath("//CLASS_OBJ"):
    categories[c.attrib["id"]] = {"name": c.attrib["name"],
                                  "labels": {}}
    print(c.attrib["id"])
    for label in c.xpath("CLASS"):
        print(label.attrib["name"], label.attrib["code"])
        categories[c.attrib["id"]]["labels"][label.attrib["code"]] = label.attrib["name"]


# Extract the values
values = [{"cat01": v.attrib["cat01"],
           "cat02": v.attrib["cat02"],
           "cat03": v.attrib["cat03"],
           "area": v.attrib["area"],
           "value": int(v.text)}
          for v in xml.xpath('//VALUE')]


# Aggregate by age group (cat03).
# The first code in sorted order is skipped (presumably the total row).
# Note: every cat01/cat02/area code is summed; if those categories
# also contain total rows, the sums count them more than once.
c = categories["cat03"]
data = []
labels = []
for code in sorted(c["labels"].keys())[1:]:
    labels.append(c["labels"][code])
    data.append(sum(v["value"] for v in values if v["cat03"] == code))
print(data)


# Plot
width = 0.5
x = np.arange(len(data))
prop = fm.FontProperties(fname='/Library/Fonts/Osaka.ttf')  # Japanese font on macOS; adjust elsewhere
plt.barh(x, data, height=width, align='center')  # centre each bar on its tick
plt.yticks(x, labels, fontproperties=prop)       # use the Japanese font for the age labels
plt.show()
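Steps 3 and 4 can be tried without network access. This is a minimal sketch that runs the same category extraction and per-age-group sum as the script on a tiny hand-written XML fragment. The fragment only mimics the element and attribute names the script reads (CLASS_OBJ, CLASS, VALUE, cat03, area); it is not real e-Stat output and its numbers are placeholders. I also swap in the standard library's xml.etree.ElementTree for lxml so the example has no third-party dependency; the helper names parse_categories and sum_by are my own.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a getStatsData response: NOT real e-Stat data.
SAMPLE = """<ROOT>
  <CLASS_OBJ id="cat03" name="Age">
    <CLASS code="000" name="Total"/>
    <CLASS code="001" name="0-4"/>
    <CLASS code="002" name="5-9"/>
  </CLASS_OBJ>
  <VALUE cat03="000" area="00000">30</VALUE>
  <VALUE cat03="001" area="00000">10</VALUE>
  <VALUE cat03="001" area="00001">5</VALUE>
  <VALUE cat03="002" area="00000">15</VALUE>
</ROOT>"""


def parse_categories(root):
    """Map each CLASS_OBJ id to its name and a {code: label} dict."""
    categories = {}
    for obj in root.iter("CLASS_OBJ"):
        labels = {c.get("code"): c.get("name") for c in obj.findall("CLASS")}
        categories[obj.get("id")] = {"name": obj.get("name"), "labels": labels}
    return categories


def sum_by(root, categories, cat_id, skip_first=True):
    """Sum VALUE texts per code of cat_id, in sorted code order.

    Like the script, the first sorted code is dropped when skip_first is
    True, on the assumption that it is the total row.
    """
    codes = sorted(categories[cat_id]["labels"])
    if skip_first:
        codes = codes[1:]
    labels, data = [], []
    for code in codes:
        labels.append(categories[cat_id]["labels"][code])
        data.append(sum(int(v.text) for v in root.iter("VALUE")
                        if v.get(cat_id) == code))
    return labels, data


root = ET.fromstring(SAMPLE)
cats = parse_categories(root)
print(sum_by(root, cats, "cat03"))  # → (['0-4', '5-9'], [15, 15])
```

The two VALUE rows for code 001 (different area codes) are added together, which is exactly what the script does across areas.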

The result is shown below.

(Figure: figure_1.png, horizontal bar chart of the totals per age group)

Reference information

Official API specification (PDF): http://www.e-stat.go.jp/api/wp/wp-content/uploads/2014/10/API-spec.pdf

Web form for trying the API in the browser: http://www.e-stat.go.jp/api/sample/testform/
