Try to get statistics using e-Stat

Introduction

It seems that a Web API for accessing government statistics was released today (October 31, 2014). http://www.e-stat.go.jp/api/

A list of the available data can be found at the URL below. There is a lot, ranging from the census to labor statistics. http://www.e-stat.go.jp/api/api-info/api-data/

It looks interesting! This is a record of my trying it out.

Registration

First, access this page and register as a user: http://www.e-stat.go.jp/api/regist-login/ Enter your email address and name.
A notification will be sent to your email address; click the link in it to activate your account.
Next, log in and get an application ID. It seems you can have up to 3 IDs per person. From here on, the application ID is written as xxx.
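
If you plan to share your script, one option is to keep the real application ID out of the source. A minimal sketch, assuming you export the ID in an environment variable; the name ESTAT_APP_ID is my own invention, not something e-Stat defines, and get_app_id is a hypothetical helper.

```python
import os


def get_app_id(environ=None, default="xxx"):
    """Return the application ID from ESTAT_APP_ID (a hypothetical
    variable name), falling back to the placeholder "xxx"."""
    if environ is None:
        environ = os.environ
    return environ.get("ESTAT_APP_ID", default)


# Use this in place of a hard-coded key in your own script.
key = get_app_id()
```

With the variable unset, key is simply the placeholder "xxx", so nothing else in a script has to change.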

Data acquisition and plotting

The procedure is as follows.

Access getStatsList and retrieve the appropriate ID.
Access getStatsData to retrieve the data.
Extract the category names and the VALUE data.
Finally, analyze the data. This time I plot an age pyramid.
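
As a rough illustration of the first two steps, the request URLs can be assembled with the standard library before any HTTP call is made. The helper name build_url is my own; the base URL and the parameter names appId, statsCode and statsDataId are the ones used in the script below, and the data ID shown is a made-up placeholder.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

BASE_URL = "http://api.e-stat.go.jp/rest/1.0/app"


def build_url(method, **params):
    """Return the request URL for an API method such as getStatsList."""
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return "%s/%s?%s" % (BASE_URL, method, query)


# Step 1: list the tables published under a government statistics code.
list_url = build_url("getStatsList", appId="xxx", statsCode="00200521")

# Step 2: fetch one table; "0000000000" is a placeholder, not a real ID.
data_url = build_url("getStatsData", appId="xxx", statsDataId="0000000000")

print(list_url)
print(data_url)
```

Using urlencode rather than plain string formatting also escapes any parameter values that contain special characters.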

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import httplib2
import lxml.etree
import pylab
import matplotlib.font_manager as fm

# Initial settings
h = httplib2.Http('.cache')
key = "xxx"
baseUrl = "http://api.e-stat.go.jp/rest/1.0/app"
statsCode = "00200521"

# Fetch the data ID of the first dataset (1980 Census)
# listed under government statistics code 00200521
print "getStatsList..."
cmd = "%s/getStatsList?appId=%s&statsCode=%s"
response, content = h.request(cmd % (baseUrl, key, statsCode))
xml = lxml.etree.fromstring(content)
dataid = xml.xpath('//LIST_INF')[0].attrib["id"]


# Fetch the actual data using the data ID as a key
print "getStatsData..."
cmd = "%s/getStatsData?appId=%s&statsDataId=%s"
response, content = h.request(cmd % (baseUrl, key, dataid))
xml = lxml.etree.fromstring(content)


# Extract the category names
categories = {}
for c in xml.xpath("//CLASS_OBJ"):
    categories[c.attrib["id"]] = {"name": c.attrib["name"],
                                  "labels": {}}
    print c.attrib["id"]
    for label in c.xpath("CLASS"):
        print label.attrib["name"], label.attrib["code"]
        categories[c.attrib["id"]]["labels"][label.attrib["code"]] = label.attrib["name"]


# Extract the values
values = [{"cat01": v.attrib["cat01"],
           "cat02": v.attrib["cat02"],
           "cat03": v.attrib["cat03"],
           "area": v.attrib["area"],
           "value": int(v.text)}
          for v in xml.xpath('//VALUE')]


# Aggregate by age group (cat03)
c = categories["cat03"]
data   = []
labels = []
# [1:] skips the first code, which is presumably the overall total
for code in sorted(c["labels"].keys())[1:]:
    labels.append(c["labels"][code])
    data.append(sum([v["value"] for v in values if v["cat03"] == code]))
print data


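# Plot
width = 0.5
x = pylab.arange(len(data))
# Font that can render the Japanese labels
prop = fm.FontProperties(fname='/Library/Fonts/Osaka.ttf') # for mac
# align='edge' makes each bar start at x, so the ticks below sit at the bar centers
pylab.barh(x, data, width, align='edge')
pylab.yticks(x + width / 2, labels, fontproperties=prop)
pylab.show()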
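
To try the extraction and aggregation logic without an application ID, the same steps can be run against a small hand-written XML fragment. This is only a sketch: the sample is invented and merely mimics the element and attribute names the script above relies on (CLASS_OBJ, CLASS, VALUE, cat03); it is not real e-Stat output. It uses the standard library's xml.etree.ElementTree instead of lxml, and it skips cells whose text is not an integer, on the assumption that some tables mark missing values with symbols such as "-". The helper names parse_labels and sum_by are my own.

```python
import xml.etree.ElementTree as ET

# Invented sample; NOT real e-Stat output. It only mimics the element
# and attribute names that the script above reads.
SAMPLE = """
<SAMPLE>
  <CLASS_OBJ id="cat03" name="age group">
    <CLASS code="000" name="total"/>
    <CLASS code="001" name="0-4"/>
    <CLASS code="002" name="5-9"/>
  </CLASS_OBJ>
  <VALUE cat03="000" area="01">30</VALUE>
  <VALUE cat03="001" area="01">10</VALUE>
  <VALUE cat03="002" area="01">20</VALUE>
  <VALUE cat03="001" area="02">5</VALUE>
  <VALUE cat03="002" area="02">-</VALUE>
</SAMPLE>
"""


def parse_labels(root):
    """Map each CLASS_OBJ id to a {code: name} dict of its CLASS children."""
    return {
        obj.get("id"): {c.get("code"): c.get("name") for c in obj.findall("CLASS")}
        for obj in root.iter("CLASS_OBJ")
    }


def sum_by(root, key):
    """Sum the integer VALUE cells, grouped by the attribute `key`."""
    totals = {}
    for v in root.iter("VALUE"):
        try:
            n = int(v.text)
        except (TypeError, ValueError):
            continue  # skip empty or non-numeric cells such as "-"
        totals[v.get(key)] = totals.get(v.get(key), 0) + n
    return totals


root = ET.fromstring(SAMPLE)
labels = parse_labels(root)["cat03"]
totals = sum_by(root, "cat03")

# Like the script, drop the first code (here the "total" row).
for code in sorted(labels)[1:]:
    print("%s: %d" % (labels[code], totals.get(code, 0)))
```

Accumulating the sums in a single pass over the VALUE elements avoids rescanning the whole list once per age group, which may matter for larger tables.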

Click here for results

Reference information

Official manual http://www.e-stat.go.jp/api/wp/wp-content/uploads/2014/10/API-spec.pdf

Web form for trying out the API http://www.e-stat.go.jp/api/sample/testform/