[PYTHON] We have released a tool that uses the API of e-Stat, the portal site for official government statistics, to merge municipal boundary data with the statistical data of your choice!

Introduction

Hello, everyone. Do you know the hottest web application in the world right now?

That's right.

e-Stat.


e-Stat is a portal site where government statistics are organized by field and by the organization that produced them, so that you can search for data and view it in various formats such as graphs and tables.

An API is also provided, and the site design is cute and modern.

In addition, there also seems to be a function called Statistical GIS that visualizes statistical data on a map.

It seems to let you select specific statistical data, such as the census, vital statistics, or the survey of medical facilities, and visualize it on a map.

Wonderful!

Looking at an application like this, it is only natural for a GIS person to think

"I want to build something like this too!!!!"

But this e-Stat turned out to have many points where an amateur easily sinks into a deep swamp...


Swamp point

Point (1): The too-much-data problem

A wide variety of data is registered in e-Stat.

First, there are 17 statistical fields, such as "Land / Weather" and "Population / Households" (see "Search by field").

On the organization side, the government statistics are divided among about 14 responsible bodies, such as the Cabinet Secretariat and the Ministry of Economy, Trade and Industry (see "Search by organization").

You have to find the data you need among a total of about 1.5 million (!) datasets.

Furthermore, since the data format differs by field and by ministry, even if you manage to obtain the statistical data you want, whether you can actually put it to good use is another matter.

Point (2): The API-is-quite-complicated problem

Since I specialize in building web applications in the GIS field, I was very glad that there is an API usable from the web. However, according to the API specification (version 3.0) of e-Stat, as many as seven APIs are provided:

- Statistical table information acquisition
- Meta information acquisition
- Statistical data acquisition
- Dataset registration
- Dataset reference
- Data catalog information acquisition
- Batch statistical data acquisition

You could say that you just need to read the specification carefully, but even if you only want to grab some statistical data quickly, having to start by working out which API you actually need is a bit painful. (There is so much data that it can't be helped.)

Point (3): The too-many-〇〇-codes problem

The data is managed with several kinds of codes besides the "statistical field" introduced earlier.

I could not find a page that summarizes them, but looking at the API specification I found at least four 〇〇 codes and IDs, including the one for the statistical field:

- Small classification code of the statistical field
- Government statistics code
- Statistical table ID
- Standard area code

In short, you can get the statistical data you want by combining these codes with three of the APIs introduced earlier, but just understanding what these terms mean probably takes hours to days. (It did for me.)

A Development Guide is provided, and it carefully describes the flow for acquiring the target statistical data. However, it does not tell you what kind of statistical data can be obtained, and the attached images are too small to read, so you end up searching through various pages to collect information.

Painful.

Point (4): The surprisingly-little-data-usable-in-GIS problem

People who use e-Stat presumably want to do various kinds of analysis based on statistical data.

Many of them probably want to analyze the data with GIS, but in reality there was not much data that could be used in GIS.

If you look closely at the data search page, 263/911 of the items are managed as a "database", while the other 648/911 are managed as some kind of mysterious file.

Moreover, those files are very likely to be PDFs.

That is not data you can easily use from your own application.

By the way, the entry "Q1: I want to know the data that can be used with the API function" links to "Provided data", and the page behind that link suggests that about 180,000 of the 1.5 million datasets are available through the API.

Others

The URL differs between JSON output and CSV output

- JSON output: …/getStatsData?<parameter group>
- CSV output: …/getSimpleStatsData?<parameter group>

(Why…)

Even CSV output does not come out as a plain CSV

"RESULT"
"STATUS","0"
"ERROR_MSG","It ended normally."
"DATE","2020-12-18T15:58:49.161+09:00"
"TABLE_INF","0000010101"
"STAT_NAME","00200502","Social / demographic system"
"GOV_ORG","00200","Ministry of Internal Affairs and Communications"
"STATISTICS_NAME","Prefectural data Basic data"
"TITLE","0000010101","A Population / household"
"CYCLE","Yearly"
"SURVEY_DATE","0"
"OPEN_DATE","2020-03-06"
"SMALL_AREA","0"
"COLLECT_AREA","Nationwide"
"MAIN_CATEGORY","99","Other"
"SUB_CATEGORY","99","Other"
"OVERALL_TOTAL_NUMBER","486096"
"UPDATED_DATE","2020-03-06"
"STATISTICS_NAME_SPEC","Prefecture data","basic data","","","",""
"TITLE_SPEC","","A Population / household","","",""
"CLASS_INF"
"CLASS_OBJ_ID","CLASS_OBJ_NAME","CLASS_CODE","CLASS_NAME","CLASS_LEVEL","CLASS_UNIT","CLASS_PARENT_CODE","CLASS_ADD_INF"

(When will the CSV I actually want come out...?)
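To get at the table itself, the leading metadata lines have to be skipped. Below is a minimal sketch of one way to do that. It assumes that the data section starts with a line containing only "VALUE", followed by a header row. That marker and the column names in the sample are assumptions that do not appear in the output above, so please check them against a real response.

```python
import csv
import io


def extract_section(text, marker="VALUE"):
    """Return the rows that follow a one-cell marker line as a list of dicts.

    Assumption: the data section starts with a line holding only `marker`,
    and the line right after it is the column header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        if row == [marker] and i + 1 < len(rows):
            header, body = rows[i + 1], rows[i + 2:]
            # Skip empty lines and pair each cell with its column name.
            return [dict(zip(header, r)) for r in body if r]
    return []


# Made-up sample shaped like the output above (values are illustrative only).
sample = '''"RESULT"
"STATUS","0"
"TABLE_INF","0000010101"
"VALUE"
"area_code","value"
"01101","12345"
'''

records = extract_section(sample)
print(records)
```

Parsing with the csv module rather than splitting on commas keeps quoted cells that contain commas intact.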

Also, many APIs of older versions are still around, which makes it hard to keep track of the information, and so on...

e-stat-api-tools

So, after poking around in various data and APIs, my impression is that e-Stat is very convenient but has quite a few pitfalls.

I think many people feel, like me, that "the barrier to entry is a little high...".

For those of you who are lost, we at MIERUNE have prepared a list of the available data, written a wrapper for the main APIs, and created a CLI tool that merges municipal boundary polygons with the target statistical data and outputs the result as GeoJSON!

e_stat_api_tools


Data maintenance

Government statistics code

In short, to get the statistical data you want, you need to

- enter the statistical table ID and the meta information (detailed item information) into the statistical data acquisition API

but to get that far, the statistical table ID and the meta information have to be obtained from different APIs.

As described in the Development Guide, with the e-Stat API you call three APIs in the following order to get the desired statistical data:

- Enter the government statistics code or the survey date into the statistical table information acquisition API to get the statistical table ID of the target table.
- Enter that statistical table ID into the meta information acquisition API to get the meta information (detailed item information) of the target table.
- Enter the statistical table ID and the meta information (detailed item information) into the statistical data acquisition API to get the desired statistical data.

(You can try various APIs with the official API Function Test Form Version 3.0, so if you look at what kind of input and what kind of response will be returned, you will understand better.)
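As a rough illustration of the three-step flow, here is a sketch that only builds the request URLs and sends nothing. The JSON base URL, the endpoint names getStatsList and getMetaInfo, and the parameter name statsCode are my reading of the version 3.0 specification rather than something shown in this article, so treat them as assumptions. "YOUR_APP_ID" is a placeholder.

```python
from urllib.parse import urlencode

# Assumption: JSON endpoints of API version 3.0. Verify against the official spec.
BASE_URL = "https://api.e-stat.go.jp/rest/3.0/app/json/"


def build_url(endpoint, app_id, **params):
    """Build a request URL for one e-Stat endpoint (no request is sent)."""
    query = urlencode({"appId": app_id, **params})
    return f"{BASE_URL}{endpoint}?{query}"


def stats_list_url(app_id, stats_code):
    # Step 1: government statistics code -> statistical table IDs
    return build_url("getStatsList", app_id, statsCode=stats_code)


def meta_info_url(app_id, stats_data_id):
    # Step 2: statistical table ID -> meta information (detailed item information)
    return build_url("getMetaInfo", app_id, statsDataId=stats_data_id)


def stats_data_url(app_id, stats_data_id, **filters):
    # Step 3: statistical table ID + items chosen from the meta information -> data
    return build_url("getStatsData", app_id, statsDataId=stats_data_id, **filters)


app_id = "YOUR_APP_ID"  # placeholder
print(stats_list_url(app_id, "00200502"))
print(meta_info_url(app_id, "0000020101"))
print(stats_data_url(app_id, "0000020101", cdArea="01101", cdCat01="A1101"))
```

Keeping URL construction separate from the HTTP call makes it easy to paste each URL into the test form and compare the responses.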

However, the government statistics code needed for the first step can be looked up in the "Government Statistics Code List", but that list is in PDF format.

So, first of all, I converted this data to TSV to make it machine-readable.

It is stored in the repository, so feel free to use it: government_statistics_codes.tsv

Statistical table ID

The government statistics code above is used to obtain the statistical table ID. Looking at the e-Stat data and listening to what people in the industry are doing, I decided (on my own) that the main statistics people want to use in GIS are probably the Social and Population Statistics (the social / demographic system).

Therefore, I have stored a list of statistical table IDs in the repository so that the social / demographic system can be used smoothly: default_stats_table_ids.csv

Standard area code

The standard area code is a code that identifies a prefecture or municipality (see "Standard area code used for statistics").

It relates to the boundary data acquisition described later rather than to statistical data acquisition, but it is also included in the repository.

The data is as of December 2020, so if municipalities have been merged since then, please download the latest version from "Find Municipality" and update it.

standard_area_codes.csv
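As a small aid for handling these codes, the sketch below splits a five-digit standard area code into its two-digit prefecture part and three-digit municipality part. The helper name is hypothetical and not part of the tool. The codes are kept as strings because a leading zero (as in 01101) disappears if they are read as integers.

```python
def split_area_code(area_code):
    """Split a 5-digit standard area code into (prefecture, municipality) parts."""
    if len(area_code) != 5 or not area_code.isdigit():
        raise ValueError(f"expected a 5-digit code, got {area_code!r}")
    return area_code[:2], area_code[2:]


# 01101 is the code used later in this article (Chuo-ku, Sapporo).
pref_code, city_code = split_area_code("01101")
print(pref_code, city_code)
```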


Python wrapper

The following main APIs are wrapped in the e-stat package so that they can be called from Python. At this stage, however, it is not fully polished and is not registered on PyPI, so I will skip the explanation here. (It will be supported eventually.)

- Statistical table information acquisition API
- Meta information acquisition API
- Statistical data acquisition API

CLI tool

Using the above package, I created a tool that provides the following five commands on the command line.

- ids: get a list of statistical table IDs
- meta: get the metadata of a statistical table
- stats: get statistical data
- boundary: get boundary data
- merge-boundary: get statistical data and boundary data and merge them

See README.md for details on how to use each command.

Try merging municipal boundary data with the target statistical data

So let's use this tool to merge boundary data and stats!

First, clone e_stat_api_tools and move to the e_stat_api_sample directory.

% cd .../e_stat_api_sample
% pwd
.../e_stat_api_sample

Then install the required packages with the pipenv install command.

If you don't have pipenv installed, install it with

% brew install pipenv

or

% pip install -U pip
% pip install pipenv

This tool requires an e-Stat application ID. Complete the user registration according to the User's Guide, issue an application ID, and then run the following commands to create a .env file.

% touch /e_stat_api_sample/e_stat/.env
% echo "app_id=<YOUR_APP_ID>" >> /e_stat_api_sample/e_stat/.env
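If you want to confirm that the .env file was written correctly, a quick check like the following may help. This is only a sketch for verification with a hypothetical helper name. The tool itself may load the file differently, for example through a dotenv library. The demo uses a temporary file so that it does not touch your real .env.

```python
import tempfile
from pathlib import Path


def read_app_id(env_path):
    """Return the value of `app_id` from a KEY=VALUE style .env file."""
    for line in Path(env_path).read_text(encoding="utf-8").splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "app_id":
            # Drop surrounding whitespace and optional double quotes.
            return value.strip().strip('"')
    raise KeyError(f"app_id not found in {env_path}")


# Demo with a temporary file and a placeholder ID.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("app_id=YOUR_APP_ID\n", encoding="utf-8")
    print(read_app_id(env_file))
```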

After creating the .env file that stores the application ID, run the merge-boundary command with the following options.

  -p, --pref_name TEXT Enter the prefecture name of the shp file to get [required]
  -d, --download_dir TEXT Enter the path string of the directory that stores the shp file to download [required]
  -a, --area TEXT Enter the standard area code of the statistical data to be acquired [required]
  -c, --class_code TEXT Enter the item of statistical data to be acquired [required]
  -y, --year TEXT Enter the year of the statistical data to be acquired [required]
  -st, --stats_table_id TEXT Enter the statistical table ID of the statistical data you want to acquire [required]
  -o, --output_dir TEXT Enter the path string of the directory that stores the downloaded csv [required]

Specifically, the command is as follows.

% pipenv run python -m e_stat merge-boundary \
  -p Hokkaido \
  -d ./download_file \
  -a 01101 \
  -c A1101 \
  -y 2000 \
  -st 0000020101 \
  -o ./created

Sample shell scripts are provided for all five commands, not just merge-boundary, so you can check the operation with commands like the following.

% bash merge_boundary.sh

If you run bash merge_boundary.sh, the following log is displayed, and merge_boundary.geojson should then be generated in the created directory.

% bash merge_boundary.sh
0.00B [00:00, ?B/s]url='https://www.e-stat.go.jp/gis/statmap-search/data?dlserveyId=A002005212015&code=01&coordSys=1&format=shape&downloadType=5', res.status_code=200
Start downloading A002005212015DDSWC01.zip
15.8MB [00:05, 2.75MB/s]
Import file: .../e_stat_api_tools/download_file/A002005212015DDSWC01.zip
Convert the .shp file in the zip file to gdf

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super(GeoDataFrame, self).__setitem__(key, value)
Merge on the specified gdf key (AREA_CODE).
Convert gdf to geojson.
Exported .../e_stat_api_tools/created/boundary.geojson.
Exported .../e_stat_api_tools/created/boundary.csv.
Get a statistical table. URL=http://api.e-stat.go.jp/rest/3.0/app/getSimpleStatsData?appId=9a10491cd87e8877b5410283228bb64b7805ff79&cdArea=01101&cdCat01=A1101&cdTime=2000100000&statsDataId=0000020101&lang=J&metaGetFlg=N&c&explanationGetFlg=N&annotationGetFlg=N&sectionHeaderFlg=2
Convert gdf to geojson.
Exported .../e_stat_api_tools/created/merge_boundary.geojson.
Exported .../e_stat_api_tools/created/merge_boundary.csv.
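Comparing the command options with the URL in the log suggests how the options map to query parameters: -a becomes cdArea, -c becomes cdCat01, -st becomes statsDataId, and -y 2000 becomes cdTime=2000100000. The sketch below reproduces that mapping. The function name is hypothetical, and the "100000" suffix is inferred from this single log line only, so it may not hold for every statistical table.

```python
def to_query_params(area, class_code, year, stats_table_id):
    """Map merge-boundary options to the query parameters seen in the log."""
    return {
        "cdArea": area,                 # -a 01101
        "cdCat01": class_code,          # -c A1101
        "cdTime": f"{year}100000",      # -y 2000 -> 2000100000 (inferred from the log)
        "statsDataId": stats_table_id,  # -st 0000020101
    }


params = to_query_params("01101", "A1101", 2000, "0000020101")
print(params)
```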

Let's open the created merge_boundary.geojson in QGIS!

You can see that the following statistical data has been merged into the boundary data of 01101, that is, Chuo-ku, Sapporo.

- A1101 (A population / total household population)
- 2000 (the year 2000)
- 0000020101 (Social / Demographic System Municipal Data)
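Conceptually, the merge is a join on the area code. According to the log, the tool does this with GeoDataFrames, joining on the AREA_CODE key. The sketch below is not the tool's implementation. It shows the same idea in plain Python on a GeoJSON-like dict, with a hypothetical function name and made-up values.

```python
import json


def merge_stats(feature_collection, stats_by_area, key="AREA_CODE", column="value"):
    """Copy one statistic into each feature whose key matches (a left join)."""
    merged = {"type": "FeatureCollection", "features": []}
    for feature in feature_collection["features"]:
        props = dict(feature["properties"])  # copy, so the input stays untouched
        props[column] = stats_by_area.get(props.get(key))  # None when no match
        merged["features"].append({**feature, "properties": props})
    return merged


# Made-up boundary features (geometry omitted) and a made-up statistic.
boundary = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"AREA_CODE": "01101"}},
        {"type": "Feature", "geometry": None, "properties": {"AREA_CODE": "01102"}},
    ],
}
stats = {"01101": 1000}  # illustrative value, not real data

merged = merge_stats(boundary, stats)
print(json.dumps(merged, indent=2))
```

Features without a matching statistic keep their geometry and get a null value, which makes gaps easy to spot when styling the layer in QGIS.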

In conclusion

We have built and introduced a tool that lets you easily merge e-Stat boundary data with statistical data. What did you think?

It is an open-source (MIT license) tool, and we will keep adding and improving features as needed. If you run into any problems or have requests, we would appreciate it if you could contact us.

We are also good at visualizing and analyzing all kinds of data, building tools, and building WebGIS, and we have taken on projects and developed features that meet the diverse needs of many customers. If you are interested, please feel free to contact us via our website.

We are also developing the map distribution service MapTiler for Japan. It lets you use high-quality map data, including vector tiles, at far better cost performance than other companies' services. For more information, please visit MapTiler.jp!
