[PYTHON] A simple version of government statistics (immigration control) that is easy to handle with jupyter

I took the immigration statistics of the Ministry of Justice and created Simple version of notebok to process.

I have a lot of comments, but I think it is convenient to modify because it is not classified.

Paste the jupyter code below.

#Make the minimum library settings.
import pandas as pd #Loading a data processing library called pandas
import re  #For regular expressions
from IPython.display import display, HTML #Load library for display

#Get the excel data and put it in pandas. This alone is okay
#The original Excel is the premise of one sheet.
moto= pd.read_excel('http://www.e-stat.go.jp/SG1/estat/GL08020103.do?_xlsDownload_&fileId=000007747517&releaseCount=1')
#The date is 2016 8, so as additional information, year,Set the month. (It's good to scroll and get it automatically.)
year = 2016
month = 8
#First, check the total number
count_all = int(moto.iloc[2,3])
#Check the total number( count_Since all is an integer, display it as a string and display it as HTML. )
display(HTML('<h1>Total number= ' + str(count_all) + '<h1>'))

#moto.head()Then you can quickly see the first 5 lines of the data.
#The result of the last run row in the cell is displayed. print(moto.head())You don't have to do it.
#You can see that there are extra rows and columns when summarizing.
moto.head()

#columns Information The port name is taken from the 5th (Kushiro) in the first line.
columns = moto.iloc[1,4:]
#There are various difficulties such as line breaks being inserted just by fetching.
display(columns[12:18])
#index is Unnamed:Since it is 16, it is difficult to understand, so change it to a serial number. It's just a matter of appearance, so you don't have to do it.
columns.index = range(0,len(columns))
display(columns[12:18])
#Line breaks, blanks,Etc. are removed.
columns = columns.apply(lambda x: x.replace("\n",'').replace(' ',''))
display(columns[12:18])

# index (Country)Create a. (You don't have to create it, but you can add it because it is convenient for deleting line data.)
idx = moto.iloc[3:,1]
#Line breaks, blanks,Etc. are removed.
idx = idx.apply(lambda x: x.replace("\n",'').replace(' ',''))
display(idx[12:18])
#Country(country), Port),Number of immigrants(num)Creates the formatted data of.
#First, create empty data.
xdf = pd.DataFrame(columns=['country','port','num'])
#Vertically arranged country,port,num 
for i in range(0,len(columns)):
    x = pd.DataFrame(idx)
    x.columns = ['country']
    x['port'] = columns[i]
    x['num'] = moto.iloc[4:,4 + i]
    x.index = idx
    x.index.name = 'country'
    xdf = xdf.append(x)
display(xdf.head(210).tail(10))

#Remove regional subtotals The subtotals are NaN (no data).
area = ['Asia', 'Europe','Africa', 'North America', 'South America', 'Oceania']
#I am using index to delete it. It is easier than specifying in the search. inplace=I am rewriting xdf directly by specifying True.
xdf.drop(area,inplace=True)
display(xdf.head(210).tail(10))

#Add the year and month items.
xdf['year'] = year
xdf['month'] = month
display(xdf.head(210).tail(10))

#Calculate total number for confirmation
display(HTML('<h1>' + str(xdf['num'].sum()) + '</h1>'))

#Save the formatted file in an excel file.
xdf.to_excel('immigration.xlsx')

Recommended Posts

A simple version of government statistics (immigration control) that is easy to handle with jupyter
Let's create an external specification that is easy to handle
[For beginners] A word summary of popular programming languages (2018 version)
Python Tools for Visual Studio Installation Guide
Create a dataset of images to use for learning
A simple version of government statistics (immigration control) that is easy to handle with jupyter
Installation of OpenMDAO (version 1.7.1)
Installation procedure for Python and Ansible with a specific version
[End of 2020] A memo to start using AWS CLI (Version 2)
Use a scripting language for a comfortable C ++ life-OpenCV-Port Python to C ++-
A story that struggled to handle the Python package of PocketSphinx
I tried to make a mechanism of exclusive control with Go
Try to make a kernel of Jupyter
Prepare a development environment that is portable and easy to duplicate without polluting the environment with Python embeddable (Windows)
I tried to make a simple mail sending application with tkinter of Python
To output a value even in the middle of a cell with Jupyter Notebook
Free version of DataRobot! ?? Introduction to "PyCaret", a library that automates machine learning
Separately install a version of Python that is not pre-installed on your Mac
A new form of app that works with GitHub: How to make GitHub Apps
A method to replace a server that is executing batch processing with cron with zero downtime
[Mac] I want to make a simple HTTP server that runs CGI with Python
The theory that the key to controlling infection with the new coronavirus is hyperdispersion of susceptibility
A simple system that automatically shoots with object detection and sends it to LINE
It was a little difficult to do flask with the docker version of nginx-unit
[AWS] Development environment version that tried to build a Python environment with eb [Elastic Beanstalk]
Simple statistics that can be used to analyze the effect of measures on EC sites and codes that can be used in jupyter notebook
Two solutions to the problem that it is hard to see the vector field when writing a vector field with quiver () of matplotlib pyplot