[Python] Data cleansing of the Ministry of Health, Labor and Welfare open data on the domestic outbreak situation

Source: Ministry of Health, Labor and Welfare, open data on the domestic outbreak situation, etc.

The CSV needs cleansing for two reasons: its cells contain \n newline characters, and the change from the previous day is stored in the same cell as the cumulative value.
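To make the cell layout concrete, here is a minimal sketch that cleans a single cell with the standard library only. The sample value and the helper name `parse_cell` are hypothetical, invented for illustration. It also assumes the cell holds a literal backslash followed by `n`, which is what the `r"\\n"` pattern in the script implies.

```python
import re
from typing import Tuple


def parse_cell(cell: str) -> Tuple[int, int]:
    """Split one raw cell into (cumulative, change from the previous day)."""
    cell = re.sub(r"※\d", "", cell)   # drop footnote markers such as ※1
    parts = re.split(r"\\n", cell)     # split on the literal two characters \n
    # Remove thousands separators, whitespace, and surrounding parentheses
    numbers = [int(p.replace(",", "").strip().strip("()")) for p in parts]
    cumulative = numbers[0]
    # A cell without a second part is treated as "no change"
    change = numbers[1] if len(numbers) > 1 else 0
    return cumulative, change


sample = r"1,234※1\n(+56)"  # hypothetical cell, not taken from the real CSV
print(parse_cell(sample))
```

The pandas script applies the same steps to every cell of the table at once.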

import re
import pandas as pd

df = pd.read_csv("https://www.mhlw.go.jp/content/current_situation.csv", index_col=0)

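# Strip footnote markers (※1, ※2, ...), commas, and literal "\n" from the row and column labels.
# regex=True is explicit because pandas 2.0 changed the default of str.replace to regex=False.
df.index = df.index.str.replace(r"※\d", "", regex=True).str.replace(",", "", regex=False).str.replace(r"\\n", "", regex=True)
df.columns = df.columns.str.replace(r"※\d", "", regex=True).str.replace(r"\\n", "", regex=True).str.strip()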

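# Remove footnote markers from every cell; non-string values (e.g. NaN) pass through unchanged
df = df.apply(lambda col: col.map(lambda s: re.sub(r"※\d", "", s) if isinstance(s, str) else s))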

dfs = []

# DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent
for name, col in df.items():

    df_tmp = col.str.split(r"\\n", expand=True).rename(columns={0: "Cumulative", 1: "The day before ratio"})
    df_tmp.columns = pd.MultiIndex.from_product([[name], df_tmp.columns])

    dfs.append(df_tmp)

df = pd.concat(dfs, axis=1).fillna(0)

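# Drop commas and the parentheses around the previous-day figure, then convert to integers
df = df.apply(lambda col: col.map(lambda s: str(s).replace(",", "").strip().strip("()"))).astype(int)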

df.to_csv("current_situation.csv", encoding="utf_8_sig")
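Because the cleaned table has two header rows (item name, then "Cumulative" / "The day before ratio"), reading it back needs `header=[0, 1]`. The sketch below illustrates this with a small hypothetical table written to an in-memory buffer; the item names and numbers are made up and are not values from the real file.

```python
import io

import pandas as pd

# Hypothetical two-level table, shaped like the output of the script above
columns = pd.MultiIndex.from_product(
    [["Positive", "Deaths"], ["Cumulative", "The day before ratio"]]
)
df_out = pd.DataFrame([[1234, 56, 78, 9]], index=["Total"], columns=columns)

# Write to an in-memory buffer instead of a file for this sketch
buffer = io.StringIO()
df_out.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] restores both header rows as MultiIndex columns
df_back = pd.read_csv(buffer, header=[0, 1], index_col=0)

# Select one series by its (item, kind) pair
print(df_back[("Positive", "Cumulative")])
```

When reading the actual file written by the script, passing `encoding="utf_8_sig"` to `pd.read_csv` as well should handle the byte order mark that the script writes.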
