[PYTHON] Data cleansing of the Ministry of Health, Labor and Welfare's open data on the domestic outbreak situation

Source: the Ministry of Health, Labor and Welfare's open data on the domestic outbreak situation.

The CSV needs cleansing because its cells contain literal \n newline markers, and the change from the previous day sits in the same cell as the cumulative value.

import re
import pandas as pd

df = pd.read_csv("https://www.mhlw.go.jp/content/current_situation.csv", index_col=0)

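# Strip footnote markers (※1, ※2, ...), thousands separators, and literal "\n" text from the labels
df.index = df.index.str.replace(r"※\d", "", regex=True).str.replace(",", "").str.replace(r"\\n", "", regex=True)
df.columns = df.columns.str.replace(r"※\d", "", regex=True).str.replace(r"\\n", "", regex=True).str.strip()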

# Remove footnote markers from every cell
df = df.apply(lambda col: col.map(lambda s: re.sub(r"※\d", "", s)))

dfs = []

# Split each column into its cumulative value and its change from the previous day
for name, col in df.items():

    df_tmp = col.str.split(r"\\n", expand=True).rename(columns={0: "Cumulative", 1: "The day before ratio"})
    df_tmp.columns = pd.MultiIndex.from_product([[name], df_tmp.columns])

    dfs.append(df_tmp)

df = pd.concat(dfs, axis=1).fillna(0)

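# Drop thousands separators, whitespace, and surrounding parentheses, then convert to integers
df = df.apply(lambda col: col.map(lambda s: str(s).replace(",", "").strip().strip("()"))).astype(int)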

df.to_csv("current_situation.csv", encoding="utf_8_sig")
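
To try the same steps without downloading anything, here is a minimal sketch that runs on a small hand-made table. The row labels, column labels, and numbers are all made up for illustration and are not taken from the real CSV, and the helper name `clean` is hypothetical. It assumes the raw cells look like `cumulative\n(change)`, with the `\n` stored as two literal characters, which is what the regular expressions above imply.

```python
import pandas as pd

# Hypothetical sample that imitates the raw layout; every value here is made up.
# "\\n" in a Python string is a literal backslash followed by the letter n.
raw = pd.DataFrame(
    {
        "PCR tests※1": ["1,234\\n(56)", "789"],
        "Positives": ["100\\n(5)", "20\\n(1)"],
    },
    index=["Domestic cases※2", "Airport quarantine"],
)


def clean(df):
    """Apply the cleansing steps to a frame shaped like the sample above."""
    df = df.copy()

    # Strip footnote markers such as ※1 from the row and column labels
    df.index = df.index.str.replace(r"※\d", "", regex=True)
    df.columns = df.columns.str.replace(r"※\d", "", regex=True).str.strip()

    parts = []
    for name, col in df.items():
        # The pattern matches a literal backslash followed by "n"
        tmp = col.str.split(r"\\n", expand=True)
        tmp = tmp.rename(columns={0: "Cumulative", 1: "The day before ratio"})
        tmp.columns = pd.MultiIndex.from_product([[name], tmp.columns])
        parts.append(tmp)

    # Cells that had no second part become "0"
    out = pd.concat(parts, axis=1).fillna("0")

    # Remove separators, whitespace, and parentheses, then convert to integers
    return out.apply(
        lambda c: c.str.replace(",", "", regex=False)
        .str.strip()
        .str.strip("()")
        .astype(int)
    )


cleaned = clean(raw)
print(cleaned)
```

Each original column ends up as a pair of integer columns under a two-level header, so a single value can be looked up with a tuple such as `cleaned.loc["Domestic cases", ("PCR tests", "Cumulative")]`.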
