Introduction

Following the article listed in the reference section at the end, I fetched the PDF of COCO'S breakfast buffet store list, extracted its tables with camelot, cleansed the data with pandas, and saved the result as CSV.

Preparation

apt install python3-tk ghostscript
pip install "camelot-py[cv]"
pip install pandas

Data cleansing

import camelot
import pandas as pd

tables = camelot.read_pdf(
    "https://www.cocos-jpn.co.jp/menu_pdf/bvshoplist.pdf",
    pages="all",
    split_text=True,
    strip_text="\n",
    line_scale=40,
)

# Build the column names by joining the first two header rows of the first table
columns = ["".join(i) for i in zip(*(tables[0].df.head(2).values))]

dfs = [table.df.iloc[3:].set_axis(columns, axis=1) for table in tables]

# Concatenate all pages and renumber the rows starting from 1
df = pd.concat(dfs).reset_index(drop=True)
df.index += 1

# Replace empty strings with missing values (NaN)
df.mask(df == "", inplace=True)

# Where "Usage fee" has a value (weekdays / Saturdays and Sundays), use it as the implementation date
df["Implementation date"] = df["Implementation date"].where(df["Usage fee"].isnull(), df["Usage fee"])

# Forward-fill the store details into rows where they were left blank
df.ffill(inplace=True)

# Drop the usage fee column, which is no longer needed
df.drop("Usage fee", axis=1, inplace=True)

# Adult price: keep only the tax-included amount (the second number in the cell)
adult = (
    df["grown up"]
    .str.extractall("([0-9]+)")
    .unstack()
    .rename(columns={0: "grown up_Tax excluded", 1: "grown up_tax included"}, level=1)
)
adult.columns = adult.columns.droplevel(level=0)
df["grown up"] = adult["grown up_tax included"].astype(int)

# Child price: keep only the tax-included amount (the second number in the cell)
child = (
    df["Elementary school students and younger"]
    .str.extractall("([0-9]+)")
    .unstack()
    .rename(columns={0: "child_Tax excluded", 1: "child_tax included"}, level=1)
)
child.columns = child.columns.droplevel(level=0)
df["Elementary school students and younger"] = child["child_tax included"].astype(int)

# Rename the address column
df.rename(columns={"After the address": "Street address"}, inplace=True)

# Normalize addresses to Unicode NFKC and remove spaces
df["Street address"] = df["Street address"].str.normalize("NFKC").str.replace(" ", "")

df.to_csv("cocos.csv", encoding="utf_8_sig")
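
The less obvious steps above (joining the two header rows, moving the day type into the implementation date, forward-filling store details, and keeping the tax-included price) can be tried on a few made-up rows without downloading the PDF. This is only a sketch: the store names, day labels, and price strings below are invented for illustration, and it assumes each price cell holds two numbers, tax-excluded first and tax-included second, as the code above implies.

```python
import pandas as pd

# Hypothetical two-row header, as it might look when a heading wraps in the PDF.
header = pd.DataFrame([["Store ", "Usage ", "grown up"], ["name", "fee", ""]])
# Join the two rows column by column to get one name per column.
columns = ["".join(i) for i in zip(*header.head(2).values)]

# Made-up rows: Store B has separate weekday and weekend prices, so its second
# row leaves the store name blank and puts the day type in "Usage fee".
rows = [
    ["Store A", "Daily", None, "780 yen (858 yen)"],
    ["Store B", None, "Weekdays", "780 yen (858 yen)"],
    [None, None, "Sat/Sun", "880 yen (968 yen)"],
]
df = pd.DataFrame(
    rows, columns=["Store name", "Implementation date", "Usage fee", "grown up"]
)

# Keep the implementation date where "Usage fee" is empty, otherwise take the day type.
df["Implementation date"] = df["Implementation date"].where(
    df["Usage fee"].isnull(), df["Usage fee"]
)

# Fill the blank store name from the row above, then drop the helper column.
df = df.ffill().drop("Usage fee", axis=1)

# Pull every number out of the price cell; after unstack() each match is a column.
prices = df["grown up"].str.extractall("([0-9]+)").unstack()
prices.columns = ["tax_excluded", "tax_included"]
df["grown up"] = prices["tax_included"].astype(int)

print(columns)
print(df)
```

Assigning flat column names after `unstack()` is a shortcut for the `rename(..., level=1)` plus `droplevel` pair used in the full script; both leave one column per matched number.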

Reference

COCO'S Breakfast Buffet List PDF files are acquired and converted to CSV
