[PYTHON] Convert the PDF list of Go To Eat member stores in Niigata Prefecture to CSV

This post converts the list of participating stores (PDF) for the Niigata Go To Eat campaign to CSV.

# Download the PDF
wget https://niigata-gte.com/pdf/shop-list.pdf -O data.pdf
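
If wget is not available, the same download can be done from Python. The following is only a minimal sketch using the standard library; the helper name `download` is hypothetical and not part of the original post, and it assumes the file is small enough to read into memory in one go.

```python
import urllib.request


def download(url, dest):
    """Fetch `url` and write the response body to `dest`.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        data = response.read()
        out.write(data)
    return len(data)


# Usage, mirroring the wget line above (not executed here):
# download("https://niigata-gte.com/pdf/shop-list.pdf", "data.pdf")
```

The call is left commented out so that importing or running the snippet does not touch the network.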

# Install Camelot and its dependencies
apt install python3-tk ghostscript
pip install "camelot-py[cv]"

Command

camelot -p all -o data.csv -f csv -strip '\n' -split lattice -scale 40 data.pdf

Python

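import camelot
import pandas as pd

# Extract the tables from every page (lattice is Camelot's default flavor).
tables = camelot.read_pdf(
    "data.pdf", pages="all", split_text=True, strip_text="\n", line_scale=40
)

# Skip the first row of each table (the header row) and assign column names.
df = pd.concat(
    [
        table.df.iloc[1:].set_axis(
            ["Dealer code", "Genre", "Store name", "Street address", "Phone number", "Take-out", "Delivery"], axis=1
        )
        for table in tables
    ]
)

# utf_8_sig writes a BOM so that Excel recognizes the file as UTF-8.
df.to_csv("niigata.csv", encoding="utf_8_sig")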
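
To see what the concat step does without needing the PDF, here is a small sketch on made-up data. It assumes each page table looks like Camelot's `table.df`: integer column labels, with the printed header as row 0. The helper `merge_pages`, the shortened column list, and the sample rows are all hypothetical.

```python
import os
import tempfile

import pandas as pd

# Shortened, hypothetical column list (the real one has seven names).
COLUMNS = ["Dealer code", "Genre", "Store name"]


def merge_pages(frames, columns):
    """Drop row 0 (the header) of each frame, rename the columns, and stack."""
    return pd.concat(
        [frame.iloc[1:].set_axis(columns, axis=1) for frame in frames]
    )


# Two made-up page tables; row 0 of each is the printed header.
page1 = pd.DataFrame(
    [["code", "genre", "name"], ["001", "Sushi", "Store A"], ["002", "Ramen", "Store B"]]
)
page2 = pd.DataFrame([["code", "genre", "name"], ["003", "Cafe", "Store C"]])

merged = merge_pages([page1, page2], COLUMNS)

# Each page keeps its own row labels, so the index restarts per page.
print(merged.index.tolist())

# Write with utf_8_sig and check that the file starts with the UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
merged.to_csv(path, encoding="utf_8_sig")
with open(path, "rb") as f:
    has_bom = f.read(3) == b"\xef\xbb\xbf"
print(has_bom)
```

Because the row labels restart on every page, the first column of the CSV contains repeated numbers. If that column is not wanted, passing `ignore_index=True` to `pd.concat` or `index=False` to `to_csv` is one possible adjustment; the original script does neither.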
